sp**@grog.net wrote:
Is there a well-known algorithm for replacing many substrings in a
string? For example, I'd like to take the string "abc def ghi jkl mno
pqr" and replace, say, every instance of "abc", "ghi", and "mno" with
another value.
Of course the brute-force approach is straight forward. Just iterate
over the full string N times (once for "abc", "ghi", and "mno"), find
all instances, and replace them using the normal std::string member
functions. Of course this seems terribly inefficient as I will have to
traverse my string many times (in my real applications I may have to
replace tens of strings).
I imagine this sort of thing comes up a lot in parsers, but I'm not
sure where to begin.
There is a "standard" approach. You should look at "flex" (or lex). It
just so happens that I have written somthing recently that I can share
with you.
I have attached somthing that should work (though probably much slower
than flex or lex) but it has the property that it has more or less the
same performance regardless of the number of strings you're matching.
The attached code not general. It will only match strings, not classes
of strings or regular expressions (like lex or flex) but because of
that, it has the property that you can add new terminals (strings to
check for) dynamically without any post-processing.
The attached code requires that you construct a "traverser" which can
simply wrap a string or a std::istream object so it should be able to
read from just about anything you can think of. I've used it in an old
editor that needed a better way of dealing with escape sequences.
The basic algorithm is a simple state machine. (DFA) Every character
that comes in is a transition on in the state machine, if there is no
transition for a given character, then the sequence is deemed to be
"invalid" or simply not one of the "terminals" you're looking for in
which case you need to pop off one character in the "push-back" buffer
and try again. In your case, the characters that are "popped off" are
untouched parts of the stream.
Another possible approach and probably very fast, is to use a modified
Burrows-Wheeler transform. Using the first phase of the BW transform,
you create a list of pointers to sorted substrings of the original
string. Once you do that, checking with your list of strings to
substitute is a modified merge sort, where you match the sub-strings
with the "terminals" you're looking for, and then you re-sort the
pointers that match the original order. Then it's a simple matter of
scanning the list of remaining pointers and perfoming the substitution
taking into account overlap and end-conditions.
Enjoy