Bytes IT Community

Finding and Replacing Substrings In A String

I have to go through a file and replace any occurrences of a given
string with the desired string, like replacing "bug" with "feature".
This is made more complicated by the fact that I have to do this with
a lot of replacements and by the fact that some of the target strings
are two words or more long, so I can't just break up the file at
whitespace, commas, and periods. How's the best way to do this? I've
thought about using strstr() to find the string and strncpy() to
replace it, but it occurs to me that it would screw up the string to
overwrite part of it with strncpy(). How should I do this?

Sep 23 '07 #1
7 Replies


"DarthBob88" <da********@gmail.com> wrote in message
news:11**********************@50g2000hsm.googlegroups.com...
> I have to go through a file and replace any occurrences of a given
> string with the desired string, like replacing "bug" with "feature".
> This is made more complicated by the fact that I have to do this with
> a lot of replacements and by the fact that some of the target strings
> are two words or more long, so I can't just break up the file at
> whitespace, commas, and periods. How's the best way to do this? I've
> thought about using strstr() to find the string and strncpy() to
> replace it, but it occurs to me that it would screw up the string to
> overwrite part of it with strncpy(). How should I do this?

You'll make life a lot easier for yourself if you can specify that the
search string cannot contain newlines.

Load each line. Call strstr() repeatedly to count the number of occurrences
of each target string. Then calculate how much extra memory is required.

(You need to think what happens if one search string is a substring of
another, or contains an overlap)

Allocate another buffer of the right length, not forgetting the terminal
nul. Then do a search and replace. Probably the easiest way to do this is to
have two buffers, search one and replace into the other, iteratively until
you have done all the targets.
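
A minimal sketch of the count-then-allocate approach described above, for a single search string and an in-memory buffer. The name replace_all is made up for illustration, and non-overlapping, left-to-right matching is an assumption; for several targets, the result of one call would be fed in as the source of the next, which amounts to the two-buffer scheme.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of src in which
 * every non-overlapping occurrence of search is replaced by repl.
 * Returns NULL on allocation failure or if search is empty.
 * The caller frees the result. */
char *replace_all(const char *src, const char *search, const char *repl)
{
    size_t s_len = strlen(search);
    size_t r_len = strlen(repl);
    size_t count = 0;
    const char *p;

    if (s_len == 0)
        return NULL;

    /* Pass 1: count occurrences, scanning left to right. */
    for (p = src; (p = strstr(p, search)) != NULL; p += s_len)
        count++;

    /* Exact output length; the +1 below is the terminating nul. */
    size_t out_len = strlen(src) - count * s_len + count * r_len;
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy unmatched text and replacements into the new buffer. */
    char *dst = out;
    const char *hit;
    p = src;
    while ((hit = strstr(p, search)) != NULL) {
        size_t gap = (size_t)(hit - p);
        memcpy(dst, p, gap);
        dst += gap;
        memcpy(dst, repl, r_len);
        dst += r_len;
        p = hit + s_len;
    }
    strcpy(dst, p); /* remaining tail plus the nul */
    return out;
}
```

Calling replace_all(line, "bug", "feature") on each loaded line and freeing the result after writing it out would cover the single-target case.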

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Sep 23 '07 #2

Malcolm McLean said:

<snip>
> You'll make life a lot easier for yourself if you can specify that the
> search string cannot contain newlines.

This is not in fact necessary. If you're prepared to shift stuff around in
memory a fair bit, all you need is a source buffer twice the size of the
needle. Search for the needle; if you find it, copy everything up to but
not including it to a temporary file, write the replacement needle to the
file, and then move all the subsequent contents of the buffer (i.e. the
stuff following the needle) to its beginning, and replenish it from the
input file. (Newlines are merely more grist to the mill.)

If you *don't* find it, write the first half of the buffer to the temporary
file, and then shift the second half into the first half and replenish
from the input.

When the input is exhausted and you're sure the buffer contains no needles,
write the remainder to the temporary file. Then remove and rename in the
canonical fashion.
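
The buffer scheme above might be sketched roughly as follows. The names buffer_replace and find are hypothetical, the caller is assumed to pass already-open streams, and creating the temporary file plus the final remove() and rename() are left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds needle (n bytes) in the first len bytes of
 * buf. Used instead of strstr because the buffer is not nul-terminated. */
static char *find(char *buf, size_t len, const char *needle, size_t n)
{
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return buf + i;
    return NULL;
}

/* Copies in to out, replacing needle with repl, using a buffer twice
 * the size of the needle. Returns 0 on success, -1 on error. */
int buffer_replace(FILE *in, FILE *out, const char *needle, const char *repl)
{
    size_t n = strlen(needle);
    if (n == 0)
        return -1;
    size_t cap = 2 * n;
    char *buf = malloc(cap);
    if (buf == NULL)
        return -1;
    size_t len = fread(buf, 1, cap, in);

    for (;;) {
        char *hit = find(buf, len, needle, n);
        size_t used;
        if (hit != NULL) {
            /* Write everything before the needle, then the replacement. */
            fwrite(buf, 1, (size_t)(hit - buf), out);
            fputs(repl, out);
            used = (size_t)(hit - buf) + n;
        } else if (len == cap) {
            /* No needle: the first half cannot be part of a match. */
            fwrite(buf, 1, n, out);
            used = n;
        } else {
            /* Input exhausted and no needle left: write the remainder. */
            fwrite(buf, 1, len, out);
            break;
        }
        /* Shift the unconsumed bytes to the front and replenish. */
        memmove(buf, buf + used, len - used);
        len -= used;
        len += fread(buf + len, 1, cap - len, in);
    }
    free(buf);
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

Since the buffer is never split at line boundaries, a needle containing a newline is handled like any other.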

Depending on just how much data you've got, it might be worth investigating
the Boyer-Moore string searching algorithm, since native strstr
implementations can be a bit dumb.
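
For a flavour of that family of algorithms, here is a sketch of the Horspool simplification of Boyer-Moore. It uses only a bad-character shift table, so it is not the full algorithm, and the name horspool_find and its length-based interface are assumptions made for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first occurrence of
 * needle (n bytes) in hay (h bytes), or NULL if there is none. */
const char *horspool_find(const char *hay, size_t h,
                          const char *needle, size_t n)
{
    size_t shift[UCHAR_MAX + 1];

    if (n == 0)
        return hay;
    if (n > h)
        return NULL;

    /* Default shift: the whole needle length. */
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        shift[i] = n;
    /* For each byte except the last, distance from its final
     * occurrence to the end of the needle. */
    for (size_t i = 0; i + 1 < n; i++)
        shift[(unsigned char)needle[i]] = n - 1 - i;

    size_t pos = 0;
    while (pos + n <= h) {
        if (memcmp(hay + pos, needle, n) == 0)
            return hay + pos;
        /* Skip ahead based on the byte aligned with the needle's end. */
        pos += shift[(unsigned char)hay[pos + n - 1]];
    }
    return NULL;
}
```

Whether this beats a given strstr() depends on the implementation and the data, so it is worth measuring before switching.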

<snip>
> (You need to think what happens if one search string is a substring of
> another, or contains an overlap)

Indeed.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Sep 23 '07 #3

DarthBob88 <da********@gmail.com> writes:
> I have to go through a file and replace any occurrences of a given
> string with the desired string, like replacing "bug" with "feature".
> This is made more complicated by the fact that I have to do this with
> a lot of replacements and by the fact that some of the target strings
> are two words or more long, so I can't just break up the file at
> whitespace, commas, and periods. How's the best way to do this? I've
> thought about using strstr() to find the string and strncpy() to
> replace it, but it occurs to me that it would screw up the string to
> overwrite part of it with strncpy(). How should I do this?

Maybe it would be a good idea to look for a library that handles this
kind of thing? A regular expression library might come in handy.

Regards
Friedrich

--
Please remove just-for-news- to reply via e-mail.
Sep 23 '07 #4

On Sun, 23 Sep 2007 06:22:23 +0000, DarthBob88 wrote:
> I have to go through a file and replace any occurrences of a given
> string with the desired string, like replacing "bug" with "feature".
> This is made more complicated by the fact that I have to do this with
> a lot of replacements and by the fact that some of the target strings
> are two words or more long, so I can't just break up the file at
> whitespace, commas, and periods. How's the best way to do this? I've
> thought about using strstr() to find the string and strncpy() to
> replace it, but it occurs to me that it would screw up the string to
> overwrite part of it with strncpy(). How should I do this?

Try to memmove() the remainder of the string forward, like this:
"This is a bug. \n\0"
"feature" is four characters longer than "bug", so slide the part
of the string starting with the period four characters forward,
then memcpy() "feature" where the 'b' of "bug" was. Probably there
are better ways to do that, try asking in comp.programming.

e.g.
    char str[1000] = "This is a bug. \n";
    char *search = "bug";
    char *replace = "feature";
    size_t len = strlen(str);
    size_t s_len = strlen(search);
    size_t r_len = strlen(replace);
    char *current = str;
    while ((current = strstr(current, search)) != NULL) {
        memmove(current + r_len, current + s_len,
                len - (current - str) - s_len + 1);
        memcpy(current, replace, r_len);
    } /* not compiled, not tested. make sure there's enough space past
       * the end of the string in str. */
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

Sep 23 '07 #5

DarthBob88 wrote:
) I have to go through a file and replace any occurrences of a given
) string with the desired string, like replacing "bug" with "feature".
) This is made more complicated by the fact that I have to do this with
) a lot of replacements and by the fact that some of the target strings
) are two words or more long, so I can't just break up the file at
) whitespace, commas, and periods. How's the best way to do this? I've
) thought about using strstr() to find the string and strncpy() to
) replace it, but it occurs to me that it would screw up the string to
) overwrite part of it with strncpy(). How should I do this?

The Knuth-Morris-Pratt algorithm reads the characters in the searched
string sequentially, one by one. So if you use that algo, you can quite
simply read from the file one char at a time, searching for a match.
Writing to the output should be fairly easy as well, just make sure you
only write characters when they are known to be a mismatch.

You'll have to rely on the system to make it I/O efficient.

After you've got it working, you can always optimize it by dropping in
a platform-specific I/O routine, if needed.
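
As a rough sketch of that idea, the following reads one character at a time with getc() and only writes characters once they can no longer be part of a match. The name kmp_replace is hypothetical, and restarting the match from scratch after each replacement (so matches never overlap) is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies in to out, replacing each occurrence of
 * pat with repl. Returns 0 on success, -1 on error or empty pat. */
int kmp_replace(FILE *in, FILE *out, const char *pat, const char *repl)
{
    size_t m = strlen(pat);
    size_t k = 0; /* number of pattern characters currently matched */
    int c;

    if (m == 0)
        return -1;
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return -1;

    /* fail[i]: length of the longest proper prefix of pat[0..i]
     * that is also a suffix of it. */
    fail[0] = 0;
    for (size_t i = 1, j = 0; i < m; i++) {
        while (j > 0 && pat[i] != pat[j])
            j = fail[j - 1];
        if (pat[i] == pat[j])
            j++;
        fail[i] = j;
    }

    while ((c = getc(in)) != EOF) {
        /* On a mismatch, fall back. The characters dropped from the
         * front of the pending match equal a prefix of pat and are now
         * known not to begin a match, so they can be written out. */
        while (k > 0 && c != (unsigned char)pat[k]) {
            size_t next = fail[k - 1];
            fwrite(pat, 1, k - next, out);
            k = next;
        }
        if (c == (unsigned char)pat[k]) {
            if (++k == m) {
                fputs(repl, out); /* full match: emit the replacement */
                k = 0;
            }
        } else {
            putc(c, out); /* nothing pending and no match */
        }
    }
    fwrite(pat, 1, k, out); /* flush a partial match left at end of input */
    free(fail);
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

Buffering is left to stdio here, which is the "rely on the system" part.
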
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Sep 23 '07 #6

On Sun, 23 Sep 2007 12:08:58 +0200, Army1987 wrote:
>     char str[1000] = "This is a bug. \n";
>     char *search = "bug";
>     char *replace = "feature";
>     size_t len = strlen(str);
>     size_t s_len = strlen(search);
>     size_t r_len = strlen(replace);
>     char *current = str;
>     while ((current = strstr(current, search)) != NULL) {
>         memmove(current + r_len, current + s_len,
>                 len - (current - str) - s_len + 1);
>         memcpy(current, replace, r_len);
>     }

Finding two bugs and correcting them is left as an exercise.
(Hint: one of them only shows up when search is a substring of
replace.)
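
One possible reading of the exercise, as a sketch: the two bugs appear to be that len is never updated after a replacement, and that current is never advanced past the inserted text, so the loop finds the same spot again whenever search occurs inside replace. The function name replace_in_place and the cap parameter (total size of the buffer) are additions for illustration.

```c
#include <string.h>

/* Hypothetical helper: replaces every occurrence of search in buf, in
 * place. cap is the total size of buf in bytes. Returns 0 on success,
 * -1 if search is empty or the result would not fit; in the latter
 * case, replacements already made are left in buf. */
int replace_in_place(char *buf, size_t cap,
                     const char *search, const char *replace)
{
    size_t len = strlen(buf);
    size_t s_len = strlen(search);
    size_t r_len = strlen(replace);
    char *current = buf;

    if (s_len == 0)
        return -1;

    while ((current = strstr(current, search)) != NULL) {
        /* Bytes after the match, including the terminating nul. */
        size_t tail = len - (size_t)(current - buf) - s_len + 1;

        if (len - s_len + r_len + 1 > cap)
            return -1; /* not enough room for this replacement */

        memmove(current + r_len, current + s_len, tail);
        memcpy(current, replace, r_len);
        len = len - s_len + r_len; /* fix 1: keep the length up to date */
        current += r_len;          /* fix 2: resume after the replacement */
    }
    return 0;
}
```

With char str[1000] = "This is a bug. \n", the call replace_in_place(str, sizeof str, "bug", "feature") leaves "This is a feature. \n" in str.
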
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

Sep 23 '07 #7

Army1987 <ar******@NOSPAM.it> writes:
> On Sun, 23 Sep 2007 06:22:23 +0000, DarthBob88 wrote:
>> I have to go through a file and replace any occurrences of a given
>> string with the desired string, like replacing "bug" with "feature".
>> This is made more complicated by the fact that I have to do this with
>> a lot of replacements and by the fact that some of the target strings
>> are two words or more long, so I can't just break up the file at
>> whitespace, commas, and periods. How's the best way to do this? I've
>> thought about using strstr() to find the string and strncpy() to
>> replace it, but it occurs to me that it would screw up the string to
>> overwrite part of it with strncpy(). How should I do this?
> Try to memmove() the remainder of the string forward, like this:
> "This is a bug. \n\0"
> "feature" is four characters longer than "bug", so slide the part
> of the string starting with the period four characters forward,
> then memcpy() "feature" where the 'b' of "bug" was. Probably there
> are better ways to do that, try asking in comp.programming.
>
> e.g.
>     char str[1000] = "This is a bug. \n";
>     char *search = "bug";
>     char *replace = "feature";
>     size_t len = strlen(str);
>     size_t s_len = strlen(search);
>     size_t r_len = strlen(replace);
>     char *current = str;
>     while ((current = strstr(current, search)) != NULL) {
>         memmove(current + r_len, current + s_len,
>                 len - (current - str) - s_len + 1);
>         memcpy(current, replace, r_len);
>     } /* not compiled, not tested. make sure there's enough space past
>        * the end of the string in str. */

You're copying the buffer (well, half of it on average) every time you
do a replacement.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 23 '07 #8
