Finding and Replacing Substrings In A String

I have to go through a file and replace any occurrences of a given
string with the desired string, like replacing "bug" with "feature".
This is made more complicated by the fact that I have to do this with
a lot of replacements and by the fact that some of the target strings
are two words or more long, so I can't just break up the file at
whitespace, commas, and periods. What's the best way to do this? I've
thought about using strstr() to find the string and strncpy() to
replace it, but it occurs to me that it would screw up the string to
overwrite part of it with strncpy(). How should I do this?

Sep 23 '07 #1

"DarthBob88" <da********@gmail.com> wrote in message
news:11**********************@50g2000hsm.googlegroups.com...
> I have to go through a file and replace any occurrences of a given
> string with the desired string, like replacing "bug" with "feature".
> This is made more complicated by the fact that I have to do this with
> a lot of replacements and by the fact that some of the target strings
> are two words or more long, so I can't just break up the file at
> whitespace, commas, and periods. How's the best way to do this? I've
> thought about using strstr() to find the string and strncpy() to
> replace it, but it occurs to me that it would screw up the string to
> overwrite part of it with strncpy(). How should I do this?

You'll make life a lot easier for yourself if you can specify that the
search string cannot contain newlines.

Load each line. Call strstr() repeatedly to count the number of occurrences
of each target string. Then calculate how much extra memory is required.

(You need to think what happens if one search string is a substring of
another, or contains an overlap)

Allocate another buffer of the right length, not forgetting the terminating
nul. Then do a search and replace. Probably the easiest way to do this is to
have two buffers, search one and replace into the other, iteratively until
you have done all the targets.
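
A minimal sketch of the count, allocate, and copy steps described above, in C. The names replace_all() and replace_many() are made up for illustration and are not from the post; the sketch assumes each line is a nul-terminated string and that matches of one target do not overlap.

```c
#include <stdlib.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in s (needle is non-empty). */
static size_t count_occurrences(const char *s, const char *needle)
{
    size_t n = 0, nlen = strlen(needle);
    while ((s = strstr(s, needle)) != NULL) {
        n++;
        s += nlen;              /* skip the match so hits never overlap */
    }
    return n;
}

/* Return a newly malloc'd copy of src with every needle replaced by repl.
 * Returns NULL on allocation failure or an empty needle; caller frees. */
char *replace_all(const char *src, const char *needle, const char *repl)
{
    size_t nlen = strlen(needle), rlen = strlen(repl);
    if (nlen == 0)
        return NULL;

    size_t count = count_occurrences(src, needle);
    /* Every hit removes nlen bytes and adds rlen; +1 below is for the nul. */
    size_t out_len = strlen(src) - count * nlen + count * rlen;
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    const char *hit;
    while ((hit = strstr(src, needle)) != NULL) {
        size_t gap = (size_t)(hit - src);
        memcpy(dst, src, gap);      /* text before the match */
        dst += gap;
        memcpy(dst, repl, rlen);    /* the replacement */
        dst += rlen;
        src = hit + nlen;           /* resume after the match */
    }
    strcpy(dst, src);               /* the tail, including the nul */
    return out;
}

/* Apply several (target, replacement) pairs in order, going from one
 * buffer to the next until all targets are done. Caller frees the result. */
char *replace_many(const char *line, const char *const pairs[][2],
                   size_t npairs)
{
    char *cur = malloc(strlen(line) + 1);
    if (cur == NULL)
        return NULL;
    strcpy(cur, line);
    for (size_t i = 0; i < npairs; i++) {
        char *next = replace_all(cur, pairs[i][0], pairs[i][1]);
        free(cur);
        if (next == NULL)
            return NULL;
        cur = next;
    }
    return cur;
}
```

Because the pairs are applied one after another, text produced by an earlier replacement can be matched by a later target. That is the substring and overlap caveat mentioned above, and the sketch does nothing to prevent it.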

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Sep 23 '07 #2
Malcolm McLean said:

<snip>
> You'll make life a lot easier for yourself if you can specify that the
> search string cannot contain newlines.

This is not in fact necessary. If you're prepared to shift stuff around in
memory a fair bit, all you need is a source buffer twice the size of the
needle. Search for the needle; if you find it, copy everything up to but
not including it to a temporary file, write the replacement needle to the
file, and then move all the subsequent contents of the buffer (i.e. the
stuff following the needle) to its beginning, and replenish it from the
input file. (Newlines are merely more grist to the mill.)

If you *don't* find it, write the first half of the buffer to the temporary
file, and then shift the second half into the first half and replenish
from the input.

When the input is exhausted and you're sure the buffer contains no needles,
write the remainder to the temporary file. Then remove and rename in the
canonical fashion.
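
The scheme above might be sketched like this. stream_replace() and find_bytes() are hypothetical names, and the sketch takes two already-open streams rather than doing the temporary-file handling itself; the caller would open the temporary file, then remove() the original and rename() the temporary one afterwards.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find needle (nlen bytes) in buf (len bytes). memmem() is not ISO C,
 * so a plain loop is used here. */
static const char *find_bytes(const char *buf, size_t len,
                              const char *needle, size_t nlen)
{
    for (size_t i = 0; i + nlen <= len; i++)
        if (memcmp(buf + i, needle, nlen) == 0)
            return buf + i;
    return NULL;
}

/* Copy in to out, replacing every needle with repl, using a buffer
 * twice the size of the needle. Returns 0 on success, -1 on failure. */
int stream_replace(FILE *in, FILE *out, const char *needle, const char *repl)
{
    size_t nlen = strlen(needle), cap = 2 * nlen, len = 0;
    if (nlen == 0)
        return -1;
    char *buf = malloc(cap);
    if (buf == NULL)
        return -1;

    for (;;) {
        /* Replenish the buffer from the input. */
        len += fread(buf + len, 1, cap - len, in);

        const char *hit = find_bytes(buf, len, needle, nlen);
        size_t keep_from;
        if (hit != NULL) {
            /* Write the text before the needle, then the replacement. */
            fwrite(buf, 1, (size_t)(hit - buf), out);
            fputs(repl, out);
            keep_from = (size_t)(hit - buf) + nlen;
        } else if (len == cap) {
            /* A needle starting in the first half would lie entirely
             * inside the buffer, so the first half is safe to write. */
            fwrite(buf, 1, nlen, out);
            keep_from = nlen;
        } else {
            /* Input exhausted and no needle left: write the remainder. */
            fwrite(buf, 1, len, out);
            break;
        }
        /* Shift what follows to the beginning of the buffer. */
        len -= keep_from;
        memmove(buf, buf + keep_from, len);
    }
    free(buf);
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

Newlines get no special treatment here; they are just bytes in the buffer.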

Depending on just how much data you've got, it might be worth investigating
the Boyer-Moore string searching algorithm, since native strstr
implementations can be a bit dumb.
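
For a feel of what that buys you, here is a sketch of the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table. It is not the full Boyer-Moore algorithm, and horspool_search() is an illustrative name, not a library function.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool search: returns a pointer to the first occurrence
 * of needle in haystack, or NULL if there is none. */
const char *horspool_search(const char *haystack, const char *needle)
{
    size_t hlen = strlen(haystack), nlen = strlen(needle);
    size_t shift[UCHAR_MAX + 1];

    if (nlen == 0)
        return haystack;
    if (nlen > hlen)
        return NULL;

    /* A byte that does not occur in the needle lets us skip nlen bytes. */
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        shift[i] = nlen;
    /* A byte that does occur (other than in the last position) shifts the
     * needle so that its rightmost such byte lines up with the text. */
    for (size_t i = 0; i + 1 < nlen; i++)
        shift[(unsigned char)needle[i]] = nlen - 1 - i;

    size_t pos = 0;
    while (pos + nlen <= hlen) {
        if (memcmp(haystack + pos, needle, nlen) == 0)
            return haystack + pos;
        /* Shift according to the text byte under the needle's last slot. */
        pos += shift[(unsigned char)haystack[pos + nlen - 1]];
    }
    return NULL;
}
```

The longer the needle, the bigger the typical skip, so this tends to pay off for multi-word targets more than for three-letter ones.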

<snip>
> (You need to think what happens if one search string is a substring of
> another, or contains an overlap)

Indeed.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Sep 23 '07 #3
DarthBob88 <da********@gmail.com> writes:
> I have to go through a file and replace any occurrences of a given
> string with the desired string, like replacing "bug" with "feature".
> This is made more complicated by the fact that I have to do this with
> a lot of replacements and by the fact that some of the target strings
> are two words or more long, so I can't just break up the file at
> whitespace, commas, and periods. How's the best way to do this? I've
> thought about using strstr() to find the string and strncpy() to
> replace it, but it occurs to me that it would screw up the string to
> overwrite part of it with strncpy(). How should I do this?

Maybe it would be a good idea to look for a library for handling that
kind of stuff? Maybe some regular expression libraries would come in
handy?
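
As one possibility, assuming a POSIX system (regcomp() and regexec() come from <regex.h>, which is POSIX and not ISO C), a replace loop might look like the sketch below. regex_replace() is a made-up helper name, and the fixed-size output buffer is a simplification.

```c
#include <regex.h>      /* POSIX, not ISO C */
#include <string.h>

/* Replace every match of the extended regex pattern in src with repl,
 * writing a nul-terminated result of at most outsz - 1 characters to out.
 * Returns 0 on success, -1 on a bad pattern or if out is too small. */
int regex_replace(const char *pattern, const char *src, const char *repl,
                  char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t used = 0, rlen = strlen(repl);
    int eflags = 0, rc = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Stop on no match, or on an empty match (to guarantee progress). */
    while (regexec(&re, src, 1, &m, eflags) == 0 && m.rm_eo > m.rm_so) {
        size_t before = (size_t)m.rm_so;
        if (used + before + rlen >= outsz) {
            rc = -1;                    /* result would not fit */
            break;
        }
        memcpy(out + used, src, before);            /* text before match */
        memcpy(out + used + before, repl, rlen);    /* the replacement */
        used += before + rlen;
        src += m.rm_eo;                 /* continue after the match */
        eflags = REG_NOTBOL;            /* src is no longer at line start */
    }

    size_t tail = strlen(src);
    if (rc == 0 && used + tail < outsz)
        memcpy(out + used, src, tail + 1);          /* tail plus the nul */
    else
        rc = -1;

    regfree(&re);
    return rc;
}
```

A pattern such as "bug|known +issue" would then cover several targets, including multi-word ones, in a single pass, provided they all share the same replacement.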

Regards
Friedrich

--
Please remove just-for-news- to reply via e-mail.
Sep 23 '07 #4
On Sun, 23 Sep 2007 06:22:23 +0000, DarthBob88 wrote:
> I have to go through a file and replace any occurrences of a given
> string with the desired string, like replacing "bug" with "feature".
> This is made more complicated by the fact that I have to do this with
> a lot of replacements and by the fact that some of the target strings
> are two words or more long, so I can't just break up the file at
> whitespace, commas, and periods. How's the best way to do this? I've
> thought about using strstr() to find the string and strncpy() to
> replace it, but it occurs to me that it would screw up the string to
> overwrite part of it with strncpy(). How should I do this?

Try to memmove() the remainder of the string forward, like this:
"This is a bug. \n\0"
"feature" is four characters longer than "bug", so slide the part
of the string starting with the period four characters forward,
then memcpy() "feature" where the 'b' of "bug" was. There are probably
better ways to do that; try asking in comp.programming.

e.g.
    char str[1000] = "This is a bug. \n";
    char *search = "bug";
    char *replace = "feature";
    size_t len = strlen(str);
    size_t s_len = strlen(search);
    size_t r_len = strlen(replace);
    char *current = str;
    while ((current = strstr(current, search)) != NULL) {
        memmove(current + r_len, current + s_len,
                len - (current - str) - s_len + 1);
        memcpy(current, replace, r_len);
    } /* not compiled, not tested. Make sure there's enough space past
       * the end of the string in str. */
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

Sep 23 '07 #5
DarthBob88 wrote:
) I have to go through a file and replace any occurrences of a given
) string with the desired string, like replacing "bug" with "feature".
) This is made more complicated by the fact that I have to do this with
) a lot of replacements and by the fact that some of the target strings
) are two words or more long, so I can't just break up the file at
) whitespace, commas, and periods. How's the best way to do this? I've
) thought about using strstr() to find the string and strncpy() to
) replace it, but it occurs to me that it would screw up the string to
) overwrite part of it with strncpy(). How should I do this?

The Knuth-Morris-Pratt algorithm reads the characters in the searched
string sequentially, one by one. So if you use that algorithm, you can quite
simply read from the file one char at a time, searching for a match.
Writing to the output should be fairly easy as well, just make sure you
only write characters when they are known to be a mismatch.

You'll have to rely on the system to make it I/O efficient.

After you've got it working, you can always optimize it by dropping in
a platform-specific I/O routine, if needed.
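
A rough sketch of that idea, using getc() and putc() and leaving the buffering to stdio. kmp_stream_replace() is an invented name, and the sketch handles a single search string with non-overlapping matches.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy in to out one character at a time, replacing every needle with
 * repl. Returns 0 on success, -1 on failure. */
int kmp_stream_replace(FILE *in, FILE *out,
                       const char *needle, const char *repl)
{
    size_t nlen = strlen(needle);
    if (nlen == 0)
        return -1;
    size_t *fail = malloc(nlen * sizeof *fail);
    if (fail == NULL)
        return -1;

    /* fail[i] = length of the longest proper prefix of needle[0..i]
     * that is also a suffix of it (the usual KMP failure table). */
    fail[0] = 0;
    for (size_t i = 1, j = 0; i < nlen; i++) {
        while (j > 0 && needle[i] != needle[j])
            j = fail[j - 1];
        if (needle[i] == needle[j])
            j++;
        fail[i] = j;
    }

    size_t k = 0;       /* needle characters matched so far and held back */
    int c;
    while ((c = getc(in)) != EOF) {
        while (k > 0 && (unsigned char)needle[k] != c) {
            /* Mismatch: the oldest held-back characters can no longer
             * start a match. They equal a prefix of needle, so write
             * them from there. */
            size_t next = fail[k - 1];
            fwrite(needle, 1, k - next, out);
            k = next;
        }
        if ((unsigned char)needle[k] == c) {
            if (++k == nlen) {          /* full match */
                fputs(repl, out);
                k = 0;
            }
        } else {
            putc(c, out);               /* k == 0: a known mismatch */
        }
    }
    fwrite(needle, 1, k, out);          /* partial match at end of input */
    free(fail);
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

Nothing from the input is ever stored: the held-back characters are always a prefix of the search string, so they can be written from the needle itself once they are known not to be part of a match.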
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Sep 23 '07 #6
On Sun, 23 Sep 2007 12:08:58 +0200, Army1987 wrote:
>     char str[1000] = "This is a bug. \n";
>     char *search = "bug";
>     char *replace = "feature";
>     size_t len = strlen(str);
>     size_t s_len = strlen(search);
>     size_t r_len = strlen(replace);
>     char *current = str;
>     while ((current = strstr(current, search)) != NULL) {
>         memmove(current + r_len, current + s_len,
>                 len - (current - str) - s_len + 1);
>         memcpy(current, replace, r_len);
>     }

Finding two bugs and correcting them is left as an exercise.
(Hint: one of them only shows up when search is a substring of
replace.)
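
One reading of the exercise, offered as a sketch and not as the poster's own answer: len is never updated after a replacement, so the second memmove() moves the wrong number of bytes, and current is never advanced past the text just written, so strstr() finds the same spot again whenever search occurs inside replace. The function name and the cap parameter below are additions for illustration.

```c
#include <string.h>

/* Replace every occurrence of search with replace inside str, in place.
 * cap is the total size of the array behind str. Returns 0 on success,
 * -1 if search is empty or the result would not fit in cap bytes. */
int replace_in_place(char *str, size_t cap,
                     const char *search, const char *replace)
{
    size_t len = strlen(str);
    size_t s_len = strlen(search);
    size_t r_len = strlen(replace);
    char *current = str;

    if (s_len == 0)
        return -1;

    while ((current = strstr(current, search)) != NULL) {
        if (len - s_len + r_len + 1 > cap)
            return -1;                  /* would run past the end of str */
        /* Slide the tail, including the nul, to its new position. */
        memmove(current + r_len, current + s_len,
                len - (size_t)(current - str) - s_len + 1);
        memcpy(current, replace, r_len);
        len = len - s_len + r_len;      /* fix 1: keep len up to date */
        current += r_len;               /* fix 2: skip what was written */
    }
    return 0;
}
```

With str declared as char str[1000], the call would be replace_in_place(str, sizeof str, "bug", "feature").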
--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

Sep 23 '07 #7
Army1987 <ar******@NOSPAM.it> writes:
> On Sun, 23 Sep 2007 06:22:23 +0000, DarthBob88 wrote:
>> I have to go through a file and replace any occurrences of a given
>> string with the desired string, like replacing "bug" with "feature".
>> This is made more complicated by the fact that I have to do this with
>> a lot of replacements and by the fact that some of the target strings
>> are two words or more long, so I can't just break up the file at
>> whitespace, commas, and periods. How's the best way to do this? I've
>> thought about using strstr() to find the string and strncpy() to
>> replace it, but it occurs to me that it would screw up the string to
>> overwrite part of it with strncpy(). How should I do this?
> Try to memmove() the remainder of the string forward, like this:
> "This is a bug. \n\0"
> "feature" is four characters longer than "bug", so slide the part
> of the string starting with the period four characters forward,
> then memcpy() "feature" where the 'b' of "bug" was. Probably there
> are better ways to do that, try asking in comp.programming.
>
> e.g.
>     char str[1000] = "This is a bug. \n";
>     char *search = "bug";
>     char *replace = "feature";
>     size_t len = strlen(str);
>     size_t s_len = strlen(search);
>     size_t r_len = strlen(replace);
>     char *current = str;
>     while ((current = strstr(current, search)) != NULL) {
>         memmove(current + r_len, current + s_len,
>                 len - (current - str) - s_len + 1);
>         memcpy(current, replace, r_len);
>     } /* not compiled, not tested. Make sure there's enough space past
>        * the end of the string in str. */

You're copying the buffer (well, half of it on average) every time you
do a replacement.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Sep 23 '07 #8