improve my search and replace function

pembed2003

Hi all,
I asked this question in the C group but no one seems to be interested
in answering it. :-( Basically, I wrote a search and replace function
so I can do:

char[] source = "abcd?1234? x";
char search = '?';
char* replace = "***";

char* result = search_and_repl ace(source,sear ch,replace);

result will then be "abcd***1234*** x". I understand that I can
probably use string instead of char* and there's probably some API in
the C++ standard library that I can use but I want to code it as an
exercise to learn the algorithm. Can someone suggest ways to improve
the performance of my search and replace algorithm? The function lacks
some error checkings but I am more interested in the algorithm.
Thanks!

char* search_and_repl ace(char* source,char search,char* replace){

char* result;

size_t l = strlen(source), r = strlen(replace) , i, k;

int number_of_repla ces = 0;

for(i = 0; i < l; i++){
if(source[i] == search)
number_of_repla ces++;
}

result = malloc((l - number_of_repla ces) +
number_of_repla ces * r + 1);
i = 0; k = 0;

while(k < l){
if(source[k] == search){
int j;
for(j = 0; j < r; j++){
result[i++] = replace[j];
}
}else{
result[i++] = source[k];
}
k++;
}

result[i] = 0;
return result;
}

Jul 22 '05 #1

Subscribe Reply

3109

John Harrison

"pembed2003 " <pe********@yah oo.com> wrote in message
news:db******** *************** ***@posting.goo gle.com...

Hi all,
I asked this question in the C group but no one seems to be interested
in answering it. :-( Basically, I wrote a search and replace function
so I can do:

char[] source = "abcd?1234? x";
char search = '?';
char* replace = "***";

char* result = search_and_repl ace(source,sear ch,replace);

result will then be "abcd***1234*** x". I understand that I can
probably use string instead of char* and there's probably some API in
the C++ standard library that I can use but I want to code it as an
exercise to learn the algorithm. Can someone suggest ways to improve
the performance of my search and replace algorithm? The function lacks
some error checkings but I am more interested in the algorithm.
Thanks!

char* search_and_repl ace(char* source,char search,char* replace){

char* result;

size_t l = strlen(source), r = strlen(replace) , i, k;

int number_of_repla ces = 0;

for(i = 0; i < l; i++){
if(source[i] == search)
number_of_repla ces++;
}

result = malloc((l - number_of_repla ces) +
number_of_repla ces * r + 1);
i = 0; k = 0;

while(k < l){
if(source[k] == search){
int j;
for(j = 0; j < r; j++){
result[i++] = replace[j];
}
}else{
result[i++] = source[k];
}
k++;
}

result[i] = 0;
return result;
}

I can't see any obvious way to improve the efficiency, you seem to be doing
all the right things, like preallocating the result buffer. This loop

for(j = 0; j < r; j++){
result[i++] = replace[j];
}

might be a little faster as a memcpy

memcpy(result + i, replace, r);
i += r;

You could also try a pointer version instead of using ints for all your loop
variables. It might be faster but it might not, worth a try though.

john

Jul 22 '05 #2

Siemel Naran

"pembed2003 " <pe********@yah oo.com> wrote in message

char[] source = abcd?1234?x";
char search = '?';
char* replace = "***";

char* result = search_and_repl ace(source,sear ch,replace);

result will then be "abcd***1234*** x". I understand that I can
Wait a sec. Did you want four **** before x?
probably use string instead of char* and there's probably some API in
the C++ standard library that I can use but I want to code it as an
exercise to learn the algorithm. Can someone suggest ways to improve
the performance of my search and replace algorithm? The function lacks
some error checkings but I am more interested in the algorithm.
Thanks!

char* search_and_repl ace(char* source,char search,char* replace){

char* result;

size_t l = strlen(source), r = strlen(replace) , i, k;

int number_of_repla ces = 0;

for(i = 0; i < l; i++){
if(source[i] == search)
number_of_repla ces++;
}

result = malloc((l - number_of_repla ces) +
number_of_repla ces * r + 1);
i = 0; k = 0;

while(k < l){
if(source[k] == search){
int j;
for(j = 0; j < r; j++){
result[i++] = replace[j];
}
}else{
result[i++] = source[k];
}
k++;
}

result[i] = 0;
return result;
}

This looks good. It's efficient. Maybe put in a few comments.

One thing I can think of is that use of standard functions like memcpy
(suitable for your case) and strcpy may use special assembly instructions
created especially for memcpy, and thus the resulting code would be faster.
But I'm not an expert on what these statements are, what platforms do what,
what compilers support what, and so on.

Another thing I can think of is when you scan the array to find the number
of elements to replace, you put the index of the element into an array, so
for example in "abcd?1234? x" the array will contain 4 and 9 (the index of
the ?). Then in the next loop you can just look up the ?. This approach
may make the algorithm faster if there are few replacements in a long
stream. Also, the resulting code is more complicated, and thus harder to
maintain.

Maybe others can see other problems.

Maybe the next challenge is to do the same in place! Note, this algorithm
is not necessarily better, just different.

Jul 22 '05 #3

David Harmon

On 19 Jun 2004 23:14:29 -0700 in comp.lang.c++, pe********@yaho o.com
(pembed2003) wrote,

Hi all,
I asked this question in the C group but no one seems to be interested

Sorry, I am not interested until I see it
use std::string::fi nd() and std::string::re place()

Jul 22 '05 #4

pembed2003

"John Harrison" <jo************ *@hotmail.com> wrote in message news:<2j******* ******@uni-berlin.de>...

"pembed2003 " <pe********@yah oo.com> wrote in message
news:db******** *************** ***@posting.goo gle.com...

[snip]

char[] source = "abcd?1234? x";
char search = '?';
char* replace = "***";

char* result = search_and_repl ace(source,sear ch,replace);

result will then be "abcd***1234*** x".
[snip]

char* search_and_repl ace(char* source,char search,char* replace){

char* result;

size_t l = strlen(source), r = strlen(replace) , i, k;

int number_of_repla ces = 0;

for(i = 0; i < l; i++){
if(source[i] == search)
number_of_repla ces++;
}

result = malloc((l - number_of_repla ces) +
number_of_repla ces * r + 1);
i = 0; k = 0;

while(k < l){
if(source[k] == search){
int j;
for(j = 0; j < r; j++){
result[i++] = replace[j];
}
}else{
result[i++] = source[k];
}
k++;
}

result[i] = 0;
return result;
}

I can't see any obvious way to improve the efficiency, you seem to be doing
all the right things, like preallocating the result buffer. This loop

Hi John / Siemel,
Thanks for your comment. I found another way of doing it:

char* search_and_repl ace2(char* source, char search, char* replace){
int i = 0;
size_t r = strlen(replace) ;
char* tmp = malloc(strlen(s ource) * r + 1), *result;
while(*source){
if(*source == search){
size_t j;
for(j = 0; j < r; j++){
tmp[i++] = replace[j];
}
}else{
tmp[i++] = *source;
}
source++;
}
tmp[i] = 0;
result = malloc(i);
strcpy(result,t mp);
free(tmp);
return result;
}

With this version, I only go through source once, but it calls malloc
twice. I will time these two version and see which one is faster. I
will also change it to use the suggestions you made to see how much
improvement I can got. Just curious, which version do you think will
be faster?

Thanks!

Jul 22 '05 #5

John Harrison

"pembed2003 " <pe********@yah oo.com> wrote in message
news:db******** *************** ***@posting.goo gle.com...

"John Harrison" <jo************ *@hotmail.com> wrote in message

news:<2j******* ******@uni-berlin.de>...

"pembed2003 " <pe********@yah oo.com> wrote in message
news:db******** *************** ***@posting.goo gle.com...

[snip]

char[] source = "abcd?1234? x";
char search = '?';
char* replace = "***";

char* result = search_and_repl ace(source,sear ch,replace);

result will then be "abcd***1234*** x".
[snip]

char* search_and_repl ace(char* source,char search,char* replace){

char* result;

size_t l = strlen(source), r = strlen(replace) , i, k;

int number_of_repla ces = 0;

for(i = 0; i < l; i++){
if(source[i] == search)
number_of_repla ces++;
}

result = malloc((l - number_of_repla ces) +
number_of_repla ces * r + 1);
i = 0; k = 0;

while(k < l){
if(source[k] == search){
int j;
for(j = 0; j < r; j++){
result[i++] = replace[j];
}
}else{
result[i++] = source[k];
}
k++;
}

result[i] = 0;
return result;
}

I can't see any obvious way to improve the efficiency, you seem to be doing all the right things, like preallocating the result buffer. This loop

Hi John / Siemel,
Thanks for your comment. I found another way of doing it:

char* search_and_repl ace2(char* source, char search, char* replace){
int i = 0;
size_t r = strlen(replace) ;
char* tmp = malloc(strlen(s ource) * r + 1), *result;
while(*source){
if(*source == search){
size_t j;
for(j = 0; j < r; j++){
tmp[i++] = replace[j];
}
}else{
tmp[i++] = *source;
}
source++;
}
tmp[i] = 0;
result = malloc(i);
strcpy(result,t mp);
free(tmp);
return result;
}

With this version, I only go through source once, but it calls malloc
twice. I will time these two version and see which one is faster. I
will also change it to use the suggestions you made to see how much
improvement I can got. Just curious, which version do you think will
be faster?

Thanks!

I would expect the first to be faster. Timing malloc can be tricky however,
it could be that the second is faster at first but if your program runs for
a while the extra malloc starts to slow it down.

A third possibility would be to use a fixed size temporary buffer and only
call malloc if the temporary space needed exceeds the size of the temporary
buffer. Like this

char* search_and_repl ace3(char* source, char search, char* replace){
int i = 0;
size_t r = strlen(replace) ;
char tmp_buffer[1000], *tmp, *result;
if (strlen(source) * r + 1 > 1000)
tmp = malloc(strlen(s ource) * r + 1);
else
tmp = tmp_buffer;

// code as before

if (tmp != tmp_buffer)
free(tmp);
return result;
}

This way you avoid the cost of the extra malloc most of the time.

john

Jul 22 '05 #6

Siemel Naran

"John Harrison" <jo************ *@hotmail.com> wrote in message

for(i = 0; i < l; i++){
if(source[i] == search)
number_of_repla ces++;
}
You could also try a pointer version instead of using ints for all your loop variables. It might be faster but it might not, worth a try though.

This is a good point. The expression p[i] implies an arithmetic operation
as in *(p+sizeof(*p)* i). I imagine this would be slower on all platforms
than just *(p2), but could be wrong. Also, some compilers may realize
you're using the index variables as pointers to an array, and replace them
with pointer version. Or you could just do it explicitly:

const char * scan = source;
while (true) {
const char c = *scan;
if (!c) break;
if (c == search) ++number_of_rep laces;
++scan;
}

An advantage of the iterator style is that now it's easy to generalize to
any container, say a list of chars or any other object. Though it does take
some getting used to. When I do interviews or talk with my friends from
work, and ask them to write a function to find the first occurrence of a
certain character in a string (i.e.. like the std::find algorithm), they
almost always use the p[i] style like so:

const char * find(const char * s, char c) {
int N = strlen(s);
for (int i=0; i<N; i++) {
if (p[i] == c) return &p[i];
}
return NULL;
}

Jul 22 '05 #7

Siemel Naran

"pembed2003 " <pe********@yah oo.com> wrote in message

char* search_and_repl ace2(char* source, char search, char* replace){
int i = 0;
size_t r = strlen(replace) ;
char* tmp = malloc(strlen(s ource) * r + 1), *result;
while(*source){
if(*source == search){
size_t j;
for(j = 0; j < r; j++){
tmp[i++] = replace[j];
}
}else{
tmp[i++] = *source;
}
source++;
}
tmp[i] = 0;
result = malloc(i);
strcpy(result,t mp);
free(tmp);
return result;
}

With this version, I only go through source once, but it calls malloc
twice. I will time these two version and see which one is faster. I
will also change it to use the suggestions you made to see how much
improvement I can got. Just curious, which version do you think will
be faster?

My guess is the first way will be faster because malloc and free are
generally expensive calls (though see John's excellent reply on this issue
too). Also note that the strcpy at the end implies the 2nd pass through the
loop, but if it translates to a special assembler function, it might be
faster than an explicit byte by byte scan.

The 2nd way also uses a lot of space.

Jul 22 '05 #8

Siemel Naran

"pembed2003 " <pe********@yah oo.com> wrote in message

size_t l = strlen(source), r = strlen(replace) , i, k;

int number_of_repla ces = 0;

for(i = 0; i < l; i++){
if(source[i] == search)
number_of_repla ces++;
}

Also, strlen(source) implies one pass through the string, although it might
be very fast if it translates to an assembler instruction, though it's
probably still O(N). Assuming there's no special assembler instruction,
this way would be faster

const char * scan = source;
for ( ; ; ++scan) {
const char c = *scan;
if (!c) break;
if (c == search)
number_of_repla ces++;
}
l = scan - source; // same as strlen(source)

Though it's possible you don't need to know l as John pointed out.

Jul 22 '05 #9

pembed2003

pe********@yaho o.com (pembed2003) wrote in message

[snip]

char[] source = "abcd?1234? x";
char search = '?';
char* replace = "***";

char* result = search_and_repl ace(source,sear ch,replace);

result will then be "abcd***1234*** x".
[snip]

char* search_and_repl ace(char* source,char search,char* replace){

char* result;

size_t l = strlen(source), r = strlen(replace) , i, k;

int number_of_repla ces = 0;

for(i = 0; i < l; i++){
if(source[i] == search)
number_of_repla ces++;
}

result = malloc((l - number_of_repla ces) +
number_of_repla ces * r + 1);
i = 0; k = 0;

while(k < l){
if(source[k] == search){
int j;
for(j = 0; j < r; j++){
result[i++] = replace[j];
}
}else{
result[i++] = source[k];
}
k++;
}

result[i] = 0;
return result;
}

char* search_and_repl ace2(char* source, char search, char* replace){
int i = 0;
size_t r = strlen(replace) ;
char* tmp = malloc(strlen(s ource) * r + 1), *result;
while(*source){
if(*source == search){
size_t j;
for(j = 0; j < r; j++){
tmp[i++] = replace[j];
}
}else{
tmp[i++] = *source;
}
source++;
}
tmp[i] = 0;
result = malloc(i);
strcpy(result,t mp);
free(tmp);
return result;
}

With this version, I only go through source once, but it calls malloc
twice. I will time these two version and see which one is faster. I
will also change it to use the suggestions you made to see how much
improvement I can got. Just curious, which version do you think will
be faster?

I time these 2 version and found out that the first version is faster.
I time the 2 functions with 10 million iterations and here are the
numbers:

time test
17.0u 0.0s 0:17.01 99.9% 0+0K 0+0io 2pf+0w

time test2
28.2u 0.0s 0:28.29 99.8% 0+0K 0+0io 2pf+0w

test = first version (walks the source twice with one malloc)
test2 = second version (walks the srouce onec with two malloc)

Thanks!

Jul 22 '05 #10

Similar topics

2386

string search function

by: gyromagnetic | last post by:

Hi, I have written a function that searches a text string for various words. The text is searched using a boolean 'and' or a boolean 'or' of the input list of search terms. Since I need to use this function for many long strings and many search words, I would like to use as efficient a method as possible. Are there improvements that can...

Python

8707

The Ultimate HTML Search Script. But.......!

by: Les Juby | last post by:

A year or two back I needed a search script to scan thru HTML files on a client site. Usual sorta thing. A quick search turned up a neat script that provided great search results. It was fast, returned the hyperlinked page title, filename, and the body txt (30 preceding and following words) in context with the search word highlighted. ...

ASP / Active Server Pages

2666

String question: Returning portion of string with words surrounding highlighted search term?

by: Ken Fine | last post by:

I'm looking to find or create an ASP script that will take a string, examine it for a search term, and if it finds the search term in the string, return the highlighted search term along with the words that surround it. In other words, I want the search term highlighted and shown in an excerpt of the context in which it appears. Any...

ASP / Active Server Pages

5865

RegEx conditional search and replace

by: Martin Evans | last post by:

Sorry, yet another REGEX question. I've been struggling with trying to get a regular expression to do the following example in Python: Search and replace all instances of "sleeping" with "dead". This parrot is sleeping. Really, it is sleeping. to This parrot is dead. Really, it is dead.

Python

2794

search and replace operation in a text editor

by: coolami4u | last post by:

I need a program that simulates the search-and-replace operation in a text editor. The program is to have only three function calls in main. The first function prompts the user to type a string of fewer than 80 characters. It then prompts the user to type the search-substring of 10 or fewer characters. Finally, it prompts the user to type the...

C / C++

5827

searching Access modules

by: peterhall | last post by:

In VBA an Access module has a find method - works perfectly to find a string inside a module. i'm working in A97 (legacy) systems (large ones) and want to write code that searches all modules so that I can check on the possible effects of a change. But the 97 modules collection is open modules only. Access 97 doesn't have an allmodules...

Microsoft Access / VBA

2502

how to stop the search in the middle

by: ravindarjobs | last post by:

hi...... i am using ms access 2003,vb6 i have a form. in that i have 2 buttons 1. start search 2 stop search when i click the "start search" button the fucntion SearchSystem() is called, it will search for a particular file in the computer(searches entire drives).

Visual Basic 4 / 5 / 6

7518

How To Dynamically Search A Database Table

by: Merlin1857 | last post by:

How to search multiple fields using ASP A major issue for me when I first started writing in VB Script was constructing the ability to search a table using multiple field input from a form and having the sql statement dynamically built according to the input provided by the user. I have used the method described here hundreds of times it is...

ASP / Active Server Pages

1864

search engine help

by: silmana | last post by:

Hi i have this script that i want to use as php or html but i cant find the problem, could anyone solve the problem, i dont know why i cannot use it in php or html file // OBS! Några saker måste justeras för att skriptet skall fungera // 1) Du måste ange antal webbsidor som skall genomsökas // 2) Du måste lista dessa webbsidor // 3) Du...

PHP

7408

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...

Windows Server

7661

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...

C / C++

7815

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...

Online Marketing

7433

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...

Windows Server

7763

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...

General

3458

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...

Networking - Hardware / Configuration

3444

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

1891

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

1020

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP