By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
 424,853 Members | 938 Online + Ask a Question
Need help? Post your question and get tips & solutions from a community of 424,853 IT Pros & Developers. It's quick & easy.

# improve my search and replace function

 P: n/a Hi all, I asked this question in the C group but no one seems to be interested in answering it. :-( Basically, I wrote a search and replace function so I can do: char[] source = "abcd?1234?x"; char search = '?'; char* replace = "***"; char* result = search_and_replace(source,search,replace); result will then be "abcd***1234***x". I understand that I can probably use string instead of char* and there's probably some API in the C++ standard library that I can use but I want to code it as an exercise to learn the algorithm. Can someone suggest ways to improve the performance of my search and replace algorithm? The function lacks some error checkings but I am more interested in the algorithm. Thanks! char* search_and_replace(char* source,char search,char* replace){ char* result; size_t l = strlen(source), r = strlen(replace), i, k; int number_of_replaces = 0; for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; } result = malloc((l - number_of_replaces) + number_of_replaces * r + 1); i = 0; k = 0; while(k < l){ if(source[k] == search){ int j; for(j = 0; j < r; j++){ result[i++] = replace[j]; } }else{ result[i++] = source[k]; } k++; } result[i] = 0; return result; } Jul 22 '05 #1
Share this Question
10 Replies

 P: n/a "pembed2003" wrote in message news:db**************************@posting.google.c om... Hi all, I asked this question in the C group but no one seems to be interested in answering it. :-( Basically, I wrote a search and replace function so I can do: char[] source = "abcd?1234?x"; char search = '?'; char* replace = "***"; char* result = search_and_replace(source,search,replace); result will then be "abcd***1234***x". I understand that I can probably use string instead of char* and there's probably some API in the C++ standard library that I can use but I want to code it as an exercise to learn the algorithm. Can someone suggest ways to improve the performance of my search and replace algorithm? The function lacks some error checkings but I am more interested in the algorithm. Thanks! char* search_and_replace(char* source,char search,char* replace){ char* result; size_t l = strlen(source), r = strlen(replace), i, k; int number_of_replaces = 0; for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; } result = malloc((l - number_of_replaces) + number_of_replaces * r + 1); i = 0; k = 0; while(k < l){ if(source[k] == search){ int j; for(j = 0; j < r; j++){ result[i++] = replace[j]; } }else{ result[i++] = source[k]; } k++; } result[i] = 0; return result; } I can't see any obvious way to improve the efficiency, you seem to be doing all the right things, like preallocating the result buffer. This loop for(j = 0; j < r; j++){ result[i++] = replace[j]; } might be a little faster as a memcpy memcpy(result + i, replace, r); i += r; You could also try a pointer version instead of using ints for all your loop variables. It might be faster but it might not, worth a try though. john Jul 22 '05 #2

 P: n/a "pembed2003" wrote in message char[] source = abcd?1234?x"; char search = '?'; char* replace = "***"; char* result = search_and_replace(source,search,replace); result will then be "abcd***1234***x". I understand that I can Wait a sec. Did you want four **** before x? probably use string instead of char* and there's probably some API in the C++ standard library that I can use but I want to code it as an exercise to learn the algorithm. Can someone suggest ways to improve the performance of my search and replace algorithm? The function lacks some error checkings but I am more interested in the algorithm. Thanks! char* search_and_replace(char* source,char search,char* replace){ char* result; size_t l = strlen(source), r = strlen(replace), i, k; int number_of_replaces = 0; for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; } result = malloc((l - number_of_replaces) + number_of_replaces * r + 1); i = 0; k = 0; while(k < l){ if(source[k] == search){ int j; for(j = 0; j < r; j++){ result[i++] = replace[j]; } }else{ result[i++] = source[k]; } k++; } result[i] = 0; return result; } This looks good. It's efficient. Maybe put in a few comments. One thing I can think of is that use of standard functions like memcpy (suitable for your case) and strcpy may use special assembly instructions created especially for memcpy, and thus the resulting code would be faster. But I'm not an expert on what these statements are, what platforms do what, what compilers support what, and so on. Another thing I can think of is when you scan the array to find the number of elements to replace, you put the index of the element into an array, so for example in "abcd?1234?x" the array will contain 4 and 9 (the index of the ?). Then in the next loop you can just look up the ?. This approach may make the algorithm faster if there are few replacements in a long stream. Also, the resulting code is more complicated, and thus harder to maintain. Maybe others can see other problems. Maybe the next challenge is to do the same in place! Note, this algorithm is not necessarily better, just different. Jul 22 '05 #3

 P: n/a On 19 Jun 2004 23:14:29 -0700 in comp.lang.c++, pe********@yahoo.com (pembed2003) wrote,Hi all,I asked this question in the C group but no one seems to be interested Sorry, I am not interested until I see it use std::string::find() and std::string::replace() Jul 22 '05 #4

 P: n/a "John Harrison" wrote in message news:<2j*************@uni-berlin.de>... "pembed2003" wrote in message news:db**************************@posting.google.c om... [snip] char[] source = "abcd?1234?x"; char search = '?'; char* replace = "***"; char* result = search_and_replace(source,search,replace); result will then be "abcd***1234***x". [snip] char* search_and_replace(char* source,char search,char* replace){ char* result; size_t l = strlen(source), r = strlen(replace), i, k; int number_of_replaces = 0; for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; } result = malloc((l - number_of_replaces) + number_of_replaces * r + 1); i = 0; k = 0; while(k < l){ if(source[k] == search){ int j; for(j = 0; j < r; j++){ result[i++] = replace[j]; } }else{ result[i++] = source[k]; } k++; } result[i] = 0; return result; } I can't see any obvious way to improve the efficiency, you seem to be doing all the right things, like preallocating the result buffer. This loop Hi John / Siemel, Thanks for your comment. I found another way of doing it: char* search_and_replace2(char* source, char search, char* replace){ int i = 0; size_t r = strlen(replace); char* tmp = malloc(strlen(source) * r + 1), *result; while(*source){ if(*source == search){ size_t j; for(j = 0; j < r; j++){ tmp[i++] = replace[j]; } }else{ tmp[i++] = *source; } source++; } tmp[i] = 0; result = malloc(i); strcpy(result,tmp); free(tmp); return result; } With this version, I only go through source once, but it calls malloc twice. I will time these two version and see which one is faster. I will also change it to use the suggestions you made to see how much improvement I can got. Just curious, which version do you think will be faster? Thanks! Jul 22 '05 #5

 P: n/a "pembed2003" wrote in message news:db**************************@posting.google.c om... "John Harrison" wrote in message news:<2j*************@uni-berlin.de>... "pembed2003" wrote in message news:db**************************@posting.google.c om... [snip] char[] source = "abcd?1234?x"; char search = '?'; char* replace = "***"; char* result = search_and_replace(source,search,replace); result will then be "abcd***1234***x". [snip] char* search_and_replace(char* source,char search,char* replace){ char* result; size_t l = strlen(source), r = strlen(replace), i, k; int number_of_replaces = 0; for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; } result = malloc((l - number_of_replaces) + number_of_replaces * r + 1); i = 0; k = 0; while(k < l){ if(source[k] == search){ int j; for(j = 0; j < r; j++){ result[i++] = replace[j]; } }else{ result[i++] = source[k]; } k++; } result[i] = 0; return result; } I can't see any obvious way to improve the efficiency, you seem to be doing all the right things, like preallocating the result buffer. This loop Hi John / Siemel, Thanks for your comment. I found another way of doing it: char* search_and_replace2(char* source, char search, char* replace){ int i = 0; size_t r = strlen(replace); char* tmp = malloc(strlen(source) * r + 1), *result; while(*source){ if(*source == search){ size_t j; for(j = 0; j < r; j++){ tmp[i++] = replace[j]; } }else{ tmp[i++] = *source; } source++; } tmp[i] = 0; result = malloc(i); strcpy(result,tmp); free(tmp); return result; } With this version, I only go through source once, but it calls malloc twice. I will time these two version and see which one is faster. I will also change it to use the suggestions you made to see how much improvement I can got. Just curious, which version do you think will be faster? Thanks! I would expect the first to be faster. Timing malloc can be tricky however, it could be that the second is faster at first but if your program runs for a while the extra malloc starts to slow it down. A third possibility would be to use a fixed size temporary buffer and only call malloc if the temporary space needed exceeds the size of the temporary buffer. Like this char* search_and_replace3(char* source, char search, char* replace){ int i = 0; size_t r = strlen(replace); char tmp_buffer, *tmp, *result; if (strlen(source) * r + 1 > 1000) tmp = malloc(strlen(source) * r + 1); else tmp = tmp_buffer; // code as before if (tmp != tmp_buffer) free(tmp); return result; } This way you avoid the cost of the extra malloc most of the time. john Jul 22 '05 #6

 P: n/a "John Harrison" wrote in message for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; } You could also try a pointer version instead of using ints for all your loop variables. It might be faster but it might not, worth a try though. This is a good point. The expression p[i] implies an arithmetic operation as in *(p+sizeof(*p)*i). I imagine this would be slower on all platforms than just *(p2), but could be wrong. Also, some compilers may realize you're using the index variables as pointers to an array, and replace them with pointer version. Or you could just do it explicitly: const char * scan = source; while (true) { const char c = *scan; if (!c) break; if (c == search) ++number_of_replaces; ++scan; } An advantage of the iterator style is that now it's easy to generalize to any container, say a list of chars or any other object. Though it does take some getting used to. When I do interviews or talk with my friends from work, and ask them to write a function to find the first occurrence of a certain character in a string (i.e.. like the std::find algorithm), they almost always use the p[i] style like so: const char * find(const char * s, char c) { int N = strlen(s); for (int i=0; i

 P: n/a "pembed2003" wrote in message char* search_and_replace2(char* source, char search, char* replace){ int i = 0; size_t r = strlen(replace); char* tmp = malloc(strlen(source) * r + 1), *result; while(*source){ if(*source == search){ size_t j; for(j = 0; j < r; j++){ tmp[i++] = replace[j]; } }else{ tmp[i++] = *source; } source++; } tmp[i] = 0; result = malloc(i); strcpy(result,tmp); free(tmp); return result; } With this version, I only go through source once, but it calls malloc twice. I will time these two version and see which one is faster. I will also change it to use the suggestions you made to see how much improvement I can got. Just curious, which version do you think will be faster? My guess is the first way will be faster because malloc and free are generally expensive calls (though see John's excellent reply on this issue too). Also note that the strcpy at the end implies the 2nd pass through the loop, but if it translates to a special assembler function, it might be faster than an explicit byte by byte scan. The 2nd way also uses a lot of space. Jul 22 '05 #8

 P: n/a "pembed2003" wrote in message size_t l = strlen(source), r = strlen(replace), i, k; int number_of_replaces = 0; for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; } Also, strlen(source) implies one pass through the string, although it might be very fast if it translates to an assembler instruction, though it's probably still O(N). Assuming there's no special assembler instruction, this way would be faster const char * scan = source; for ( ; ; ++scan) { const char c = *scan; if (!c) break; if (c == search) number_of_replaces++; } l = scan - source; // same as strlen(source) Though it's possible you don't need to know l as John pointed out. Jul 22 '05 #9

 P: n/a pe********@yahoo.com (pembed2003) wrote in message [snip] char[] source = "abcd?1234?x"; char search = '?'; char* replace = "***"; char* result = search_and_replace(source,search,replace); result will then be "abcd***1234***x". [snip] char* search_and_replace(char* source,char search,char* replace){ char* result; size_t l = strlen(source), r = strlen(replace), i, k; int number_of_replaces = 0; for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; } result = malloc((l - number_of_replaces) + number_of_replaces * r + 1); i = 0; k = 0; while(k < l){ if(source[k] == search){ int j; for(j = 0; j < r; j++){ result[i++] = replace[j]; } }else{ result[i++] = source[k]; } k++; } result[i] = 0; return result; } char* search_and_replace2(char* source, char search, char* replace){ int i = 0; size_t r = strlen(replace); char* tmp = malloc(strlen(source) * r + 1), *result; while(*source){ if(*source == search){ size_t j; for(j = 0; j < r; j++){ tmp[i++] = replace[j]; } }else{ tmp[i++] = *source; } source++; } tmp[i] = 0; result = malloc(i); strcpy(result,tmp); free(tmp); return result; } With this version, I only go through source once, but it calls malloc twice. I will time these two version and see which one is faster. I will also change it to use the suggestions you made to see how much improvement I can got. Just curious, which version do you think will be faster? I time these 2 version and found out that the first version is faster. I time the 2 functions with 10 million iterations and here are the numbers: time test 17.0u 0.0s 0:17.01 99.9% 0+0K 0+0io 2pf+0w time test2 28.2u 0.0s 0:28.29 99.8% 0+0K 0+0io 2pf+0w test = first version (walks the source twice with one malloc) test2 = second version (walks the srouce onec with two malloc) Thanks! Jul 22 '05 #10

 P: n/a "pembed2003" wrote in message time test 17.0u 0.0s 0:17.01 99.9% 0+0K 0+0io 2pf+0w time test2 28.2u 0.0s 0:28.29 99.8% 0+0K 0+0io 2pf+0w test = first version (walks the source twice with one malloc) test2 = second version (walks the srouce onec with two malloc) Did John's suggestion of using pointers rather than integers make a difference? Jul 22 '05 #11

### This discussion thread is closed

Replies have been disabled for this discussion. 