Hi all,
I asked this question in the C group but no one seems to be interested
in answering it. :-( Basically, I wrote a search and replace function
so I can do:
char source[] = "abcd?1234?x";
char search = '?';
char* replace = "***";
char* result = search_and_replace(source,search,replace);
result will then be "abcd***1234***x". I understand that I can
probably use string instead of char* and there's probably some API in
the C++ standard library that I can use but I want to code it as an
exercise to learn the algorithm. Can someone suggest ways to improve
the performance of my search and replace algorithm? The function lacks
some error checking, but I am more interested in the algorithm.
Thanks!
#include <stdlib.h>
#include <string.h>

char* search_and_replace(char* source,char search,char* replace){
    char* result;
    size_t l = strlen(source), r = strlen(replace), i, k;
    int number_of_replaces = 0;
    /* first pass: count the characters that will be replaced */
    for(i = 0; i < l; i++){
        if(source[i] == search)
            number_of_replaces++;
    }
    /* exact size: kept characters + replacements + terminating 0 */
    result = malloc((l - number_of_replaces) +
                    number_of_replaces * r + 1);
    i = 0; k = 0;
    /* second pass: build the result */
    while(k < l){
        if(source[k] == search){
            size_t j;
            for(j = 0; j < r; j++){
                result[i++] = replace[j];
            }
        }else{
            result[i++] = source[k];
        }
        k++;
    }
    result[i] = 0;
    return result;
}
"pembed2003" <pe********@yahoo.com> wrote in message
news:db**************************@posting.google.com...
[snip]
I can't see any obvious way to improve the efficiency; you seem to be doing
all the right things, like preallocating the result buffer. This loop
for(j = 0; j < r; j++){
result[i++] = replace[j];
}
might be a little faster as a memcpy
memcpy(result + i, replace, r);
i += r;
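For reference, here is a sketch of the whole first version with the inner copy loop replaced by memcpy as suggested above. The name search_and_replace_memcpy, the const qualifiers, the assert, and the NULL check on malloc are additions for illustration and are not in the original post. Whether it is actually faster will depend on the compiler and on how long replace is.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of search_and_replace: same two-pass algorithm,
   but each replacement is copied with memcpy instead of a byte loop.
   The caller owns the returned buffer and must free() it. */
char *search_and_replace_memcpy(const char *source, char search,
                                const char *replace)
{
    size_t l, r, i = 0, k, count = 0;
    char *result;

    assert(source != NULL && replace != NULL);
    l = strlen(source);
    r = strlen(replace);

    /* Pass 1: count matches so the result can be allocated exactly once. */
    for (k = 0; k < l; k++) {
        if (source[k] == search)
            count++;
    }

    result = malloc((l - count) + count * r + 1);
    if (result == NULL)
        return NULL;

    /* Pass 2: build the result, copying each replacement in one call. */
    for (k = 0; k < l; k++) {
        if (source[k] == search) {
            memcpy(result + i, replace, r);
            i += r;
        } else {
            result[i++] = source[k];
        }
    }
    result[i] = '\0';
    return result;
}
```

Calling it with the example from the question, search_and_replace_memcpy("abcd?1234?x", '?', "***"), yields "abcd***1234***x".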
You could also try a pointer version instead of using ints for all your loop
variables. It might be faster but it might not, worth a try though.
john
"pembed2003" <pe********@yahoo.com> wrote in message
char source[] = "abcd?1234?x"; char search = '?'; char* replace = "***";
char* result = search_and_replace(source,search,replace);
result will then be "abcd***1234***x".
Wait a sec. Did you want four **** before x?
[snip]
This looks good. It's efficient. Maybe put in a few comments.
One thing I can think of is that standard functions like memcpy (suitable
for your case) and strcpy are often implemented with special machine
instructions or inlined by the compiler, and thus the resulting code may be
faster. But I'm not an expert on what these instructions are, what platforms
do what, what compilers support what, and so on.
Another thing I can think of is that when you scan the array to count the
elements to replace, you could store the index of each match in an array, so
for example in "abcd?1234?x" the array will contain 4 and 9 (the indexes of
the ?). Then in the next loop you can just look up where each ? is instead of
testing every character. This approach may make the algorithm faster if there
are few replacements in a long string. However, the resulting code is more
complicated, and thus harder to maintain.
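The index-array idea could be sketched roughly as follows. This is only one way to do it: the function name search_and_replace_indexed, the growth policy for the index array (start at 16 entries, double when full), and the error handling are assumptions made for the example. The second pass copies the runs between matches in bulk with memcpy rather than testing each character again.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: pass 1 records where each match is, pass 2 copies
   the unmatched runs between those positions in bulk.
   The caller owns the returned buffer and must free() it. */
char *search_and_replace_indexed(const char *source, char search,
                                 const char *replace)
{
    size_t l, r, count = 0, cap = 16, k, n, in = 0, out = 0;
    size_t *hits;
    char *result;

    assert(source != NULL && replace != NULL);
    r = strlen(replace);
    hits = malloc(cap * sizeof *hits);
    if (hits == NULL)
        return NULL;

    /* Pass 1: remember the index of every match; k ends up as the length. */
    for (k = 0; source[k] != '\0'; k++) {
        if (source[k] != search)
            continue;
        if (count == cap) {
            size_t *bigger = realloc(hits, 2 * cap * sizeof *hits);
            if (bigger == NULL) {
                free(hits);
                return NULL;
            }
            hits = bigger;
            cap *= 2;
        }
        hits[count++] = k;
    }
    l = k;

    result = malloc((l - count) + count * r + 1);
    if (result == NULL) {
        free(hits);
        return NULL;
    }

    /* Pass 2: for each match, copy the run before it, then the replacement. */
    for (n = 0; n < count; n++) {
        size_t run = hits[n] - in;
        memcpy(result + out, source + in, run);
        out += run;
        memcpy(result + out, replace, r);
        out += r;
        in = hits[n] + 1;
    }
    /* Copy whatever follows the last match. */
    memcpy(result + out, source + in, l - in);
    out += l - in;
    result[out] = '\0';

    free(hits);
    return result;
}
```

Note the trade-off mentioned above: this version needs an extra allocation for the index array, so it is only likely to pay off when matches are rare in a long string.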
Maybe others can see other problems.
Maybe the next challenge is to do the same in place! Note, this algorithm
is not necessarily better, just different.
On 19 Jun 2004 23:14:29 -0700 in comp.lang.c++, pe********@yahoo.com
(pembed2003) wrote,
Hi all, I asked this question in the C group but no one seems to be interested
Sorry, I am not interested until I see it
use std::string::find() and std::string::replace()
"John Harrison" <jo*************@hotmail.com> wrote in message news:<2j*************@uni-berlin.de>...
[snip]
Hi John / Siemel,
Thanks for your comment. I found another way of doing it:
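char* search_and_replace2(char* source, char search, char* replace){
    size_t i = 0;
    size_t r = strlen(replace);
    /* worst case: every character is replaced; reserve at least one
       byte per character so an empty replace string cannot overflow */
    char* tmp = malloc(strlen(source) * (r ? r : 1) + 1), *result;
    while(*source){
        if(*source == search){
            size_t j;
            for(j = 0; j < r; j++){
                tmp[i++] = replace[j];
            }
        }else{
            tmp[i++] = *source;
        }
        source++;
    }
    tmp[i] = 0;
    result = malloc(i + 1); /* + 1 for the terminating 0 that strcpy copies */
    strcpy(result,tmp);
    free(tmp);
    return result;
}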
With this version, I only go through source once, but it calls malloc
twice. I will time these two versions and see which one is faster. I
will also change it to use the suggestions you made to see how much
improvement I can get. Just curious, which version do you think will
be faster?
Thanks!
"pembed2003" <pe********@yahoo.com> wrote in message
news:db**************************@posting.google.com...
[snip]
I would expect the first to be faster. Timing malloc can be tricky, however:
it could be that the second is faster at first, but if your program runs for
a while the extra malloc starts to slow it down.
A third possibility would be to use a fixed-size temporary buffer and only
call malloc if the temporary space needed exceeds the size of that buffer.
Like this:
char* search_and_replace3(char* source, char search, char* replace){
    size_t i = 0;
    size_t r = strlen(replace);
    /* worst-case size, at least one byte per source character */
    size_t needed = strlen(source) * (r ? r : 1) + 1;
    char tmp_buffer[1000], *tmp, *result;
    if (needed > sizeof tmp_buffer)
        tmp = malloc(needed);
    else
        tmp = tmp_buffer;
    // code as before: fill tmp, then malloc result and copy tmp into it
    if (tmp != tmp_buffer)
        free(tmp);
    return result;
}
This way you avoid the cost of the extra malloc most of the time.
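Filled in completely, the sketch above might look something like this. The name search_and_replace3_full, the TMP_SIZE macro, the use of memcpy, and the NULL checks are assumptions added for the example; the 1000-byte stack buffer is the size used in the post. The size calculation does not guard against overflow of the multiplication for extremely long inputs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TMP_SIZE 1000 /* stack buffer size used in the post */

/* Hypothetical complete version of the "fixed-size temporary buffer" idea:
   build the result in a stack buffer when it is guaranteed to fit, and fall
   back to malloc only for large inputs.
   The caller owns the returned buffer and must free() it. */
char *search_and_replace3_full(const char *source, char search,
                               const char *replace)
{
    char tmp_buffer[TMP_SIZE], *tmp = tmp_buffer, *result;
    size_t r, needed, i = 0;

    assert(source != NULL && replace != NULL);
    r = strlen(replace);

    /* Worst case: every character is replaced. Count at least one byte per
       character so an empty replacement cannot under-size the buffer. */
    needed = strlen(source) * (r > 0 ? r : 1) + 1;

    if (needed > TMP_SIZE) {
        tmp = malloc(needed);
        if (tmp == NULL)
            return NULL;
    }

    /* Single pass over source, writing into the temporary buffer. */
    for (; *source != '\0'; source++) {
        if (*source == search) {
            memcpy(tmp + i, replace, r);
            i += r;
        } else {
            tmp[i++] = *source;
        }
    }
    tmp[i] = '\0';

    /* Now the exact length is known: allocate and copy the final result. */
    result = malloc(i + 1);
    if (result != NULL)
        memcpy(result, tmp, i + 1);

    if (tmp != tmp_buffer)
        free(tmp);
    return result;
}
```

For short strings this makes one malloc call per invocation, the same as the first version, at the price of copying the result one extra time.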
john
"John Harrison" <jo*************@hotmail.com> wrote in message
for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; }
You could also try a pointer version instead of using ints for all your
loop variables. It might be faster but it might not, worth a try though.
This is a good point. The expression p[i] means *(p + i), which implies
computing the address p + sizeof(*p)*i on each access. I imagine this would
be slower than just dereferencing a pointer that is incremented as you go,
but I could be wrong. Also, some compilers may realize you're using the
index variable to walk an array, and generate the pointer version
themselves. Or you could just do it explicitly:
const char * scan = source;
while (true) {
const char c = *scan;
if (!c) break;
if (c == search) ++number_of_replaces;
++scan;
}
An advantage of the iterator style is that now it's easy to generalize to
any container, say a list of chars or any other object. Though it does take
some getting used to. When I do interviews or talk with my friends from
work, and ask them to write a function to find the first occurrence of a
certain character in a string (i.e., like the std::find algorithm), they
almost always use the p[i] style like so:
const char * find(const char * s, char c) {
int N = strlen(s);
for (int i=0; i<N; i++) {
if (s[i] == c) return &s[i];
}
return NULL;
}
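For comparison, the same search written in the pointer (iterator) style might look like the sketch below. The name find_char is made up for the example so that it does not clash with anything else. Unlike the standard strchr, this version never matches the terminating 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-style counterpart of the indexed find above:
   walk the string with the pointer itself instead of an index, so no
   separate strlen pass is needed. Returns NULL when c is not found. */
const char *find_char(const char *s, char c)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (*s == c)
            return s;
    }
    return NULL;
}
```

Because the loop stops at the terminator on its own, the string is only traversed once, whereas the indexed version walks it once for strlen and again for the search.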
"pembed2003" <pe********@yahoo.com> wrote in message char* search_and_replace2(char* source, char search, char* replace){ int i = 0; size_t r = strlen(replace); char* tmp = malloc(strlen(source) * r + 1), *result; while(*source){ if(*source == search){ size_t j; for(j = 0; j < r; j++){ tmp[i++] = replace[j]; } }else{ tmp[i++] = *source; } source++; } tmp[i] = 0; result = malloc(i); strcpy(result,tmp); free(tmp); return result; }
With this version, I only go through source once, but it calls malloc twice. I will time these two versions and see which one is faster. I will also change it to use the suggestions you made to see how much improvement I can get. Just curious, which version do you think will be faster?
My guess is the first way will be faster because malloc and free are
generally expensive calls (though see John's excellent reply on this issue
too). Also note that the strcpy at the end implies a second pass over the
data, but if it translates to a special assembler function, it might be
faster than an explicit byte-by-byte scan.
The 2nd way also uses a lot of space: the temporary buffer is sized for the
worst case of strlen(source) * r + 1 bytes.
"pembed2003" <pe********@yahoo.com> wrote in message
size_t l = strlen(source), r = strlen(replace), i, k;
int number_of_replaces = 0;
for(i = 0; i < l; i++){ if(source[i] == search) number_of_replaces++; }
Also, strlen(source) implies one pass through the string. It might be very
fast if it translates to a special assembler instruction, though it's
probably still O(N). Assuming there's no special assembler instruction,
counting and measuring in the same loop should be faster:
const char * scan = source;
for ( ; ; ++scan) {
const char c = *scan;
if (!c) break;
if (c == search)
number_of_replaces++;
}
l = scan - source; // same as strlen(source)
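Wrapped up as a function, the combined scan could look like this sketch. The name count_and_measure and the out-parameter for the length are choices made for the example, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count occurrences of search and measure the string
   in the same pass, so no separate strlen(source) call is needed.
   Returns the number of matches and stores the length in *length. */
size_t count_and_measure(const char *source, char search, size_t *length)
{
    const char *scan;
    size_t count = 0;

    assert(source != NULL && length != NULL);
    for (scan = source; *scan != '\0'; ++scan) {
        if (*scan == search)
            ++count;
    }
    *length = (size_t)(scan - source); /* same value as strlen(source) */
    return count;
}
```

With both values in hand, the result size is (length - count) + count * r + 1, exactly as in the first version.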
Though it's possible you don't need to know l as John pointed out.
pe********@yahoo.com (pembed2003) wrote in message
[snip]
I timed these two versions and found that the first version is faster.
I timed the two functions with 10 million iterations and here are the
numbers:
time test
17.0u 0.0s 0:17.01 99.9% 0+0K 0+0io 2pf+0w
time test2
28.2u 0.0s 0:28.29 99.8% 0+0K 0+0io 2pf+0w
test = first version (walks the source twice with one malloc)
test2 = second version (walks the source once with two mallocs)
Thanks!
"pembed2003" <pe********@yahoo.com> wrote in message
[snip]
Did John's suggestion of using pointers rather than integers make a
difference?