Connecting Tech Pros Worldwide Forums | Help | Site Map

is there an alternative to strstr

Ramprasad A Padmanabhan
Guest
 
Posts: n/a
#1: Nov 13 '05
In my program I am checking wether a emailid exists in a list

I have in the complete_list a string like
"<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
string of emailids sorted

Now I want to find out if another ID "<lm@abc.com>" exists in this list


How is it possible to optimize the search given that the complete list
string is sorted in order

Thanks
Ram



David Rubin
Guest
 
Posts: n/a
#2: Nov 13 '05

re: is there an alternative to strstr


Ramprasad A Padmanabhan wrote:[color=blue]
> In my program I am checking wether a emailid exists in a list
>
> I have in the complete_list a string like
> "<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
> string of emailids sorted
>
> Now I want to find out if another ID "<lm@abc.com>" exists in this list
>
>
> How is it possible to optimize the search given that the complete list
> string is sorted in order[/color]

bsearch

/david

--
"As a scientist, Throckmorton knew that if he were ever to break wind in
the echo chamber, he would never hear the end of it."

Dan Pop
Guest
 
Posts: n/a
#3: Nov 13 '05

re: is there an alternative to strstr


In <8W8nb.24958$ri.4366350@twister.nyc.rr.com> David Rubin <fullname@warpmail.net> writes:
[color=blue]
>Ramprasad A Padmanabhan wrote:[color=green]
>> In my program I am checking wether a emailid exists in a list
>>
>> I have in the complete_list a string like
>> "<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
>> string of emailids sorted
>>
>> Now I want to find out if another ID "<lm@abc.com>" exists in this list
>>
>>
>> How is it possible to optimize the search given that the complete list
>> string is sorted in order[/color]
>
>bsearch[/color]

Doesn't seem to work very well when you have to search inside a string,
does it?

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de
Tom St Denis
Guest
 
Posts: n/a
#4: Nov 13 '05

re: is there an alternative to strstr



"Dan Pop" <Dan.Pop@cern.ch> wrote in message
news:bnjakc$fan$2@sunnews.cern.ch...[color=blue]
> In <8W8nb.24958$ri.4366350@twister.nyc.rr.com> David Rubin[/color]
<fullname@warpmail.net> writes:[color=blue]
>[color=green]
> >Ramprasad A Padmanabhan wrote:[color=darkred]
> >> In my program I am checking wether a emailid exists in a list
> >>
> >> I have in the complete_list a string like
> >> "<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
> >> string of emailids sorted
> >>
> >> Now I want to find out if another ID "<lm@abc.com>" exists in this[/color][/color][/color]
list[color=blue][color=green][color=darkred]
> >>
> >>
> >> How is it possible to optimize the search given that the complete list
> >> string is sorted in order[/color]
> >
> >bsearch[/color]
>
> Doesn't seem to work very well when you have to search inside a string,
> does it?[/color]

in theory if they're all the same size it could work. Though yeah you're
right for general strings bsearch isn't too helpful.

Tom


Dan Pop
Guest
 
Posts: n/a
#5: Nov 13 '05

re: is there an alternative to strstr


In <bnj5m0$126eor$1@ID-166953.news.uni-berlin.de> Ramprasad A Padmanabhan <ramprasad@netcore.co.in> writes:
[color=blue]
>In my program I am checking wether a emailid exists in a list
>
>I have in the complete_list a string like
>"<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
>string of emailids sorted
>
>Now I want to find out if another ID "<lm@abc.com>" exists in this list
>
>How is it possible to optimize the search given that the complete list
>string is sorted in order[/color]

To exploit this fact, you need a different data format, a plain string is
not particularly suitable for a binary search.

If you can store the email addresses in an array of strings, one per
string, and keep them still sorted, the bsearch() function from the
standard library is likely to be faster than a strstr() in a plain
string. The comparison function can be a simple wrapper around strcmp().

If you're performing this search often enough on large lists of email
addresses, it may be worth considering changing the format in which
you keep the email addresses, so that bsearch becomes an option.
Compare the performance of the two approaches on real data sets to
see which is better.

OTOH, if you build the list of addresses on the fly and the input is NOT
already sorted, you may consider putting them in a binary tree structure.
Except for pathological cases (input data already sorted) the search is
simple and fast and new data can be added with very little overhead (see
also the example in K&R2).

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de
David Rubin
Guest
 
Posts: n/a
#6: Nov 13 '05

re: is there an alternative to strstr


Dan Pop wrote:
[color=blue][color=green][color=darkred]
>>>In my program I am checking wether a emailid exists in a list
>>>
>>>I have in the complete_list a string like
>>>"<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
>>>string of emailids sorted
>>>
>>>Now I want to find out if another ID "<lm@abc.com>" exists in this list
>>>
>>>
>>>How is it possible to optimize the search given that the complete list
>>>string is sorted in order[/color]
>>
>>bsearch[/color]
>
>
> Doesn't seem to work very well when you have to search inside a string,
> does it?[/color]

No, but it's not a stretch to see how one *could* use bsearch given the
format of the data being searched.

/david

--
"As a scientist, Throckmorton knew that if he were ever to break wind in
the echo chamber, he would never hear the end of it."

Dan Pop
Guest
 
Posts: n/a
#7: Nov 13 '05

re: is there an alternative to strstr


In <66bnb.26982$ri.4416513@twister.nyc.rr.com> David Rubin <fullname@warpmail.net> writes:
[color=blue]
>Dan Pop wrote:
>[color=green][color=darkred]
>>>>In my program I am checking wether a emailid exists in a list
>>>>
>>>>I have in the complete_list a string like
>>>>"<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
>>>>string of emailids sorted
>>>>
>>>>Now I want to find out if another ID "<lm@abc.com>" exists in this list
>>>>
>>>>
>>>>How is it possible to optimize the search given that the complete list
>>>>string is sorted in order
>>>
>>>bsearch[/color]
>>
>>
>> Doesn't seem to work very well when you have to search inside a string,
>> does it?[/color]
>
>No, but it's not a stretch to see how one *could* use bsearch given the
>format of the data being searched.[/color]

I strongly suspect the *real* data consists of variable length email
addresses...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de
tom_usenet
Guest
 
Posts: n/a
#8: Nov 13 '05

re: is there an alternative to strstr


On Mon, 27 Oct 2003 18:43:32 +0530, Ramprasad A Padmanabhan
<ramprasad@netcore.co.in> wrote:
[color=blue]
>In my program I am checking wether a emailid exists in a list
>
>I have in the complete_list a string like
>"<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
>string of emailids sorted
>
>Now I want to find out if another ID "<lm@abc.com>" exists in this list
>
>
>How is it possible to optimize the search given that the complete list
>string is sorted in order[/color]

You need an array of char*s that point to the start of each string.
Then it's easy. e.g.

char **index;
int nStrings = 0;
size_t length = strlen(complete_list);
/*count strings*/
for (int i = 0; i != length; ++i)
if(complete_list[i] == '<')
++nStrings;

/*make index*/
index = malloc(nStrings * sizeof *index);
nStrings = 0;
for (int i = 0; i != length; ++i)
if(complete_list[i] == '<')
index[nStrings++] = complete_list + i;

Now you can use bsearch on your index array, being careful to make
your comparison function compare correctly (only up to the " ").

Tom
Thomas Stegen CES2000
Guest
 
Posts: n/a
#9: Nov 13 '05

re: is there an alternative to strstr


Dan Pop wrote:[color=blue]
> In <66bnb.26982$ri.4416513@twister.nyc.rr.com> David Rubin <fullname@warpmail.net> writes:
>
>[color=green]
>>Dan Pop wrote:
>>
>>[color=darkred]
>>>>>In my program I am checking wether a emailid exists in a list
>>>>>
>>>>>I have in the complete_list a string like
>>>>>"<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a[/color][/color][/color]
[color=blue][color=green]
>>No, but it's not a stretch to see how one *could* use bsearch given the
>>format of the data being searched.[/color]
>
>
> I strongly suspect the *real* data consists of variable length email
> addresses...
>[/color]

But they all seem to be delimited by < and > so the compare function
could just look for those. Would have to look both backwards and
forwards though in case one ends up in the middle of an address.

There might be some other issues though. Like what to do if you end
up directly on a < or > or maybe even between them. Might not be an
issue though.

This function makes a few simplifications and might do strange things
if there can be whitespace before the first address. It is more a show
of concept. And I hope I get the return values right :)

And I know it is not a good idea to post untested code, but
I am in a hurry.

int compare(void *p1, void *p2)
{
char *s = p1;
char *target = p2;

while(*s != '<')
{
s--;
}

while((*s != '>') && (*target != '\0'))
{
if(*s > *target)
{
return 1;
}
if(*s < *target)
{
return -1;
}
s++;
target++;
}
return 0;
}

--
Thomas.

Dan Pop
Guest
 
Posts: n/a
#10: Nov 13 '05

re: is there an alternative to strstr


In <3f9d4fe1$1@nntphost.cis.strath.ac.uk> Thomas Stegen CES2000 <tstegen+usenet@cis.strath.ac.uk> writes:
[color=blue]
>Dan Pop wrote:[color=green]
>> In <66bnb.26982$ri.4416513@twister.nyc.rr.com> David Rubin <fullname@warpmail.net> writes:
>>
>>[color=darkred]
>>>Dan Pop wrote:
>>>
>>>
>>>>>>In my program I am checking wether a emailid exists in a list
>>>>>>
>>>>>>I have in the complete_list a string like
>>>>>>"<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a[/color][/color]
>[color=green][color=darkred]
>>>No, but it's not a stretch to see how one *could* use bsearch given the
>>>format of the data being searched.[/color]
>>
>>
>> I strongly suspect the *real* data consists of variable length email
>> addresses...
>>[/color]
>
>But they all seem to be delimited by < and > so the compare function
>could just look for those. Would have to look both backwards and
>forwards though in case one ends up in the middle of an address.[/color]

bsearch can operate on constant size objects only. The compare function
is a non-issue here. As another poster showed, you need to index the
string if you want to use bsearch, with a properly crafted compare
function.

The trivial case is when all addresses have the same length: you can
get the item count from (strlen(addresses) + 1) / (addrlen + 1) and use
a wrapper around memcmp() as the compare function.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de
Glen Herrmannsfeldt
Guest
 
Posts: n/a
#11: Nov 13 '05

re: is there an alternative to strstr



"David Rubin" <fullname@warpmail.net> wrote in message
news:8W8nb.24958$ri.4366350@twister.nyc.rr.com...[color=blue]
> Ramprasad A Padmanabhan wrote:[color=green]
> > In my program I am checking wether a emailid exists in a list
> >
> > I have in the complete_list a string like
> > "<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
> > string of emailids sorted
> >
> > Now I want to find out if another ID "<lm@abc.com>" exists in this list
> >
> >
> > How is it possible to optimize the search given that the complete list
> > string is sorted in order[/color]
>
> bsearch[/color]

A hash table might be a better solution, and it doesn't require the list to
be ordered.

Note, though, that it should be a list, such as an array of strings, for
bsearch, and it is a little easier even for hsearch.

Though with strtok on a copy of the string it shouldn't be too hard to do
it.

-- glen


Ramprasad A Padmanabhan
Guest
 
Posts: n/a
#12: Nov 13 '05

re: is there an alternative to strstr


Dan Pop wrote:[color=blue]
> In <bnj5m0$126eor$1@ID-166953.news.uni-berlin.de> Ramprasad A Padmanabhan <ramprasad@netcore.co.in> writes:
>
>[color=green]
>>In my program I am checking wether a emailid exists in a list
>>
>>I have in the complete_list a string like
>>"<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
>>string of emailids sorted
>>
>>Now I want to find out if another ID "<lm@abc.com>" exists in this list
>>
>>How is it possible to optimize the search given that the complete list
>>string is sorted in order[/color]
>
>
> To exploit this fact, you need a different data format, a plain string is
> not particularly suitable for a binary search.
>
> If you can store the email addresses in an array of strings, one per
> string, and keep them still sorted, the bsearch() function from the
> standard library is likely to be faster than a strstr() in a plain
> string. The comparison function can be a simple wrapper around strcmp().
>
> If you're performing this search often enough on large lists of email
> addresses, it may be worth considering changing the format in which
> you keep the email addresses, so that bsearch becomes an option.
> Compare the performance of the two approaches on real data sets to
> see which is better.
>
> OTOH, if you build the list of addresses on the fly and the input is NOT
> already sorted, you may consider putting them in a binary tree structure.
> Except for pathological cases (input data already sorted) the search is
> simple and fast and new data can be added with very little overhead (see
> also the example in K&R2).
>
> Dan[/color]

Ok I have put the email ids in a sorted array.
Can I use bsearch on an array with differrent sized elements. Because in
a real scenario the email ids will have different sizes

Thanks
Ram


Thomas Stegen CES2000
Guest
 
Posts: n/a
#13: Nov 13 '05

re: is there an alternative to strstr


Ramprasad A Padmanabhan wrote:

[color=blue]
> Ok I have put the email ids in a sorted array.
> Can I use bsearch on an array with differrent sized elements. Because in
> a real scenario the email ids will have different sizes[/color]

If by array you mean something like char *ids[100] or something
similar, then there is no problem using bsearch to search for
a given string in the array. The problems discussed elsethread
only apply when you store all the addresses in the same string
which is not the case if you use char *ids[100]. Then you have
100 different strings (or pointers to characters to be precise).

But they do need to be sorted. You can either use qsort or, as
Dan Pop suggested, a binary search tree. Note that sorting using
a binary search tree is very much equivalent to sorting using the
quicksort algorithm (assuming of course the qsort is implemented
using qsort) since both will be O(n log n). Then looking up the
tree or using bsearch on the array is also very much the same,
both will be O(log n).

If you need to continually add addresses to your program a tree
might be better, if you only add addresses once then the qsort/
bsearch combo is probably faster since there is some overhead
in the tree approach.

--
Thomas.

Paul Hsieh
Guest
 
Posts: n/a
#14: Nov 13 '05

re: is there an alternative to strstr


Ramprasad A Padmanabhan <ramprasad@netcore.co.in> wrote:[color=blue]
> In my program I am checking wether a emailid exists in a list
>
> I have in the complete_list a string like
> "<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
> string of emailids sorted
>
> Now I want to find out if another ID "<lm@abc.com>" exists in this list
>
> How is it possible to optimize the search given that the complete list
> string is sorted in order.[/color]

Well to be optimized for performance, you are going to need to
preprocess your list of strings to allow for indexing and possibly
using bsearch().

However, if you are more concerned with optimizing for "safety" and
"simplicity", you could try using bstrlib (http://bstring.sf.net/):

#include <stdio.h>
#include "bstrlib.h"

struct cbsScanEmailAddress {
bstring db, newID;
int ret;
};

static int testID (void * parm, int ofs, int len) {
struct cbsScanEmailAddress * s = (struct cbsScanEmailAddress *)parm;
struct tagbstring t;

blk2tbstr (t, s->db->data + ofs, len);
s->ret = bstrcmp (&t, s->newID);
return -(s->ret >= 0);
}

int scanForEmailAddress (const bstring dbIDs, const bstring newID) {
struct cbsScanEmailAddress s;

s.db = dbIDs;
s.newID = newID;
bsplitcb (dbIDs, ' ', 0, testID, &s);
return s.ret == 0;
}

struct tagbstring dbID = bsStatic ("<aa@abc.com> <ab@abc.com> " \
"<bc@abc.com> <xy@abc.com>");
struct tagbstring newID1 = bsStatic ("<bc@abc.com>");
struct tagbstring newID2 = bsStatic ("<xbc@abc.com>");

int main () {
printf ("%d\n", scanForEmailAddress (&dbID, &newID1));
printf ("%d\n", scanForEmailAddress (&dbID, &newID2));
return 0;
}

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Glen Herrmannsfeldt
Guest
 
Posts: n/a
#15: Nov 13 '05

re: is there an alternative to strstr



"Ramprasad A Padmanabhan" <ramprasad@netcore.co.in> wrote in message
news:bnj5m0$126eor$1@ID-166953.news.uni-berlin.de...[color=blue]
> In my program I am checking wether a emailid exists in a list
>
> I have in the complete_list a string like
> "<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
> string of emailids sorted
>
> Now I want to find out if another ID "<lm@abc.com>" exists in this list
>
>
> How is it possible to optimize the search given that the complete list
> string is sorted in order[/color]

As a completely different idea, you could search it with the Boyer-Moore
algorithm. I don't believe strstr usually does that, as the startup
overhead is a little high, and strstr is often used on short strings.

-- glen


Al Bowers
Guest
 
Posts: n/a
#16: Nov 13 '05

re: is there an alternative to strstr




Ramprasad A Padmanabhan wrote:[color=blue]
> Dan Pop wrote:[/color]
[color=blue][color=green][color=darkred]
>>> I have in the complete_list a string like
>>> "<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
>>> string of emailids sorted
>>>
>>> Now I want to find out if another ID "<lm@abc.com>" exists in this list
>>>
>>> How is it possible to optimize the search given that the complete
>>> list string is sorted in order[/color]
>>
>>
>>
>> To exploit this fact, you need a different data format, a plain string is
>> not particularly suitable for a binary search.
>>
>> If you can store the email addresses in an array of strings, one per
>> string, and keep them still sorted, the bsearch() function from the
>> standard library is likely to be faster than a strstr() in a plain
>> string. The comparison function can be a simple wrapper around strcmp().
>>
>> If you're performing this search often enough on large lists of email
>> addresses, it may be worth considering changing the format in which
>> you keep the email addresses, so that bsearch becomes an option.
>> Compare the performance of the two approaches on real data sets to
>> see which is better.
>>
>> OTOH, if you build the list of addresses on the fly and the input is NOT
>> already sorted, you may consider putting them in a binary tree structure.
>> Except for pathological cases (input data already sorted) the search is
>> simple and fast and new data can be added with very little overhead
>> (see also the example in K&R2).
>>[/color]
>
>
> Ok I have put the email ids in a sorted array.
> Can I use bsearch on an array with differrent sized elements. Because in
> a real scenario the email ids will have different sizes
>[/color]

No bsearch and qsort is not suitable for different sized elements.
However you can use an array of char pointers and use
qsort to sort the array of char pointers and use bsearch to find the key
in the array.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct EMAILLIST
{
char **email;
size_t email_count;
} EMAILLIST;

int AddEmailAddr(const char *s, EMAILLIST *p);
char **SearchEmailList(const char *key, EMAILLIST *p);
void FreeEmailList(EMAILLIST *p);
int cmp(const void *v1, const void *v2);

int main(void)
{
EMAILLIST my = {NULL};
char **tmp, *s;
char emails[] = "<aa@abc.com> <ab@abc.com> "
"<bc@abc.com> <xy@abc.com>";
s = strtok(emails," ");
while(s)
{
AddEmailAddr(s,&my);
s = strtok(NULL," ");
}
AddEmailAddr("<xx@abc.com>",&my);
if((tmp = SearchEmailList("<ab@abc.com>",&my)) != NULL)
printf("Email \"%s\" found in the list\n",*tmp);
printf("TEST: my.email_count = %u\n",my.email_count);
FreeEmailList(&my);
return 0;
}

int AddEmailAddr(const char *s, EMAILLIST *p)
{
char **tmp;
size_t num = p->email_count;

if(p->email != NULL)
if(SearchEmailList(s,p) != NULL)
{
printf("Email Dupe, Email \"%s\" not added\n",s);
return 0;
}
if((tmp = realloc(p->email,
(num+1)*(sizeof *p->email))) == NULL)
{
printf("Memory Allocation Error, Email \"%s\" not added\n",s);
return 0;
}
p->email = tmp;
if((p->email[num] = malloc(strlen(s)+1)) == NULL)
{
printf("Memory Allocation Error, Email \"%s\" not added\n",s);
return 0;
}
strcpy(p->email[num],s);
qsort(p->email,++p->email_count,sizeof *p->email,cmp);
return 1;
}

void FreeEmailList(EMAILLIST *p)
{
size_t i;

for(i = 0;i < p->email_count;i++)
free(p->email[i]);
free(p->email);
p->email = NULL;
p->email_count = 0;
return;
}

int cmp(const void *v1, const void *v2)
{
const char **s1 = (const char **)v1;
const char **s2 = (const char **)v2;

return strcmp(*s1,*s2);
}

char **SearchEmailList(const char *key, EMAILLIST *p)
{
if(p->email == NULL) return NULL;
return bsearch(&key,p->email,p->email_count,sizeof(char *),cmp);
}


--
Al Bowers
Tampa, Fl USA
mailto: xabowers@myrapidsys.com (remove the x to send email)
http://www.geocities.com/abowers822/

Dan Pop
Guest
 
Posts: n/a
#17: Nov 13 '05

re: is there an alternative to strstr


In <o8xnb.51291$HS4.234455@attbi_s01> "Glen Herrmannsfeldt" <gah@ugcs.caltech.edu> writes:

[color=blue]
>"Ramprasad A Padmanabhan" <ramprasad@netcore.co.in> wrote in message
>news:bnj5m0$126eor$1@ID-166953.news.uni-berlin.de...[color=green]
>> In my program I am checking wether a emailid exists in a list
>>
>> I have in the complete_list a string like
>> "<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
>> string of emailids sorted
>>
>> Now I want to find out if another ID "<lm@abc.com>" exists in this list
>>
>> How is it possible to optimize the search given that the complete list
>> string is sorted in order[/color]
>
>As a completely different idea, you could search it with the Boyer-Moore
>algorithm. I don't believe strstr usually does that, as the startup
>overhead is a little high, and strstr is often used on short strings.[/color]

Would you really expect a newbie who has trouble *using* bsearch to be
able to implement Boyer-Moore?

Although likely to be faster than strstr, it's not likely to be faster
than bsearch, but it has the merit that it works on the original data
format.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de
Glen Herrmannsfeldt
Guest
 
Posts: n/a
#18: Nov 13 '05

re: is there an alternative to strstr



"Dan Pop" <Dan.Pop@cern.ch> wrote in message
news:bnm8re$55j$1@sunnews.cern.ch...[color=blue]
> In <o8xnb.51291$HS4.234455@attbi_s01> "Glen Herrmannsfeldt"[/color]
<gah@ugcs.caltech.edu> writes:

(snip)

I wrote:
[color=blue][color=green]
> >As a completely different idea, you could search it with the Boyer-Moore
> >algorithm. I don't believe strstr usually does that, as the startup
> >overhead is a little high, and strstr is often used on short strings.[/color][/color]
[color=blue]
> Would you really expect a newbie who has trouble *using* bsearch to be
> able to implement Boyer-Moore?[/color]

It isn't all that hard to do, though knowing the language you are writing in
does help. Also, already written versions may exist.
[color=blue]
> Although likely to be faster than strstr, it's not likely to be faster
> than bsearch, but it has the merit that it works on the original data
> format.[/color]

If the strings keep the '<' and '>' on them, it might be pretty fast, once
the initialization is done. It does save the time of parsing the input
data into separate strings, which should also count in the time, as should
the Boyer-Moore table building.

-- glen


Jarno A Wuolijoki
Guest
 
Posts: n/a
#19: Nov 13 '05

re: is there an alternative to strstr


On 28 Oct 2003, Dan Pop wrote:
[color=blue][color=green]
> >As a completely different idea, you could search it with the Boyer-Moore
> >algorithm. I don't believe strstr usually does that, as the startup
> >overhead is a little high, and strstr is often used on short strings.[/color]
>
> Would you really expect a newbie who has trouble *using* bsearch to be
> able to implement Boyer-Moore?
>
> Although likely to be faster than strstr, it's not likely to be faster
> than bsearch, but it has the merit that it works on the original data
> format.[/color]

Of course, if that is an absolute requirement, you can do binary
search on the original data as well. (just not with bsearch())


char *findaddr(char *needle, size_t needlen,
char *haystack, size_t hslen)
{
char *segbeg=haystack, *segend=haystack+hslen;

for (;;) {
char *midbeg, *midend;
int cmpres;

while (segbeg<segend && *segbeg!='<') segbeg++;
while (segbeg<segend && *(segend-1)!='>') segend--;
if (segbeg==segend) return 0;

midbeg=segbeg+(segend-segbeg)/2;
while (*midbeg!='<') midbeg--;

midend=midbeg;
while (*midend!='>') midend++;
midend++;

cmpres=compare(midbeg, midend-midbeg, needle, needlen);
if (cmpres==0) return midbeg;
if (cmpres<0) segbeg=midend; else segend=midbeg;
}
}

Closed Thread