Ramprasad A Padmanabhan wrote:[color=blue]
> Dan Pop wrote:[/color]
[color=blue][color=green][color=darkred]
>>> I have in the complete_list a string like
>>> "<aa@abc.com> <ab@abc.com> <bc@abc.com> ... <xy@abc.com>" that is a
>>> string of emailids sorted
>>>
>>> Now I want to find out if another ID "<lm@abc.com>" exists in this list
>>>
>>> How is it possible to optimize the search given that the complete
>>> list string is sorted in order[/color]
>>
>>
>>
>> To exploit this fact, you need a different data format, a plain string is
>> not particularly suitable for a binary search.
>>
>> If you can store the email addresses in an array of strings, one per
>> string, and keep them still sorted, the bsearch() function from the
>> standard library is likely to be faster than a strstr() in a plain
>> string. The comparison function can be a simple wrapper around strcmp().
>>
>> If you're performing this search often enough on large lists of email
>> addresses, it may be worth considering changing the format in which
>> you keep the email addresses, so that bsearch becomes an option.
>> Compare the performance of the two approaches on real data sets to
>> see which is better.
>>
>> OTOH, if you build the list of addresses on the fly and the input is NOT
>> already sorted, you may consider putting them in a binary tree structure.
>> Except for pathological cases (input data already sorted) the search is
>> simple and fast and new data can be added with very little overhead
>> (see also the example in K&R2).
>>[/color]
>
>
> Ok I have put the email ids in a sorted array.
> Can I use bsearch on an array with differrent sized elements. Because in
> a real scenario the email ids will have different sizes
>[/color]
No bsearch and qsort is not suitable for different sized elements.
However you can use an array of char pointers and use
qsort to sort the array of char pointers and use bsearch to find the key
in the array.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
typedef struct EMAILLIST
{
char **email;
size_t email_count;
} EMAILLIST;
int AddEmailAddr(const char *s, EMAILLIST *p);
char **SearchEmailList(const char *key, EMAILLIST *p);
void FreeEmailList(EMAILLIST *p);
int cmp(const void *v1, const void *v2);
int main(void)
{
EMAILLIST my = {NULL};
char **tmp, *s;
char emails[] = "<aa@abc.com> <ab@abc.com> "
"<bc@abc.com> <xy@abc.com>";
s = strtok(emails," ");
while(s)
{
AddEmailAddr(s,&my);
s = strtok(NULL," ");
}
AddEmailAddr("<xx@abc.com>",&my);
if((tmp = SearchEmailList("<ab@abc.com>",&my)) != NULL)
printf("Email \"%s\" found in the list\n",*tmp);
printf("TEST: my.email_count = %u\n",my.email_count);
FreeEmailList(&my);
return 0;
}
int AddEmailAddr(const char *s, EMAILLIST *p)
{
char **tmp;
size_t num = p->email_count;
if(p->email != NULL)
if(SearchEmailList(s,p) != NULL)
{
printf("Email Dupe, Email \"%s\" not added\n",s);
return 0;
}
if((tmp = realloc(p->email,
(num+1)*(sizeof *p->email))) == NULL)
{
printf("Memory Allocation Error, Email \"%s\" not added\n",s);
return 0;
}
p->email = tmp;
if((p->email[num] = malloc(strlen(s)+1)) == NULL)
{
printf("Memory Allocation Error, Email \"%s\" not added\n",s);
return 0;
}
strcpy(p->email[num],s);
qsort(p->email,++p->email_count,sizeof *p->email,cmp);
return 1;
}
void FreeEmailList(EMAILLIST *p)
{
size_t i;
for(i = 0;i < p->email_count;i++)
free(p->email[i]);
free(p->email);
p->email = NULL;
p->email_count = 0;
return;
}
int cmp(const void *v1, const void *v2)
{
const char **s1 = (const char **)v1;
const char **s2 = (const char **)v2;
return strcmp(*s1,*s2);
}
char **SearchEmailList(const char *key, EMAILLIST *p)
{
if(p->email == NULL) return NULL;
return bsearch(&key,p->email,p->email_count,sizeof(char *),cmp);
}
--
Al Bowers
Tampa, Fl USA
mailto:
xabowers@myrapidsys.com (remove the x to send email)
http://www.geocities.com/abowers822/