473,900 Members | 4,979 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

strncmp performance

Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){
s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?

Thanks!
Nov 14 '05
26 13046

"pembed2003 " <pe********@yah oo.com> wrote in message
news:db******** *************** ***@posting.goo gle.com...
"Sean Kenwrick" <sk*******@hotm ail.com> wrote in message

news:<bv******* ***@sparta.btin ternet.com>...
"pembed2003 " <pe********@yah oo.com> wrote in message
news:db******** *************** **@posting.goog le.com...
Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){
s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?

Thanks!


Have you tried forcing your compiler to use inline code rather than a
function call (either by implementing this as a Macro or by using the
'inline' keyword. You might find that setting up the stack frame (or whatever) prior to making each call to your string_equal() function takes just as long as the function itself.

Also it appears that you are implementing (a simplified version of) strcmp() here NOT strncmp() and you are evaluating an extra condition in your loop (drop the check on s2 for '\0'):.

int string_equal(co nst char * s1, const char * s2){
while(*s1 && * s1==*s2){
s1++; s2++;
}
return *s1==*s2;
}
Regarding memcmp() this is likely to be highly optimised and much faster
than strcmp(), but the problem is that you need to pass to it the length of the strings that you are comparing. There might be a way you can make use of memcmp() if you know that the strings are the same length. Do you get this information somewhere earlier in your code (e.g when you first read in the key value?). Otherwise try:

memcpy(s1,s2,st rlen(s1));


Do you mean memcmp() here instead? Yes I think memcpy is faster than
strcmp or strncmp but I need to find out the longer string and pass in
the lenght of that. Otherwise, something like:

char* s1 = "auto";
char* s2 = "auto insurance";

memcmp(s1,s2,st rlen(s1));

will return 0 which isn't. I will need to do the extra work like:

int l1 = strlen(s1);
int l2 = strlen(s2);

memcmp(s1,s2,l1 > l2 ? l1 : l2);

Do you think that will be faster than a strcmp or strncmp?


You need to examine your code for data-caching possibilities. What I mean
by this is that you evaluate a value at some stage which you keep and use
multiple times later on. In this case the important value is the length
of the strings. From a previous post it looks like you are evaluating
hash_keys() prior to posting keys into your lookup table - it seems that
this function is a likely candidate for calculating the length of the string
with little overhead (e.g save the original pointer and use pointer
arithmetic at the end to calculate the strlen()). You could then store
the length along with the other information in your lookup table.
Then you only need to do a memcmp() if the strings are of equal length, and
you already have the string lengths calculated if you do need to call
memcmp()...

Sean
Nov 14 '05 #21
Nick Landsberg wrote:
pembed2003 wrote:

[ snip ]

I have a very simple hash table where the keys are string and
values are also string. What I want to let people do is:

hash_insert("ke y","value");

and then later retrive "value" by saying:

hash_lookup("ke y");

The hash algr. is very simple. It calculates the hash code like:

unsigned long hash_code(const char* s){
unsigned long int c = 0;
while(*s){
c = 131 * c + *s;
s++;
}
return c % MAX_HASH_SIZE;
}


At first glance, that does not look like a very robust
hash algorithm and it MAY be why you are doing so many
strncmp() calls. Have you any data on how many strings
hash into the same hash code? If it's more than 2 or 3
on average, then you should revise the hash algorithm
rather than trying to optimize strcmp().

A good hash algorithm can get down to about 1.5 probes/search.
Try the CRC algorithm for starters.


His function sounds much too dependant on the low order bits of
the last character hashed.

To experiment with hash functions and immediately see the
probes/search and other statistics, the OP could try using the
hashlib package. It was born out of an investigation into hash
functions. There are some sample string hashing routines, and
references to other methods.

<http://cbfalconer.home .att.net/download/hashlib.zip>

--
Chuck F (cb********@yah oo.com) (cb********@wor ldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home .att.net> USE worldnet address!

Nov 14 '05 #22
pe********@yaho o.com (pembed2003) wrote in message news:<db******* *************** ***@posting.goo gle.com>...
Christian Bau <ch***********@ cbau.freeserve. co.uk> wrote in message news:<ch******* *************** ***********@slb-newsm1.svr.pol. co.uk>...
In article <db************ *************@p osting.google.c om>,
pe********@yaho o.com (pembed2003) wrote:
Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){


This definitely does to many comparisons. If *s1 == *s2 then *s1 and *s2
are either both zero, or non of them is zero, so it is absolutely
pointless to check both of them.

s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?


Instead of worrying about the performance of strncmp, you should ask
yourself why you make so many calls to it that it matters at all.


I have a very simple hash table where the keys are string and values
are also string. What I want to let people do is:

hash_insert("ke y","value");

and then later retrive "value" by saying:

hash_lookup("ke y");

The hash algr. is very simple. It calculates the hash code like:

unsigned long hash_code(const char* s){
unsigned long int c = 0;
while(*s){
c = 131 * c + *s;
s++;
}
return c % MAX_HASH_SIZE;
}

It's possible for 2 different string to have the same hash code, so
when I am doing lookup, I am doing something like:

char* hash_lookup(con st char* s){
unsigned long c = hash_code(s);
// Note I can't simply return the slot at c because
// it's possible that another different string is at
// this spot because of the hash algr. So I need to...
if(strcmp(s,(ha sh+c)->value) == 0){
// Found it...
}else{
// Do something else
}
}

So because of my hash algr., an extra strcmp is needed.


Since it is a hashing algorithm I would assume that the chance of the
first character been the same as the second character is very high.
Only if the hash table becomes extremely full does this change.

If this is the case replace:

if (strcmp (s, (hash + c)->value) == 0) {
// Found it...
}

with

if (s[0] == *((hash + c)->value)) /* Compare the first two chars. */
{
if (s, (hash + c)->value) == 0) /* Only now compare strings. */
{
// Found it...
}
else
{
// Do something else ..
}
}
else
{
// Do something else ..
}

This will kick out the cases where the strings don't match rather
faster. It removes the overhead of a function call and the beginning
of a loop (though neither ammount to much these days).

You may want to consider using a flag that labels a bucket as full
rather than comparing the string in the bucket to the key. This way
the first time the program looks up a label no comparison is done -
this will be insertion.

You may also want to consider:

* Using a better hash function ( read
http://burtleburtle.net/bob/hash/index.html#lookup )

* Resizing the hash when it's nearly full

* Using linked lists as buckets.
Nov 14 '05 #23
"Sean Kenwrick" <sk*******@hotm ail.com> wrote in message news:<bv6r33

[snip]

Do you mean memcmp() here instead? Yes I think memcpy is faster than
strcmp or strncmp but I need to find out the longer string and pass in
the lenght of that. Otherwise, something like:

char* s1 = "auto";
char* s2 = "auto insurance";

memcmp(s1,s2,st rlen(s1));

will return 0 which isn't. I will need to do the extra work like:

int l1 = strlen(s1);
int l2 = strlen(s2);

memcmp(s1,s2,l1 > l2 ? l1 : l2);

Do you think that will be faster than a strcmp or strncmp?


You need to examine your code for data-caching possibilities. What I mean
by this is that you evaluate a value at some stage which you keep and use
multiple times later on. In this case the important value is the length
of the strings. From a previous post it looks like you are evaluating
hash_keys() prior to posting keys into your lookup table - it seems that
this function is a likely candidate for calculating the length of the string
with little overhead (e.g save the original pointer and use pointer
arithmetic at the end to calculate the strlen()). You could then store
the length along with the other information in your lookup table.
Then you only need to do a memcmp() if the strings are of equal length, and
you already have the string lengths calculated if you do need to call
memcmp()...

Sean


If fact, that's exactly what I did now. If the strlen isn't the same,
there is no chance for the strings to be the same. If they are the
same length, memcmp can be used. So instead of doing strcmp (or
strncmp) I am doing either strlen and/or memcmp which should be
faster. Another problem now I have encountered is that the string
passed in to my function is not from a C program, it's from a PHP
extension (which is written in C). Because of this, I sometimes got
segfault which I think is related to PHP not padding the string with
the NULL marker. My question is: Does memcmp care whethere there is a
NULL marker somewhere or not? Is there any circumstance where memcmp
might segfault?

Thanks!
Nov 14 '05 #24
I've just been reading thread and two things pop to mind. First of
all, the hash function you have chosen looks a little bit questionable
in terms of collisions. The FNV hash is well known to behave quite
well and will have performance identical to your hash function:

http://www.isthe.com/chongo/tech/comp/fnv/

Second, if your program still boils down to string comparison no
matter what, then you should consider converting your program over to
a library like The Better String library:

http://bstring.sf.net/

In this library, the length of each string is predetermined as they
are created or modified (this is very cheap, while leading to massive
performance improvements in some string functionality.) In this way
you can use memcmp() (which has the potential of being implemented to
use block comparisons) directly without incurring the string traversal
costs (in general.) The Better String library also includes its own
string comparison functions, of course, which additionally capture
trivial cases like strings having different lengths and aliased string
pointers in O(1) time.

Additionally calling strlen(), or using strcmp, or strncmp, or
whatever based on the assumption of using raw char * buffers will all
incurr an additional O(n) cost no matter how you slice it, which may
be a big factor in what is showing up on your bottom line. Using
libraries like Bstrlib (which essential has a O(1) strlen) as
described above is really the only way to avoid this cost.

As a point of disclosure, I am the author of Bstrlib -- other
libraries like Vstr (www.and.org/vstr) have comparable mechanisms.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 14 '05 #25
In article <79************ **************@ posting.google. com>,
qe*@pobox.com (Paul Hsieh) wrote:
I've just been reading thread and two things pop to mind. First of
all, the hash function you have chosen looks a little bit questionable
in terms of collisions. The FNV hash is well known to behave quite
well and will have performance identical to your hash function:

http://www.isthe.com/chongo/tech/comp/fnv/

Second, if your program still boils down to string comparison no
matter what, then you should consider converting your program over to
a library like The Better String library:

http://bstring.sf.net/

In this library, the length of each string is predetermined as they
are created or modified (this is very cheap, while leading to massive
performance improvements in some string functionality.) In this way
you can use memcmp() (which has the potential of being implemented to
use block comparisons) directly without incurring the string traversal
costs (in general.) The Better String library also includes its own
string comparison functions, of course, which additionally capture
trivial cases like strings having different lengths and aliased string
pointers in O(1) time.


If the data is completely under your control, you could make different
changes: Store all the strings in arrays of unsigned long instead of
unsigned char. In the table, end every string with at least one zero and
a 1, with the 1 being the last byte in an unsigned long. In the strings
that you pass in, end every string with at least two zeroes, with the
last zero being the last byte in an unsigned long.

You can know compare one unsigned long at a time. You don't need to
check for the end of the strings because the data you pass in and the
data in your table will be different. After finding the first unsigned
long that is different, the strings are equal if the difference between
the two strings is 1 and the last byte of the unsigned long that you
took from the table is 1.
Nov 14 '05 #26
qe*@pobox.com (Paul Hsieh) wrote in message news:<79******* *************** ****@posting.go ogle.com>...
I've just been reading thread and two things pop to mind. First of
all, the hash function you have chosen looks a little bit questionable
in terms of collisions. The FNV hash is well known to behave quite
well and will have performance identical to your hash function:

http://www.isthe.com/chongo/tech/comp/fnv/

Second, if your program still boils down to string comparison no
matter what, then you should consider converting your program over to
a library like The Better String library:

http://bstring.sf.net/


Thanks for pointing out a better hash algr. and string library. I will
consider using both in my application. As I have pointed out in
another thread, my problem now seems to be in PHP not padding the
string with \0 which result in segfault.
Nov 14 '05 #27

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
1884
by: jacob navia | last post by:
What is the result of that? I am reimplementing the C library, and I would like to know. Now I have it to return zero. Is that correct? The standard says: >> The strncmp function compares not more than n characters (characters that follow a
1
4225
by: Bert | last post by:
I have a question about buffer overflows with strcmp or strncmp (strnicmp). It's best described by providing an example Example: There is a string "sBuf" of length 5000. In a for loop from 0 to 5000, several NUL terminated strings are compared, these strings vary in size from 3 to 9. sBuf is user input (a file), memory is dynamically allocated using malloc (file length+1) and it is properly nul terminated. The other
4
14395
by: Preets | last post by:
Hi, Can anybody please tell me the difference between memcmp() and strncmp() functions ?
3
1900
by: arunraj2002in | last post by:
Hi All, I have doubt in strncmp. consider the following prog: int main() { int i; i = strncmp("Nstring","Astring",6);
12
1879
by: Krumble Bunk | last post by:
Hi all, Having some trouble with a seemingly-simple task. Need to have a basic CLI functionality as per: prompt <- \n prompt <- \n promptquit <- should quit as per my snippet, but doesn't. I've thrown a few random fflush()'s in, but no joy. I have never
0
9845
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
11276
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10866
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10497
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9671
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
8043
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
7204
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
1
4721
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
3320
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.