473,890 Members | 1,346 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

strncmp performance

Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){
s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?

Thanks!
Nov 14 '05
26 13045
pembed2003 wrote:

I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:


String comparison can be inherently slow, because it has to
examine each character. It would be better to detail your
application and ask for advice on algorithms. comp.programmin g is
a suitable place for that.

For example, strings might be hashed, so that very few actual
comparisons need be made.

--
Chuck F (cb********@yah oo.com) (cb********@wor ldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home .att.net> USE worldnet address!
Nov 14 '05 #11
pembed2003 wrote:
"Tom St Denis" <to********@iah u.ca> wrote in message news:<RN******* ************@ne ws04.bloor.is.n et.cable.rogers .com>...
"pembed2003 " <pe********@yah oo.com> wrote in message
news:db****** *************** ****@posting.go ogle.com...
Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){
s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?


Probably because the native strncmp is highly optimized code? E.g. doing
word-comparisons instead of char comparisons can easily give you upto
2x/4x/8x/etc speedup [based on your machine register size]. It also makes
the code a bit more complex but for strings longer than a dozen or so chars
is faster.

Tom

Thanks but I can't use word-comparisons because I need to know if the
strings match exactly or not, regardless of case.


Alas, you misunderstand here. ;-(
"Word", in the respondents case refers not to words like, well "word"
but "machine word", i.e. the number of bytes-at-a-time that's native to
your machine's architecture.

HTH,
--ag

--
Artie Gold -- Austin, Texas
Nov 14 '05 #12
"Sean Kenwrick" <sk*******@hotm ail.com> wrote in message news:<bv******* ***@sparta.btin ternet.com>...
"pembed2003 " <pe********@yah oo.com> wrote in message
news:db******** *************** **@posting.goog le.com...
Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){
s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?

Thanks!
Have you tried forcing your compiler to use inline code rather than a
function call (either by implementing this as a Macro or by using the
'inline' keyword. You might find that setting up the stack frame (or
whatever) prior to making each call to your string_equal() function takes
just as long as the function itself.

Also it appears that you are implementing (a simplified version of) strcmp()
here NOT strncmp() and you are evaluating an extra condition in your loop
(drop the check on s2 for '\0'):.

int string_equal(co nst char * s1, const char * s2){
while(*s1 && * s1==*s2){
s1++; s2++;
}
return *s1==*s2;
}
Regarding memcmp() this is likely to be highly optimised and much faster
than strcmp(), but the problem is that you need to pass to it the length of
the strings that you are comparing. There might be a way you can make
use of memcmp() if you know that the strings are the same length. Do you
get this information somewhere earlier in your code (e.g when you first read
in the key value?). Otherwise try:

memcpy(s1,s2,st rlen(s1));


Do you mean memcmp() here instead? Yes I think memcpy is faster than
strcmp or strncmp but I need to find out the longer string and pass in
the lenght of that. Otherwise, something like:

char* s1 = "auto";
char* s2 = "auto insurance";

memcmp(s1,s2,st rlen(s1));

will return 0 which isn't. I will need to do the extra work like:

int l1 = strlen(s1);
int l2 = strlen(s2);

memcmp(s1,s2,l1 > l2 ? l1 : l2);

Do you think that will be faster than a strcmp or strncmp?

It might still be faster than doing strcmp() even with the extra strlen()
call...

Sean

Nov 14 '05 #13
Christian Bau <ch***********@ cbau.freeserve. co.uk> wrote in message news:<ch******* *************** ***********@slb-newsm1.svr.pol. co.uk>...
In article <db************ *************@p osting.google.c om>,
pe********@yaho o.com (pembed2003) wrote:
Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){


This definitely does to many comparisons. If *s1 == *s2 then *s1 and *s2
are either both zero, or non of them is zero, so it is absolutely
pointless to check both of them.

s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?


Instead of worrying about the performance of strncmp, you should ask
yourself why you make so many calls to it that it matters at all.


I have a very simple hash table where the keys are string and values
are also string. What I want to let people do is:

hash_insert("ke y","value");

and then later retrive "value" by saying:

hash_lookup("ke y");

The hash algr. is very simple. It calculates the hash code like:

unsigned long hash_code(const char* s){
unsigned long int c = 0;
while(*s){
c = 131 * c + *s;
s++;
}
return c % MAX_HASH_SIZE;
}

It's possible for 2 different string to have the same hash code, so
when I am doing lookup, I am doing something like:

char* hash_lookup(con st char* s){
unsigned long c = hash_code(s);
// Note I can't simply return the slot at c because
// it's possible that another different string is at
// this spot because of the hash algr. So I need to...
if(strcmp(s,(ha sh+c)->value) == 0){
// Found it...
}else{
// Do something else
}
}

So because of my hash algr., an extra strcmp is needed.
Nov 14 '05 #14
pe********@yaho o.com (pembed2003) wrote in message news:<db******* *************** ****@posting.go ogle.com>...
"Tom St Denis" <to********@iah u.ca> wrote in message news:<RN******* ************@ne ws04.bloor.is.n et.cable.rogers .com>...
"pembed2003 " <pe********@yah oo.com> wrote in message
news:db******** *************** **@posting.goog le.com...
Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){
s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?


Probably because the native strncmp is highly optimized code? E.g. doing
word-comparisons instead of char comparisons can easily give you upto
2x/4x/8x/etc speedup [based on your machine register size]. It also makes
the code a bit more complex but for strings longer than a dozen or so chars
is faster.

Tom


Thanks but I can't use word-comparisons because I need to know if the
strings match exactly or not, regardless of case.


Your usage from above doesn't explain strncmp, no length is passed in.
Nor does it work "regardless of case". You should describe your
problem more exactly, as just not doing the strncmp (or strcmps) may
be the correct answer. e.g. if you are looking up key/value pairs as
you say elsethread, the problem may be solved using a hashtable/hash
algorithm that minimizes collisions. You won't do many strcmps
per lookup under those circumstances.

If you are already using a nice algorithm, and still DO need to do
lots of strcmps, a few things might be useful to save time. For
example, you could compute the length of the keys of the key/value
pairs once and save that length with the key. Figure out the length
of the query. If those lengths aren't identical, skip the strcmp
call. If the lengths are known to be identical, your string_equal (a
name that is not well advised, as things starting str followed by a
lower case letter are reserved for the implementation) could be as
below.

#define STRN_EQUAL(s1,s 2,len) (memcmp(s1,s2,l en) == 0)

If all your keys are of the same length, well, you probably lose on
what I said above. If their lengths vary greatly, you probably win.

-David
Nov 14 '05 #15


Christian Bau wrote:
In article <db************ *************@p osting.google.c om>,
pe********@yaho o.com (pembed2003) wrote:

Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){

This definitely does to many comparisons. If *s1 == *s2 then *s1 and *s2
are either both zero, or non of them is zero, so it is absolutely
pointless to check both of them.
s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?

Instead of worrying about the performance of strncmp, you should ask
yourself why you make so many calls to it that it matters at all.

What are you doing that requires millions of calls to strncmp?


This is indeed the key question. I have had a case where I had a large
number of names and had to repeatedly look for names in that list and
came up with a very different solution: I made an index over the list
and "matched" the names I was looking for. The point is, I didn't simply
iterate over the list or make some tree search or hash function, but a
finite state machine (FSM). For fixed strings such as names it is very easy
to make a 'deterministic finite automaton' (DFA), a class of FSMs. What you
do is, you create an FSM of which each state represents the substring matched
so far from the beginning, the start state representing nothing matched yet.
Each state that represents a complete string of the original list is marked
an 'accept' state'. The matching algorithm is very simple: you start
from the start state and for each character of the name you are looking
for you find a transition to another state. If no transition is found the
name is not in the list and the search terminates. Otherwise, you follow
the transition to the next state and the process repeats until you don't
find a transition for the current character or you are at the end of the
name you are looking for. If the latter happens and you are in an accept
state, you have matched the name.

The advantage of this algorithm: the time complexity is O(n) where n is
the length of the name you are looking for. You might be tempted to say
it's O(n * t) where t is the number of transitions you have to follow at
each state, but t is limited by the size |T| of the character set and thus
we get O(n * |T|) = O(n).

If you have to, say, find x names in a list of y names the total complexity
becomes O(x * n) where n is now the average length of the x names, compared
with O(x * y * n) for linear search (the factor n then stems from strcmp) or
O(x * log y * n) for tree search. Hashing should have O(x * n) in the best
case. You probably don't get any faster than an FSM.

Building the FSM is about just as simple: for each name that you add to the
FSM you follow the FSM built so far similarly to the matching algorithm
described above. When you don't find a transition you add one to a state that
you also add and you move to that state, from which point on the algorithm
continues. The state you are in when you are at the end of the added string
is marked 'accept'. You can probably figure out for yourself how you would
remove names from the DFA, it's not a big deal.

Note that this is a very simplistic form of DFA: in general they can be much
more powerful and match patterns with repetitions (think of * in Unix file
name patterns). These are called regular expressions. To learn how to construct
such DFAs, read for instance 'Compilers: theory, techniques and tools' by Aho,
Sethi and Ullman (the 'dragon book'). Or you can generate them with e.g. lex.

Another note: you can use such an FSM as real index, pointing back to your
list of names by turning the 'accept' field into a pointer to the corresponding
list item (and NULL in states that are not accept states).

Finally, if it has advantages, it probably has disadvantages as well. That's
right: it consumes more memory and you had better take some care to do your
memory allocation efficiently: allocate states and transitions 1024 at a time
or some such.

--
ir. H.J.H.N. Kenter ^^
Electronic Design & Tools oo ) Philips Research Labs
Building WAY 3.23 =x= \ ar**********@ph ilips.com
Prof. Holstlaan 4 (WAY31) | \ tel. +31 40 27 45334
5656 AA Eindhoven /|__ \ tfx. +31 40 27 44626
The Netherlands (____)_/ http://www.kenter.demon.nl/

Famous last words: Segmentation Fault (core dumped)

Nov 14 '05 #16

You probably missed my point -

Comparing strings using any flavor of strcmp(), strncmp(),
requires that both strings be "walked" until a mismatch
occurs or until the end of the string or until "n" is reached.
This is true whether it is written in C or assembler.

If you have to call strncmp() thousands of times,
of *course* you will be seeing that strncmp() takes the
bulk of your time when you profile the code.

My point was rather than trying to optimize strncmp(),
you should try to minimize the calls to strncmp() by being
more clever in your use of data structures. For example,
if the list of KEYS in the KEY/value pairs can be sorted
a priori, then the use of a binary search algorithm
within the list, rather than a linear search, can drastically
reduce the number of times strncmp() will be called.
Similarly, a hashing algorithm of some kind might be used.
That's why I recommended going to comp.programmin g, or
(if there is such a thing) comp.algorithme s or something like
that.

pembed2003 wrote:
Nick Landsberg <hu*****@att.ne t> wrote in message news:<fA******* *************** @bgtnsc05-news.ops.worldn et.att.net>...
The problem is probably not in strncmp(), but elsewhere.

No, I time the function and found out that the strcmp is the slowest
function of all. Without it (of course, I can't really remove it), the
whole function is much much faster.

Comparing strings linearly is an expensive proposition
when there are many strings to be compared. If there's
only a handful (10 or so) it's not bad, if there are
thousands, then strncmp() without an intelligent
algorithm (e.g. binary search) behind it is probably
not the way to go.

The function will eventually be moved to a server to do key/value pair
lookup so it will be called tens of thousands of times and that's why
I want to optimize it.

Try a newsgroup dealing with algorithms. Maybe
comp.programm ing?

If I can't get a good answer here, I will try there. Thanks!


pembed2003 wrote:

Hi,
I have an application where I use the strncmp function extensively to
compare 2 string to see if they are the same or not. Does anyone know
a faster replacement for strncmp? I notice there is a memncmp function
which is very fast but it doesn't work the same like strncmp so I
don't think I can use it. I also tried to write the string_equal
function myself like:

int string_equal(co nst char* s1,const char* s2){
while(*s1 && *s2 && *s1 == *s2){
s1++; s2++;
}
return *s1 == *s2;
}

but I don't get much performance gain. Any suggestion?

Thanks!


--
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #17


pembed2003 wrote:

[ snip ]

I have a very simple hash table where the keys are string and values
are also string. What I want to let people do is:

hash_insert("ke y","value");

and then later retrive "value" by saying:

hash_lookup("ke y");

The hash algr. is very simple. It calculates the hash code like:

unsigned long hash_code(const char* s){
unsigned long int c = 0;
while(*s){
c = 131 * c + *s;
s++;
}
return c % MAX_HASH_SIZE;
}

At first glance, that does not look like a very robust
hash algorithm and it MAY be why you are doing so many
strncmp() calls. Have you any data on how many strings
hash into the same hash code? If it's more than 2 or 3
on average, then you should revise the hash algorithm
rather than trying to optimize strcmp().

A good hash algorithm can get down
to about 1.5 probes/search. Try the CRC algorithm
for starters.

It's possible for 2 different string to have the same hash code,
Very, very true. See above about a better hash algorithm.

so when I am doing lookup, I am doing something like:

char* hash_lookup(con st char* s){
unsigned long c = hash_code(s);
// Note I can't simply return the slot at c because
// it's possible that another different string is at
// this spot because of the hash algr. So I need to...
if(strcmp(s,(ha sh+c)->value) == 0){
// Found it...
}else{
// Do something else
}
}

So because of my hash algr., an extra strcmp is needed.


--
"It is impossible to make anything foolproof because fools are so
ingenious" - A. Bloch

Nov 14 '05 #18
pe********@yaho o.com (pembed2003) wrote in message
<stuff snipped>

Do you mean memcmp() here instead? Yes I think memcpy is faster than
strcmp or strncmp but I need to find out the longer string and pass in
the lenght of that. Otherwise, something like:

char* s1 = "auto";
char* s2 = "auto insurance";

memcmp(s1,s2,st rlen(s1));

will return 0 which isn't. I will need to do the extra work like:

int l1 = strlen(s1);
int l2 = strlen(s2);

memcmp(s1,s2,l1 > l2 ? l1 : l2);

Do you think that will be faster than a strcmp or strncmp?


As I mentioned in another part of the thread, if the lengths aren't the
same the strings aren't identical, in which case you must skip the memcmp.
Your memcmp will read memory past the end of one of the strings if the
length isn't equal, as you selected the longer of the two. But just
don't compare is a better solution.

Having seen the rest of the thread, your best bet seems to me
to get a better hashing algorithm and a large
enough (growing if needed) hashtable that you have few collisions.
A little profiling of your hashtable should show how well/poorly
your hash function is doing.

-David
Nov 14 '05 #19
In article <db************ *************@p osting.google.c om>,
pe********@yaho o.com (pembed2003) wrote:
It's possible for 2 different string to have the same hash code, so
when I am doing lookup, I am doing something like:

char* hash_lookup(con st char* s){
unsigned long c = hash_code(s);
// Note I can't simply return the slot at c because
// it's possible that another different string is at
// this spot because of the hash algr. So I need to...
if(strcmp(s,(ha sh+c)->value) == 0){
// Found it...
}else{
// Do something else
}
}

So because of my hash algr., an extra strcmp is needed.


So you are worrying about the performance of not ten thousands of
strcmp's per key, but ten thousands of strcmp's in total? Please tell
us, what does the "Giga" in "Gigahertz" mean? How many strcmp's per
seconds can your machine do? Did you measure how long things take? Will
anybody notice if the code runs faster?
Nov 14 '05 #20

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
1884
by: jacob navia | last post by:
What is the result of that? I am reimplementing the C library, and I would like to know. Now I have it to return zero. Is that correct? The standard says: >> The strncmp function compares not more than n characters (characters that follow a
1
4225
by: Bert | last post by:
I have a question about buffer overflows with strcmp or strncmp (strnicmp). It's best described by providing an example Example: There is a string "sBuf" of length 5000. In a for loop from 0 to 5000, several NUL terminated strings are compared, these strings vary in size from 3 to 9. sBuf is user input (a file), memory is dynamically allocated using malloc (file length+1) and it is properly nul terminated. The other
4
14395
by: Preets | last post by:
Hi, Can anybody please tell me the difference between memcmp() and strncmp() functions ?
3
1900
by: arunraj2002in | last post by:
Hi All, I have doubt in strncmp. consider the following prog: int main() { int i; i = strncmp("Nstring","Astring",6);
12
1879
by: Krumble Bunk | last post by:
Hi all, Having some trouble with a seemingly-simple task. Need to have a basic CLI functionality as per: prompt <- \n prompt <- \n promptquit <- should quit as per my snippet, but doesn't. I've thrown a few random fflush()'s in, but no joy. I have never
0
11236
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10830
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10468
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9641
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
8018
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
7172
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
6061
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4682
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
4276
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.