Compare without regard to case

JKop

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".
Thanks,

-JKop

Jul 22 '05 #1


Gernot Frisch

"JKop" <NU**@NULL.NULL> wrote in message
news:ab******************@news.indigo.ie...

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

stricmp(str1, str2);

Jul 22 '05 #2

Sharad Kala

"JKop" <NU**@NULL.NULL> wrote in message

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

One way is to inherit from char_traits<...> and provide the necessary
overrides. Then use that class instead of std::string. This is discussed in
this GotW series - http://www.gotw.ca/gotw/029.htm
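A minimal sketch of the traits approach, loosely following the idea in the GotW article linked above. The names ci_char_traits and ci_string and the helper fold are illustrative assumptions, and case folding here relies on the C library's tolower in the current C locale:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative traits class: only the comparison-related members are replaced,
// everything else is inherited from std::char_traits<char>.
struct ci_char_traits : public std::char_traits<char> {
    // Fold one character to lower case. The cast to unsigned char keeps
    // tolower from ever receiving a negative value.
    static int fold(char c) {
        return std::tolower(static_cast<unsigned char>(c));
    }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }

    // Three-way comparison of the first n characters.
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }

    // Case-insensitive search for a single character.
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], a)) return s + i;
        }
        return nullptr;
    }
};

// A string type whose ==, <, find and so on ignore case.
typedef std::basic_string<char, ci_char_traits> ci_string;
```

With this in place, ci_string("Kernel32.DLL") == ci_string("kernel32.dll") holds. One trade-off is that ci_string is a different type from std::string, so converting between the two needs an explicit copy, for example via c_str().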
Sharad

Jul 22 '05 #3

John Harrison

"Gernot Frisch" <Me@Privacy.net> wrote in message
news:2q*************@uni-berlin.de...

"JKop" <NU**@NULL.NULL> wrote in message
news:ab******************@news.indigo.ie...
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

stricmp(str1, str2);

stricmp is not standard C or C++.

john

Jul 22 '05 #4

John Harrison

"JKop" <NU**@NULL.NULL> wrote in message
news:ab******************@news.indigo.ie...

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".
Thanks,

-JKop

There is no standard C++ function for doing that. You could write something
yourself using the toupper or tolower functions which operate on individual
characters.

john

Jul 22 '05 #5

Sharad Kala

"Gernot Frisch" <Me@Privacy.net> wrote in message
news:2q*************@uni-berlin.de...

"JKop" <NU**@NULL.NULL> wrote in message
news:ab******************@news.indigo.ie...
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

stricmp(str1, str2);

Isn't that non-standard ?

Jul 22 '05 #6

Gernot Frisch

> stricmp is not standard C or C++.

Uh!? But strcmp is?

Jul 22 '05 #7

John Harrison

"Gernot Frisch" <Me@Privacy.net> wrote in message
news:2q*************@uni-berlin.de...

stricmp is not standard C or C++.

Uh!? But strcmp is?

Right.

john

Jul 22 '05 #8

JKop

John Harrison posted:

> "JKop" <NU**@NULL.NULL> wrote in message
> news:ab******************@news.indigo.ie...
>> Haven't been able to find such a thing.
>>
>> Can anyone please inform me of a Standard C++ function for
>> comparing two strings without regard to case. Both for
>> working with "char*", and with "std::string".
>>
>> Thanks,
>>
>> -JKop
>
> There is no standard C++ function for doing that. You could write
> something yourself using the toupper or tolower functions which
> operate on individual characters.
>
> john

Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function.

Thanks,

-JKop

Jul 22 '05 #9

Tim Love

JKop <NU**@NULL.NULL> writes:

Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function,

Functions like this have been posted here in the past, based around
lines like
transform(text.begin(),text.end(),text.begin(),toupper);
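A minimal sketch built around such a line. The helper names to_upper_copy and equal_after_upcasing are assumptions made up for illustration. The lambda takes its argument as unsigned char, so toupper never sees a negative value, and it also avoids any ambiguity between the <cctype> and <locale> overloads of toupper:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Illustrative helper: returns an upper-cased copy of text.
std::string to_upper_copy(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return text;
}

// Illustrative helper: two strings are equal if their upper-cased copies are.
bool equal_after_upcasing(const std::string& a, const std::string& b) {
    return to_upper_copy(a) == to_upper_copy(b);
}
```

This copies both strings before comparing, which is simple but not the cheapest option; a character-by-character loop avoids the copies.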

Jul 22 '05 #10

Peter Koch Larsen

"JKop" <NU**@NULL.NULL> wrote in message
news:ab******************@news.indigo.ie...

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".
Thanks,

-JKop

This is not as simple as it sounds, and that is one of the reasons there
is no "standard" C++ function of that type.
One difficulty is that different countries have different rules for
collation. Sometimes the rules even differ according to context (is it a
dictionary or a telephone book?) and sometimes they differ according to
the meaning of the word.
For an explanation of this, go to comp.lang.cpp.moderated and search
for recent discussions there (I believe it started in August and lasted
about a month).

Kind regards
Peter

Jul 22 '05 #11

David Fisher

JKop wrote:

Can anyone please inform me of a Standard C++
function for comparing two strings without regard
to case. Both for "char*" and "std::string".

Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function.

#include <cctype> // for tolower()
#include <cassert>
#include <string>

// returns < 0 if s1 < s2, > 0 if s1 > s2 or 0 if the strings
// are equal (without regard to case)
// ie. behaves like strcmp()

int stricmp(const char *s1, const char *s2)
{
    while (*s1 && *s2)
    {
        if (tolower(*s1++) != tolower(*s2++))
        {
            return (int) tolower(*s1) - (int) tolower(*s2);
        }
    }

    return (*s1 ? 1 : (*s2 ? -1 : 0));
}

int stricmp(std::string s1, std::string s2)
{
    return stricmp(s1.c_str(), s2.c_str());
}

void test_stricmp()
{
    assert(stricmp("abc", "abc") == 0);
    assert(stricmp("abc", "ABC") == 0);
    assert(stricmp("abc", "DEF") < 0);
    assert(stricmp("ABC", "def") < 0);
    assert(stricmp("DEF", "abc") > 0);
    assert(stricmp("def", "ABC") > 0);
    assert(stricmp("abc", "abca") < 0);
    assert(stricmp("abca", "abc") > 0);
    assert(stricmp("", "") == 0);
    assert(stricmp("", "a") < 0);
    assert(stricmp("a", "") > 0);
}

David Fisher
Sydney, Australia

Jul 22 '05 #12

Julie

JKop wrote:

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

Do you want to compare (for collation), or to strictly test for equality?

Jul 22 '05 #13

John Harrison

"David Fisher" <da***@hsa.com.au> wrote in message
news:AK****************@nasal.pacific.net.au...

JKop wrote:
Can anyone please inform me of a Standard C++
function for comparing two strings without regard
to case. Both for "char*" and "std::string".

Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function.

#include <cctype> // for tolower()
#include <cassert>

// returns < 0 if s1 < s2, > 0 if s1 > s2 or 0 if the strings
// are equal (without regard to case)
// ie. behaves like strcmp()

int stricmp(const char *s1, const char *s2)
{
while (*s1 && *s2)
{
if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}
}

return (*s1 ? 1 : (*s2 ? -1 : 0));
}

int stricmp(std::string s1, std::string s2)
{
return stricmp(s1.c_str(), s2.c_str());
}

void test_stricmp()
{
assert(stricmp("abc", "abc") == 0);
assert(stricmp("abc", "ABC") == 0);
assert(stricmp("abc", "DEF") < 0);
assert(stricmp("ABC", "def") < 0);
assert(stricmp("DEF", "abc") > 0);
assert(stricmp("def", "ABC") > 0);
assert(stricmp("abc", "abca") < 0);
assert(stricmp("abca", "abc") > 0);
assert(stricmp("", "") == 0);
assert(stricmp("", "a") < 0);
assert(stricmp("a", "") > 0);
}

David Fisher
Sydney, Australia

It's an error to pass a char to tolower. The valid inputs for tolower are
integers in the range 0 to UCHAR_MAX, and EOF. Because char may be signed,
passing a char to tolower may result in a negative value being passed,
and that has undefined behaviour. Unsigned char is not a problem.

For the same reason

transform(text.begin(),text.end(),text.begin(),toupper);

suggested by Tim Love is also invalid.

Also this statement

if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}

is buggy because s1 and s2 will already have been incremented in the if
statement before the subtraction is done.

So I'd suggest

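int stricmp(const char *s1, const char *s2)
{
    while (*s1 && *s2)
    {
        // go through unsigned char so tolower never sees a negative value
        int ch1 = tolower((unsigned char) *s1);
        int ch2 = tolower((unsigned char) *s2);
        if (ch1 != ch2)
        {
            // compare the case-folded values, not the raw characters
            return ch1 - ch2;
        }
        ++s1;
        ++s2;
    }

    return (*s1 ? 1 : (*s2 ? -1 : 0));
}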

but I haven't tested it.

John

Jul 22 '05 #14

David Fisher

John Harrison wrote:

> It's an error to pass a char to tolower. The valid inputs for tolower are
> integers in the range 0 to UCHAR_MAX and EOF. Because chars may be signed
> then passing a char to tolower may result in a negative value being passed
> and that has undefined behaviour. Unsigned char is not a problem.

I see your point, but it's very surprising ... most people would expect
something like tolower('A') to return 'a'. I guess it's only a problem
for character sets with upper case characters >= 128 decimal (in ASCII,
upper case letters are from 65 to 90). Are there any character sets
like this you are aware of ? (I don't know EBCDIC).

BTW the UNIX manual entry on my machine says that for any values other
than upper case letters, the argument value is returned unchanged (rather
than being undefined behaviour).

> Also this statement
>
> if (tolower(*s1++) != tolower(*s2++))
> {
> return (int) tolower(*s1) - (int) tolower(*s2);
> }
>
> is bugged because s1 and s2 will be incremented in the if statement before
> the subtraction is done.

Oops .. of course it was a deliberate mistake ... :-P

Thanks for the comments,

David Fisher
Sydney, Australia

Jul 22 '05 #15

Kai-Uwe Bux

JKop wrote:

John Harrison posted:

"JKop" <NU**@NULL.NULL> wrote in message
news:ab******************@news.indigo.ie...
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for comparing two strings without regard to case. Both for
working with "char*", and with "std::string".
Thanks,

-JKop

There is no standard C++ function for doing that. You

could write
something yourself using the toupper or tolower functions

which operate
on individual characters.

john

Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function.

Thanks,

-JKop

Ignoring case is relative to the locale you want to use. The following code
uses the global locale by default:

#include <locale>
#include <string>
#include <iostream>

bool
string_equal_to_ignoring_case ( std::string a,
                                std::string b,
                                std::locale loc = std::locale() ) {
    if ( a.size() != b.size() ) {
        return( false );
    }
    std::string::size_type length = a.size();
    for ( std::string::size_type i = 0;
          i < length;
          ++i ) {
        if ( std::tolower( a[i], loc ) != std::tolower( b[i], loc ) ) {
            return( false );
        }
    }
    return( true );
}

int main ( void ) {
    std::string a ( "Hello World!" );
    std::string b ( "hello world!" );

    std::cout << string_equal_to_ignoring_case( a, b ) << '\n';
}

Best

Kai-Uwe Bux

Jul 22 '05 #16

John Harrison

"David Fisher" <da***@hsa.com.au> wrote in message
news:S3****************@nasal.pacific.net.au...

>> John Harrison wrote:
>> It's an error to pass a char to tolower. The valid inputs for tolower are
>> integers in the range 0 to UCHAR_MAX and EOF. Because chars may be signed
>> then passing a char to tolower may result in a negative value being
>> passed and that has undefined behaviour. Unsigned char is not a problem.
>
> I see your point, but it's very surprising ... most people would expect
> something like tolower('A') to return 'a'.

tolower((unsigned char)'A') will return 'a'.

> I guess it's only a problem
> for character sets with upper case characters >= 128 decimal (in ASCII,
> upper case letters are from 65 to 90). Are there any character sets
> like this you are aware of ? (I don't know EBCDIC).

Passing any negative value (other than EOF) to any of the character
classification routines (tolower, islower, isalpha etc) is undefined
behaviour. If you are sure that your 8 bit char string will only ever
contain character codes in the range 0 to 127 then there is no problem. But
you can't be sure of that in a library routine like stricmp.

> BTW the UNIX manual entry on my machine says that for any values other
> than upper case letters, the argument value is returned unchanged (rather
> than being undefined behaviour).

C99 standard 7.4 para 1, 'In all cases [talking about <ctype.h>] the
argument is an int, the value of which shall be representable as an unsigned
char or shall equal the macro EOF'.

But passing a char to ctype.h routines is such a common practice that I
wouldn't be surprised if most compilers accepted negative values and defined
some reasonable behaviour for them.
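One way to stay inside that rule without scattering casts through the code is a pair of thin wrappers. This is only a sketch; the names to_lower_char and is_alpha_char are assumptions chosen for illustration:

```cpp
#include <cctype>

// Illustrative wrapper: lower-cases a plain char. The value is converted
// to unsigned char first, so tolower always gets an argument in the
// range 0 to UCHAR_MAX, as <ctype.h> requires.
inline char to_lower_char(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Illustrative wrapper: the same precaution for a classification routine.
inline bool is_alpha_char(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}
```

Callers can then pass plain char values, including ones with the high bit set, without having to think about whether char is signed on their platform.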

john

Jul 22 '05 #17

JKop

Julie posted:

JKop wrote:

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for comparing two strings without regard to case. Both for working with "char*", and with "std::string".
Do you want to compare (for collation), or to strictly

test for equality?

Actually, it's for filenames.

kernel32.dll

and

Kernel32.DLL

and

KerNel32.DlL

are the same file!

-JKop

Jul 22 '05 #18

JKop

> tolower((unsigned char)'A') will return 'a'.

(unsigned char)'A' disgusts me!

unsigned char('A')

Also, if you're going for ultimate efficiency:

The input char:

char k = 'A';

unsigned char& uk = *reinterpret_cast<unsigned char*>(&k);

(I first thought of using a union but the above is better)

-JKop

Jul 22 '05 #19

Peter Koch Larsen

"JKop" <NU**@NULL.NULL> wrote in message
news:8G******************@news.indigo.ie...

Julie posted:
JKop wrote:

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for comparing two strings without regard to case. Both for working with "char*", and with "std::string".

Do you want to compare (for collation), or to strictly

test for equality?

Actually, it's for filenames.

kernel32.dll

and

Kernel32.DLL

and

KerNel32.DlL

are the same file!

-JKop

In that case you should compare the same way Windows does. I do not know if
Windows compares according to the standard locale on the machine, but my
guess is that they would use some homegrown scheme, where e.g. the Danish
letter "ø" compares equal to "Ø" but the German sharp s "ß" (which looks
like the Greek beta) is not equal to "SS".

/Peter

Jul 22 '05 #20

Richard Herring

In message <AK****************@nasal.pacific.net.au>, David Fisher
<da***@hsa.com.au> writes

JKop wrote:
Can anyone please inform me of a Standard C++
function for comparing two strings without regard
to case. Both for "char*" and "std::string".

Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function.

#include <cctype> // for tolower()
#include <cassert>

// returns < 0 if s1 < s2, > 0 if s1 > s2 or 0 if the strings
// are equal (without regard to case)
// ie. behaves like strcmp()

int stricmp(const char *s1, const char *s2)
{
while (*s1 && *s2)
{
if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}
}

return (*s1 ? 1 : (*s2 ? -1 : 0));
}

int stricmp(std::string s1, std::string s2)
{
return stricmp(s1.c_str(), s2.c_str());
}

Won't behave correctly if either of the strings contains embedded '\0',
as comparison will stop at the first one.
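A sketch of a std::string overload that is driven by size() instead of the terminator, so embedded '\0' characters are compared like any other character. The name compare_ignoring_case is an assumption, and case folding uses the C library's tolower:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Illustrative three-way comparison: < 0, 0 or > 0, like strcmp,
// but bounded by the string lengths and not by the first '\0'.
int compare_ignoring_case(const std::string& s1, const std::string& s2) {
    const std::string::size_type n = std::min(s1.size(), s2.size());
    for (std::string::size_type i = 0; i < n; ++i) {
        const int c1 = std::tolower(static_cast<unsigned char>(s1[i]));
        const int c2 = std::tolower(static_cast<unsigned char>(s2[i]));
        if (c1 != c2) {
            return c1 - c2;
        }
    }
    // The common prefix matches; the shorter string sorts first.
    if (s1.size() == s2.size()) {
        return 0;
    }
    return s1.size() < s2.size() ? -1 : 1;
}
```

For example, std::string("ab\0cd", 5) and std::string("AB\0CE", 5) compare as different here, whereas a version that goes through c_str() would stop at the embedded '\0' and report them as equal.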
--
Richard Herring

Jul 22 '05 #21

Andre Heinen

On Thu, 16 Sep 2004 08:03:44 GMT, JKop <NU**@NULL.NULL> wrote:

char k = 'A';

unsigned char& uk = *reinterpret_cast<unsigned char*>(&k);

Do we have any guarantee that the bit representations are the
same for signed char and unsigned char? What about machines that
use one's complement or BCD?

--
Andre Heinen
My address is "a dot heinen at europeanlink dot com"

Jul 22 '05 #22

Mark Wright

One joyful day (Thu, 16 Sep 2004 09:58:35 +1000 to be precise), "David
Fisher" <da***@hsa.com.au> decided that the Usenet community would
benefit from this remarkable comment:

<...>

int stricmp(std::string s1, std::string s2)
{
return stricmp(s1.c_str(), s2.c_str());
}

After many years of doing this myself, this kind of thing now bugs me
intensely. What, you may ask?

Well, I once had a performance issue with a core piece of code I'd
written and ran it through a profiler. To my surprise, way at the top of
the list was the copy constructor for std::string! Replacing this:

int stricmp(std::string s1, std::string s2)

with this:

int stricmp(const std::string &s1, const std::string &s2)

is a whole lot more efficient.

Mark Wright
- ma**@giallo.demon.nl

================Today's Thought====================
"In places where books are burned, one day,
people will be burned" - Heinrich Heine, Germany -
100 years later, Hitler proved him right
===================================================

Jul 22 '05 #23

JKop

Andre Heinen posted:

On Thu, 16 Sep 2004 08:03:44 GMT, JKop <NU**@NULL.NULL> wrote:

char k = 'A';

unsigned char& uk = *reinterpret_cast<unsigned char*>(&k);

Do we have any guarantee that the bit representations are the
same for signed char and unsigned char? What about machines that
use one's complement or BCD?

Yes, you're guaranteed that all positive numbers will have the same bit
representation for both signed and unsigned and that's for all the integer
types. (Taking this from my memory, it's there in the standard...
somewhere...).
-JKop

Jul 22 '05 #24

Andre Heinen

On Thu, 16 Sep 2004 11:10:50 GMT, JKop <NU**@NULL.NULL> wrote:

Yes, you're guaranteed that all positive numbers will have the same bit
representation for both signed and unsigned and that's for all the integer
types. (Taking this from my memory, it's there in the standard...
somewhere...).

Actually I was worrying about negative numbers (e.g. character codes
between 128 and 255).
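For what it's worth, a plain conversion sidesteps the representation question. Converting an integer to an unsigned type is defined in terms of values (the result is the source value modulo 2^N, where N is the number of bits in the destination type), not in terms of bit patterns. A small sketch, with as_unsigned as an assumed helper name:

```cpp
#include <climits>

// Illustrative helper: value-based conversion from char to unsigned char.
// A negative char value c becomes c + UCHAR_MAX + 1, whatever the
// underlying representation of signed integers is.
inline unsigned char as_unsigned(char c) {
    return static_cast<unsigned char>(c);
}
```

So as_unsigned(static_cast<char>(-1)) equals UCHAR_MAX whether plain char is signed or unsigned, while the reinterpret_cast version looks at the stored bits instead.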

--
Andre Heinen
My address is "a dot heinen at europeanlink dot com"

Jul 22 '05 #25

Julie

JKop wrote:

Julie posted:
JKop wrote:

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for comparing two strings without regard to case. Both for working with "char*", and with "std::string".

Do you want to compare (for collation), or to strictly

test for equality?

Actually, it's for filenames.

kernel32.dll

and

Kernel32.DLL

and

KerNel32.DlL

are the same file!

-JKop

Then you are better off using an equality test. I'd suggest reposting under
that premise, and you should get better responses. The reason: compare/collation
includes a lot of features that aren't needed for a simple equality test, and
those features aren't simple to resolve in a platform-independent,
locale-independent way.
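A minimal sketch of such an equality-only test. The function name names_equal_ignoring_case is an assumption, and it folds case with the C library's tolower, which is not necessarily the rule a given file system applies to its names:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Illustrative equality test: true if both strings have the same length
// and match character by character once case is folded.
bool names_equal_ignoring_case(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        return false;   // different lengths can never be equal
    }
    return std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}
```

Because no ordering is needed, the length check can reject most mismatches before any character is examined.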

Jul 22 '05 #26

Rich Grise

On Thursday 16 September 2004 02:52 am, Richard Herring did deign to grace
us with the following:

In message <AK****************@nasal.pacific.net.au>, David Fisher
<da***@hsa.com.au> writes
JKop wrote:

int stricmp(const char *s1, const char *s2)
{
while (*s1 && *s2)
{
if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}
}

return (*s1 ? 1 : (*s2 ? -1 : 0));
}

int stricmp(std::string s1, std::string s2)
{
return stricmp(s1.c_str(), s2.c_str());
}

Won't behave correctly if either of the strings contains embedded '\0',
as comparison will stop at the first one.

When they made the enhancements to C to create C++, did they redefine
the string terminator? I haven't finished the book yet, but I'd have
thought something like that would be kind of important to highlight.

Thanks,
Rich

Jul 22 '05 #27

David Fisher

Rich wrote:

int stricmp(std::string s1, std::string s2)
{
return stricmp(s1.c_str(), s2.c_str());
}

Won't behave correctly if either of the strings contains embedded '\0',
as comparison will stop at the first one.

When they made the enhancements to C to create C++, did they redefine
the string terminator? I haven't finished the book yet, but I'd have
thought something like that would be kind of important to highlight.

He was pointing out that a std::string can contain an embedded null
character, and is defined by the string length rather than requiring a
terminator ... the value returned by c_str() has a '\0' at the end, but if
there is an earlier '\0' then the string will seem shorter.

I guess this is important in a library function like stricmp() ... maybe
overkill for other situations ...
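A small sketch of the difference, assuming a helper named visible_to_c_functions (made up for illustration) that reports how much of a std::string a C-style function would see:

```cpp
#include <cstring>
#include <string>

// Illustrative helper: the length a C string function such as strlen or
// strcmp would see, i.e. everything up to the first '\0'.
std::size_t visible_to_c_functions(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For std::string s("abc\0def", 7), s.size() is 7 but visible_to_c_functions(s) is 3, so a comparison that goes through c_str() ignores "def" entirely.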

David Fisher
Sydney, Australia

Jul 22 '05 #28

Richard Herring

In message <xWn3d.6577$464.5753@trnddc01>, Rich Grise <nu**@example.net>
writes

On Thursday 16 September 2004 02:52 am, Richard Herring did deign to grace
us with the following:
In message <AK****************@nasal.pacific.net.au>, David Fisher
<da***@hsa.com.au> writes
[...]

int stricmp(std::string s1, std::string s2)
{
return stricmp(s1.c_str(), s2.c_str());
}

Won't behave correctly if either of the strings contains embedded '\0',
as comparison will stop at the first one.

When they made the enhancements to C to create C++, did they redefine
the string terminator? I haven't finished the book yet, but I'd have
thought something like that would be kind of important to highlight.

std::string doesn't require a terminator, as it keeps an explicit record
of the length. The only parts of it that have any concept of terminator
are those functions which take a single pointer as argument and expect a
C-style null-terminated char array.

String constants still contain a terminating null, and the str...
functions are unchanged from C.

--
Richard Herring

Jul 22 '05 #29
