std::string and case insensitive comparison

Hi,

What is the most efficient way of doing a case-insensitive comparison?
I am trying to write a universal String class and I am stuck with the
case-insensitive part.

TCHAR is char in a multibyte (MBCS) environment
and wchar_t if UNICODE is defined.

#if defined(WIN32) || defined(UNDER_CE)
typedef std::basic_string<TCHAR, std::char_traits<TCHAR>,
    std::allocator<TCHAR> > tstring;
#else

#endif
#endif
class String : public Object
{
private:
    tstring m_str;

public:
    String(){}

    String(LPCTSTR lpsz)
    {
        m_str = lpsz;
    }

    String(tstring str)
    {
        m_str = str;
    }
    // Comparison
    int Compare( LPCTSTR psz ) const
    {
        return m_str.compare(psz);
    }

    int CompareNoCase( LPCTSTR psz ) const
    {
        ???
    }

    // Convert the string to lowercase
    String& MakeLower( LPCTSTR psz )
    {
        std::transform( m_str.begin(), m_str.end(), m_str.begin(), tolower);
        return *this;
    }

};
Jul 19 '07 #1
On 2007-07-20 00:07, Mosfet wrote:
> Hi,
>
> what is the most efficient way of doing a case insensitive comparison ?
> I am trying to write a universal String class and I am stuck with the
> case insensitive part :
Usually it involves converting both strings to either upper or lower
case and then comparing them. Check out the toupper()/tolower()
functions from <cctype>/<ctype.h>.

--
Erik Wikström
Jul 19 '07 #2
"Mosfet" <an*******@free.fr> wrote in message
news:46***********************@news.free.fr...
> [...]
This is what I use. I'm not sure if it's optimal, but it works.

bool StrLowCompare( std::string String1, std::string String2 )
{
std::transform( String1.begin() , String1.end(), String1.begin() ,
tolower);
std::transform( String2.begin() , String2.end(), String2.begin() ,
tolower);
return String1 == String2;
}
Jul 19 '07 #3
Jim Langston wrote:
> "Mosfet" <an*******@free.fr> wrote in message
> news:46***********************@news.free.fr...
>> [...]
>
> This is what I use. I'm not sure if it's optimal, but it works.
> [...]

It doesn't seem very efficient...


Jul 19 '07 #4
"Mosfet" <an*******@free.fr> wrote in message
news:46***********************@news.free.fr...
> Jim Langston wrote:
>> "Mosfet" <an*******@free.fr> wrote in message
>> news:46***********************@news.free.fr...
>>> [...]
>>
>> This is what I use. I'm not sure if it's optimal, but it works.
>> [...]
>
> It doesn't seem very efficient...
No, it doesn't. But then, there is no way to do a case-insensitive compare
without converting both strings to either upper or lower case. You could
check whether a character is uppercase before converting it to lower, but
that if statement probably takes about as long as the conversion itself.

Basically, that's how case-insensitive comparison works: you convert both
strings to upper or lower case and then compare, or you compare character
by character, converting as you go.

It may be faster to compare character by character and see if you can return
early without having to go through the whole string, but then you're doing a
bunch of if statements anyway. I.e., something like: (untested code)

bool StrLowCompare( std::string& String1, std::string& String2 )
{
    if ( String1.size() != String2.size() )
        return false;

    for ( std::string::size_type i = 0; i < String1.size(); ++i )
    {
        if ( tolower( String1[i] ) != tolower( String2[i] ) )
            return false;
    }
    return true;
}
Jul 19 '07 #5
> [...]
> if ( tolower( String1[i] ) != tolower( String2[i] ) )
> [...]
If I had a pound for every time this mistake is made, I would be as rich
as Bill Gates.

tolower( String1[i] )

is undefined since char may be signed, and therefore you may pass a
negative number to tolower. tolower is only defined for int values
representable as unsigned char and for the value EOF.

tolower( (unsigned char) String1[i] )

is correct.

This also means that

std::transform( str.begin(), str.end(), str.begin(), tolower )

is undefined for the same reason.
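A minimal sketch of the fix described above, applied to the transform-based StrLowCompare posted earlier in the thread. It assumes C++11 (for the lambda), and lower_copy is a hypothetical helper name. Like the original, it still makes two copies and folds one byte at a time, so it is only as good as tolower in the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lower-cased copy of s (hypothetical helper). The lambda takes
// unsigned char, so every char is converted into the unsigned char range
// before std::tolower sees it; negative values never reach tolower.
std::string lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// StrLowCompare with the cast applied: lower-case both copies, then compare.
bool StrLowCompare(const std::string& String1, const std::string& String2)
{
    return lower_copy(String1) == lower_copy(String2);
}
```

For example, StrLowCompare("Hello World!", "hELLO wORLD!") is true, while StrLowCompare("abc", "abd") is false.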

john
Jul 20 '07 #6
John Harrison wrote:
>> [...]
>> if ( tolower( String1[i] ) != tolower( String2[i] ) )
>> [...]
> tolower( String1[i] )
>
> is undefined since char may be signed and therefore you may pass a
> negative number to tolower. [...]
>
> This also means that
>
> std::transform( str.begin(), str.end(), str.begin(), tolower )
>
> is undefined for the same reason.
That wording is a little too harsh. The above code has perfectly
well-defined behavior for quite a lot of input values. To dismiss it as
undefined is like saying *p is undefined since p might be null. I agree,
however, that one can and should do better.

For the use in std::transform( ), I would suggest a function object like
this:

#include <locale>
#include <string>
#include <iostream>
#include <algorithm>

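class to_lower {

  // Held by value: std::locale is cheap to copy, whereas a reference member
  // would dangle if a to_lower object outlived the full-expression that
  // created its default-argument locale.
  std::locale loc;

public:

  to_lower ( std::locale const & r_loc = std::locale() )
    : loc ( r_loc )
  {}

  template < typename CharT >
  CharT operator() ( CharT chr ) const {
    return( std::tolower( chr, this->loc ) );
  }

}; // class to_lower;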

int main ( void ) {
std::string str ( "Hello World!" );
std::transform ( str.begin(), str.end(), str.begin(), to_lower() );
std::cout << str << '\n';
}
Best

Kai-Uwe Bux
Jul 20 '07 #7
On Jul 20, 12:56 am, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> "Mosfet" <anonym...@free.fr> wrote in message
> news:46***********************@news.free.fr...
> [...]
> This is what I use. I'm not sure if it's optimal, but it works.
> bool StrLowCompare( std::string String1, std::string String2 )
> {
> std::transform( String1.begin() , String1.end(), String1.begin() ,
> tolower);
> std::transform( String2.begin() , String2.end(), String2.begin() ,
> tolower);
> return String1 == String2;
> }
Using which headers? It shouldn't compile if you happen to
include <locale>, and <string> may (or may not) include
<locale>. (With g++, <string> doesn't include <locale>, but
<iostream> does, so if you happen to also include <iostream>, it
doesn't compile.) If it does compile, it has undefined
behavior if char is signed. (Of course, g++ and VC++ have
options to force char to be unsigned, so if you're using these,
you may be OK.)

If you want to use transform, the correct invocation is:

std::transform(
    source.begin(), source.end(),
    std::back_inserter( dest ),
    boost::bind(
        &Cvt::tolower,
        &std::use_facet< Cvt >( std::locale() ),
        _1 ) ) ;

You need boost::bind, or else write your own functional
object.
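Since C++11, a lambda can play the role of boost::bind here. A minimal sketch, assuming that Cvt above stands for std::ctype<char>; lower_with_facet is a hypothetical name. The facet's tolower(char) member accepts any char value, so no unsigned char cast is needed.

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <string>

// Lower-cases source into a new string using the ctype<char> facet of loc
// (the global locale by default), appending through std::back_inserter.
std::string lower_with_facet(const std::string& source,
                             const std::locale& loc = std::locale())
{
    typedef std::ctype<char> Cvt;              // assumption: what Cvt means in the post
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    std::string dest;
    dest.reserve(source.size());
    std::transform(source.begin(), source.end(),
                   std::back_inserter(dest),
                   [&cvt](char c) { return cvt.tolower(c); });
    return dest;
}
```

Passing a specific locale, e.g. lower_with_facet(s, std::locale::classic()), makes the result independent of whatever global locale the program has installed.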

Note that using std::equal with boost::transform_iterator (and
the functional object used with transform---either your own, or
the results of boost::bind) will allow the comparison without
making a copy.
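A sketch of the no-copy comparison, assuming C++14. It swaps boost::transform_iterator for a binary predicate passed to std::equal, which achieves the same effect without Boost; equal_nocase is a hypothetical name.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Case-insensitive equality without copying either string. The four-iterator
// overload of std::equal (C++14) returns false on a length mismatch and stops
// at the first pair of characters that differ after folding.
bool equal_nocase(const std::string& a, const std::string& b,
                  const std::locale& loc = std::locale())
{
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    return std::equal(a.begin(), a.end(), b.begin(), b.end(),
                      [&ct](char x, char y) { return ct.tolower(x) == ct.tolower(y); });
}
```

This still folds one char at a time, so it inherits every limitation discussed later in the thread (multi-character mappings, locale-specific rules).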

--
James Kanze (GABI Software) email:ja*********@gmail.com
Consulting in object-oriented software development
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jul 20 '07 #8
On Jul 20, 1:27 am, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> "Mosfet" <anonym...@free.fr> wrote in message
> [...]
>>> This is what I use. I'm not sure if it's optimal, but it works.
>>> [...]
>> It doesn't seem very efficient...
> No, it doesn't. But then, there is no way to do a case-insensitive
> compare without converting both to either upper or lower. Or to
> determine if one is uppercase before converting to lower, but it
> probably takes about the same amount of time for the if statement.
If you do it on a character by character basis, you can avoid
the copy of the string. If the string is long, this could be
significant.

On the other hand, if you're doing a lot of comparisons, it's
probably better to convert the strings once, and then just use
== on them.
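A minimal sketch of "convert once, then use ==", assuming single-byte strings and C++11. The names fold and FoldedSet are hypothetical. Stored keys are folded exactly once on insertion; each lookup folds only the query, and std::set then does ordinary string comparisons.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Folds a string to lower case; the unsigned char parameter keeps negative
// char values away from std::tolower.
std::string fold(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical container that stores keys already folded, so repeated
// lookups never re-convert the stored strings.
class FoldedSet {
public:
    void insert(const std::string& key) { keys_.insert(fold(key)); }
    bool contains(const std::string& key) const { return keys_.count(fold(key)) != 0; }
private:
    std::set<std::string> keys_;
};
```

The same idea applies to std::map keys: fold once on insertion, fold the query on lookup.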
> Basically, that's the way case-insensitive works. You convert
> both to upper or lower, then compare, or compare character by
> character converting.
Actually, it's more complicated than that. In German, for
example, in a case-insensitive comparison, the single character
'ß' must compare equal to the two-character sequence "SS"; in
Swiss German (and according to DIN, I think), 'ä' compares equal
to "AE", etc., whereas in Turkish, 'i' shouldn't compare equal
to 'I' (Turkish pairs dotted 'i' with 'İ' and dotless 'ı' with
'I'). Just defining what you actually mean by "case-insensitive
compare" is a nightmare, even before you start trying to
implement it.
> It may be faster to compare character by character and see if
> you can return early without having to go through the whole
> string, but then you're doing a bunch of if statements anyway.
There's never the slightest need for an if; tolower is generally
implemented as a simple table mapping. By comparing character
by character, however, you save copying the string.
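To illustrate the table-mapping point, here is a sketch that builds a 256-entry lower-case table once and compares character by character through it. It assumes 8-bit char and a single-byte character set; the table is filled from std::tolower on first use, so later setlocale() calls are not reflected. lower_table and equal_nocase_table are hypothetical names.

```cpp
#include <array>
#include <cctype>
#include <string>

// 256-entry lower-case table, filled once from std::tolower on first use.
// Every index 0..255 is a valid unsigned char value, so each call is defined.
const std::array<unsigned char, 256>& lower_table()
{
    static const std::array<unsigned char, 256> table = [] {
        std::array<unsigned char, 256> t{};
        for (int i = 0; i < 256; ++i)
            t[i] = static_cast<unsigned char>(std::tolower(i));
        return t;
    }();
    return table;
}

// Each char is folded with one table lookup (no test for upper case),
// no copy is made, and the loop returns at the first mismatch.
bool equal_nocase_table(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    const std::array<unsigned char, 256>& t = lower_table();
    for (std::string::size_type i = 0; i != a.size(); ++i) {
        if (t[static_cast<unsigned char>(a[i])] != t[static_cast<unsigned char>(b[i])])
            return false;
    }
    return true;
}
```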


Jul 20 '07 #9
On Jul 19, 5:07 pm, Mosfet <anonym...@free.fr> wrote:
> [...]
> class String : public Object
> {
> ....
> }
I have no idea why you have an Object class unless you are trying to
mimic Java. In "Exceptional C++" by Herb Sutter, there is an
excellent solution: provide your own char_traits<TCHAR> (pp. 5-6).
Because a string with custom traits is a different type from
std::string, you will have to overload operators <<, >>, =, +, +=,
-, -=, etc. (or always call c_str()).

#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive traits; the unsigned char casts assume CHAR is a
// narrow char type.
template<typename CHAR>
class ci_char_traits : public std::char_traits<CHAR>
{
public:
    static bool eq( CHAR c1, CHAR c2 )
    {
        return std::toupper( static_cast<unsigned char>( c1 ) )
            == std::toupper( static_cast<unsigned char>( c2 ) );
    }
    static bool lt( CHAR c1, CHAR c2 )
    {
        return std::toupper( static_cast<unsigned char>( c1 ) )
             < std::toupper( static_cast<unsigned char>( c2 ) );
    }
    // Portable replacement for the non-standard memicmp().
    static int compare( const CHAR *s1, const CHAR *s2, std::size_t n )
    {
        for ( ; n > 0; --n, ++s1, ++s2 )
        {
            if ( lt( *s1, *s2 ) ) return -1;
            if ( lt( *s2, *s1 ) ) return 1;
        }
        return 0;
    }
    // Returns a pointer to the first of the n characters equal to a, or null.
    static const CHAR *find( const CHAR *s, std::size_t n, const CHAR &a )
    {
        for ( ; n > 0; --n, ++s )
        {
            if ( eq( *s, a ) ) return s;
        }
        return 0;
    }
};

typedef std::basic_string<char, ci_char_traits<char> > ci_string;

static_cast<>'s would've been forgotten if it hadn't been for John
Harrison. Keep in mind James Kanze's comments about equality. For
more information, search the Internet for Java collator classes.

Milburn Young

Jul 20 '07 #10
