Error in string comparison (Non-English windows)

Usman Jamil
Guest
 
Posts: n/a
#1: Dec 20 '06
Hi

I'm getting a strange error when comparing two strings. Please check the
code below. It is a simple string comparison that works fine on all of my
machines. While debugging an issue on a client's machine, which has Turkish
Windows installed, I found that this simple piece of code doesn't work. The
message boxes are displayed in this sequence:

1. to upper works with szWINDOWS
2. to lower doesn't work with szWINDOWS
3. to upper doesn't work with szwindows
4. to lower works with szwindows

It seems as if ToUpper and ToLower aren't doing anything at all, and the
.Equals() method is being passed the original values of szWINDOWS and
szwindows. Does this problem have anything to do with the Turkish Windows
installed on the client's machine, or is it a known issue?

string szWINDOWS = "WINDOWS";
string szwindows = "windows";

if (szWINDOWS.ToUpper().Equals("WINDOWS"))
    System.Windows.Forms.MessageBox.Show("to upper works with szWINDOWS");
else
    System.Windows.Forms.MessageBox.Show("to upper doesn't work with szWINDOWS");

if (szWINDOWS.ToLower().Equals("windows"))
    System.Windows.Forms.MessageBox.Show("to lower works with szWINDOWS");
else
    System.Windows.Forms.MessageBox.Show("to lower doesn't work with szWINDOWS");

if (szwindows.ToUpper().Equals("WINDOWS"))
    System.Windows.Forms.MessageBox.Show("to upper works with szwindows");
else
    System.Windows.Forms.MessageBox.Show("to upper doesn't work with szwindows");

if (szwindows.ToLower().Equals("windows"))
    System.Windows.Forms.MessageBox.Show("to lower works with szwindows");
else
    System.Windows.Forms.MessageBox.Show("to lower doesn't work with szwindows");

Regards

Usman



Marc Gravell
Guest
 
Posts: n/a
#2: Dec 20 '06

re: Error in string comparison (Non-English windows)


Jon Skeet posted something about this a few days ago. The implication
was that the behaviour is non-obvious but deliberate (the correct
approach is also included):

http://groups.google.co.uk/group/mic...8859549b4ed346

Marc


Marc Gravell
Guest
 
Posts: n/a
#3: Dec 20 '06

re: Error in string comparison (Non-English windows)


Or: http://tinyurl.com/y3o9dz

;-p


Usman Jamil
Guest
 
Posts: n/a
#4: Dec 20 '06

re: Error in string comparison (Non-English windows)


Thanks, Marc.

That has been a great help. I've been debugging my whole project for 48
hours and couldn't work out why the application was misbehaving. I'll
certainly look into the alternatives.

Regards
Usman

Marc Gravell
Guest
 
Posts: n/a
#5: Dec 20 '06

re: Error in string comparison (Non-English windows)


Hmmm... just looking at Jon's sample again, and I'm damned if I can
get it to equate successfully... the following all also report false /
non-zero:

Console.WriteLine("mail".ToUpper() == "MAIL");
Console.WriteLine("mail".ToUpper() == "MAIL".ToUpper());
Console.WriteLine(StringComparer.CurrentCultureIgnoreCase.Equals("mail", "MAIL"));
Console.WriteLine(StringComparer.CurrentCultureIgnoreCase.Compare("mail", "MAIL"));
Console.WriteLine(string.Equals("mail", "MAIL", StringComparison.CurrentCultureIgnoreCase));
Console.WriteLine("mail".Equals("MAIL", StringComparison.CurrentCultureIgnoreCase));

Of course, this is Jon's test case, not yours, so your specific
culture and phrase may be more forgiving... but I don't think I know
enough about internationalization to give the complete answer. I'll add
it to my list of things to brush up on...

So: does anybody know how you *should* realistically compare such strings?

Marc


Usman Jamil
Guest
 
Posts: n/a
#6: Dec 20 '06

re: Error in string comparison (Non-English windows)


Hi

This problem made me wonder whether I also need to check my C++ code, or
whether it is specific to .NET. In C++ I've used stricmp() in most places
for case-insensitive comparison, but in a few places I've used my own
ToUpperCase and ToLowerCase functions. I'm pasting one of them below in case
you have any thoughts on it. I'm just curious whether I might have a problem
here too; please ignore it if it's not relevant.

Thanks and Regards

Usman

string ToLowerCase(string szSourceString)
{
    for(int nIndex = 0; nIndex < szSourceString.length(); nIndex++)
    {
        char cSingleChar = szSourceString[nIndex];
        if( cSingleChar >= 'A' && cSingleChar <= 'Z')
        {
            szSourceString[nIndex] = cSingleChar + 32;
        }
    }
    return szSourceString;
}

JR
Guest
 
Posts: n/a
#7: Dec 20 '06

re: Error in string comparison (Non-English windows)


In Turkish there are two I's - with and without the dot above. The lower
case of I is ? (Dotless i), and the uppercase of i is ?.

JR

JR
Guest
 
Posts: n/a
#8: Dec 20 '06

re: Error in string comparison (Non-English windows)


I'll try again, hoping to get through with UTF-8 and HTML:

In Turkish there are two I's - with and without the dot above. The lower
case of I is ı (Dotless i, U+0131), and the uppercase of i is İ (U+0130).

JR


Mihai N.
Guest
 
Posts: n/a
#9: Dec 21 '06

re: Error in string comparison (Non-English windows)


Quote:
> string ToLowerCase(string szSourceString)
> {
>     for(int nIndex = 0; nIndex < szSourceString.length(); nIndex++)
>     {
>         char cSingleChar = szSourceString[nIndex];
>         if( cSingleChar >= 'A' && cSingleChar <= 'Z')
>         {
>             szSourceString[nIndex] = cSingleChar + 32;
>         }
>     }
>     return szSourceString;
> }

Wrong for almost everything beyond plain ASCII,
meaning it will be wrong for pretty much every language,
including English (think résumé).
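
To make that concrete, here is a minimal C++ sketch of the same range check applied to Unicode code points rather than bytes. It is not part of the original post; the names ascii_to_lower_char and ascii_to_lower_text are made up for illustration, and UTF-32 is assumed only so that each character is one array element.

```cpp
#include <string>

// ASCII-only lowercasing of a single code point, using the same
// 'A'..'Z' range check as the quoted ToLowerCase function.
// (Illustrative name, not an existing library function.)
constexpr char32_t ascii_to_lower_char(char32_t c) {
    return (c >= U'A' && c <= U'Z') ? static_cast<char32_t>(c + 32) : c;
}

// Applies ascii_to_lower_char to every code point of a UTF-32 string.
inline std::u32string ascii_to_lower_text(std::u32string text) {
    for (char32_t& c : text) {
        c = ascii_to_lower_char(c);
    }
    return text;
}
```

With this sketch, ascii_to_lower_text(U"R\u00C9SUM\u00C9") lowercases R, S, U and M but leaves both É (U+00C9) characters untouched, because they fall outside the 'A'..'Z' range. A correct result would need é (U+00E9) there.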


--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email


Mihai N.
Guest
 
Posts: n/a
#10: Dec 21 '06

re: Error in string comparison (Non-English windows)


Quote:
> Hmmm... just looking at Jon's sample again, and I'm damned if I can
> get it to equate successfully... the following all also report false /
> non-zero:
> Console.WriteLine("mail".ToUpper() == "MAIL");
> Console.WriteLine("mail".ToUpper() == "MAIL".ToUpper());
> Console.WriteLine(StringComparer.CurrentCultureIgnoreCase.Equals("mail", "MAIL"));
> Console.WriteLine(StringComparer.CurrentCultureIgnoreCase.Compare("mail", "MAIL"));
> Console.WriteLine(string.Equals("mail", "MAIL", StringComparison.CurrentCultureIgnoreCase));
> Console.WriteLine("mail".Equals("MAIL", StringComparison.CurrentCultureIgnoreCase));
>
> Of course, this is Jon's test case, not yours - so your specific
> culture and phrase may be more forgiving... but I don't think I know
> about internationalization to give the complete answer... I'll add it
> to my list of things to brush up on...
>
> So : does anybody know how you *should* realistically compare such?

For all the examples above, as well as for the initial case (the "Windows"
string) the CurrentCulture is the most important factor.

For English:
    U+0069 (i) <-> U+0049 (I)
For Turkish/Azeri:
    U+0069 (i) <-> U+0130 (İ)
    U+0131 (ı) <-> U+0049 (I)

So, for Turkish/Azeri "MAIL" is really NOT ToUpper("mail").
This is how it is, and this is how it *should* be.
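
The two mapping tables above can be modelled in a few lines of C++. This is only a sketch to illustrate the difference, not how .NET or any real library implements casing: upper_english, upper_turkish and map_text are hypothetical names, and the "English" mapping is deliberately simplified to ASCII letters.

```cpp
#include <string>

constexpr char32_t kCapitalIWithDot = 0x0130;  // İ
constexpr char32_t kSmallDotlessI = 0x0131;    // ı

// Simplified "English" uppercase: ASCII letters only, so i -> I.
constexpr char32_t upper_english(char32_t c) {
    return (c >= U'a' && c <= U'z') ? static_cast<char32_t>(c - 32) : c;
}

// Turkish/Azeri uppercase: i -> İ (U+0130) and ı (U+0131) -> I;
// every other character falls back to the simplified mapping above.
constexpr char32_t upper_turkish(char32_t c) {
    return c == U'i' ? kCapitalIWithDot
         : c == kSmallDotlessI ? U'I'
         : upper_english(c);
}

// Applies a per-character mapping to a whole UTF-32 string.
template <typename Mapping>
std::u32string map_text(std::u32string text, Mapping mapping) {
    for (char32_t& c : text) {
        c = mapping(c);
    }
    return text;
}
```

Under these assumptions, map_text(U"mail", upper_english) gives "MAIL", while map_text(U"mail", upper_turkish) gives M, A, U+0130, L, which is why the comparison against "MAIL" fails under a Turkish culture.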


Now, sometimes you might need to compare things in a locale-independent way
(e.g. for the file system, communication protocols (ex: mailto:....), etc.)

The right thing for the file system is to try accessing the file
(with _access (_taccess), or PathFileExists, or CreateFile).

For other things use StringComparer.InvariantCultureIgnoreCase (its Equals
or Compare method), or String.ToUpperInvariant + String.CompareOrdinal.
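
The advice above is for .NET. For the C++ code asked about earlier in the thread, a rough analogue for pure-ASCII identifiers such as protocol names is a comparison that folds only A-Z and never consults the locale. The sketch below is an assumption-laden illustration (ascii_upper and equals_ignore_ascii_case are made-up names) and is not suitable for user-visible text in any language.

```cpp
// Folds only ASCII 'a'..'z' to upper case; every other byte is
// returned unchanged, regardless of the current locale.
constexpr char ascii_upper(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// Compares two NUL-terminated strings, ignoring case for ASCII letters only.
// Needs C++14 or later because of the loop inside a constexpr function.
constexpr bool equals_ignore_ascii_case(const char* a, const char* b) {
    while (*a != '\0' && *b != '\0') {
        if (ascii_upper(*a) != ascii_upper(*b)) {
            return false;
        }
        ++a;
        ++b;
    }
    return *a == *b;  // true only if both strings ended together
}
```

For example, equals_ignore_ascii_case("mailto", "MAILTO") is true on any machine, Turkish or not, because no culture-specific mapping is involved.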

