unicode in windows 2003

Hi,

I created a simple Win32 project (VC2003, Windows 2003 (English))
and do some simple painting in the WM_PAINT handler. When the project uses
the multi-byte character set, it is OK,
but when I change to UNICODE, the Chinese characters are illegible (I do see
sizeof(TCHAR)=2 being displayed). Your ideas are welcome.

case WM_PAINT:
    hdc = BeginPaint(hWnd, &ps);
    {
        LPCTSTR smsg = _T("pringÖÐÎÄ");
        TextOut(hdc, 0, 0, smsg, _tcslen(smsg));
        TCHAR buf[256];
        wsprintf(buf, _T("sizeof(TCHAR)=%d"), sizeof(TCHAR));
        TextOut(hdc, 0, 20, buf, _tcslen(buf));
    }
    EndPaint(hWnd, &ps);
    break;

Best Regards
Onega
Nov 16 '05 #1
>>Your idea is welcome.
Why still use Win32 for a new project? Do you need to modify an existing
application?

For new projects I would recommend using a .NET Windows application project.
These are MUCH simpler to use.

kind regards,
Bruno.
Nov 16 '05 #2
I would assume that when compiled as Unicode, the characters in your string
literal will each be interpreted as one Unicode character. You might want to
look at using the \x escape sequence.
Nov 16 '05 #3
Thank you, Ted Miller.
The \x escape sequence is not friendly.
My code snippet works well under Windows XP. I'd like to know whether this is
a bug in Windows 2003 or in VC 2003.

Best Regards
Onega
Nov 16 '05 #4
> Thank you, Ted Miller
> \x escape sequence is not friendly.
> My code snippet works well under Windows XP. I'd like to know if it is a
> bug of Windows 2003 or VC 2003?


None of the above.
It is a bug in your code.

Your string is _T("pringÖÐÎÄ") ;
Because of the _T, the string will be left as is if the application is
ANSI or will be converted to Unicode if the application is Unicode.

When left as is (ANSI), you will get the byte sequence:
D6 D0 CE C4

When you run this on a Chinese Simplified system,
D6 D0 => will be interpreted as center/middle (Unicode 4E2D)
CE C4 => will be interpreted as literature/culture/writing (Unicode 6587)
(I guess this is what you want)

When you run this on a Chinese Traditional system,
D6 D0 => will be Unicode 7B22 (no clue about the meaning)
CE C4 => will be Unicode 6045 (no clue about the meaning)
(I guess this is not what you want)

When run on a Russian system you will get Russian characters, and so on.
This is the problem with code pages: the same sequence of bytes can represent
different characters in different code pages.

For a Unicode application, when you compile, the string is converted to
Unicode from the code page of your source code, which is assumed to be the
system code page.
If you compile on a US system, the result is the byte sequence
D6 00 D0 00 CE 00 C4 00
representing the Unicode characters U+00D6 U+00D0 U+00CE U+00C4

This will display identically on any system supporting Unicode:
LATIN CAPITAL LETTER O WITH DIAERESIS
LATIN CAPITAL LETTER ETH
LATIN CAPITAL LETTER I WITH CIRCUMFLEX
LATIN CAPITAL LETTER A WITH DIAERESIS
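
The conversion described above can be sketched in portable C++. This is only an illustration, not what the compiler literally does: the helper names `widen_bytes` and `hex_units` are made up for this example, and the byte-by-byte widening is a simplified stand-in for a code page 1252 conversion (it agrees with 1252 for every byte outside the range 0x80 to 0x9F, which covers all the bytes in this thread).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: turn each byte into one 16-bit unit with the same value.
// For bytes outside 0x80..0x9F this gives the same result as decoding the
// bytes as code page 1252, which is what happens when the source is compiled
// as Unicode on a system whose code page is 1252.
std::u16string widen_bytes(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Hypothetical helper: print each 16-bit unit as four hex digits.
std::string hex_units(const std::u16string& s) {
    std::string out;
    char buf[5];
    for (char16_t u : s) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(u));
        out += buf;
    }
    return out;
}

// Small self-check: the four bytes become four separate Latin letters.
inline void check_widening() {
    assert(hex_units(widen_bytes("\xD6\xD0\xCE\xC4")) == "00D600D000CE00C4");
}
```

Calling `hex_units(widen_bytes("\xD6\xD0\xCE\xC4"))` gives `00D600D000CE00C4`, that is, U+00D6 U+00D0 U+00CE U+00C4: four code units for what was meant to be two Chinese characters.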

If you compile this on a Simplified Chinese system you get what you want.
The \x escape sequence is not friendly, but it behaves identically on all systems.

This is leaving aside that it is very bad practice to hard-code UI strings in
your code (you have already discovered one of the reasons).
Mihai
-------------------------
Replace _year_ with _ to get the real email
Nov 16 '05 #5
Hi Mihai N,

Thanks a lot for your informative explanation. I learned a lot from it,
but I still have some doubts about this issue.
According to your theory, my code snippet should fail on both
Windows XP (English, SP1) and Windows 2003 (English). But it works on
Windows XP (English version, default code page: Chinese, region: Chinese),
although I set the default code page and region to Chinese under Windows
2003 too.
I appreciate your help!

TCHAR buf[256];
ZeroMemory(buf, sizeof(buf));
int n = GetLocaleInfo(LOCALE_SYSTEM_DEFAULT,
                      LOCALE_ILANGUAGE, buf, ARRAY_SIZE(buf));

buf contains the text "0804" under both Windows XP and Windows 2003.

Best Regards
Onega
Nov 16 '05 #6
> According to your theory, it seems that my code snippet should fail on both
> Windows XP(English, SP1) and Windows 2003(English). But it is fine on
> Windows XP(English version, default codepage: Chinese, Region: Chinese),
> althrough I set default codepage and Region to Chinese too under Windows
> 2003.


OK, maybe this is not the explanation.
Can you please answer some questions, so that maybe I can figure it out?
Is the code already compiled, and do you test the same executable on the two
systems?
Or do you recompile?
The conversion of the string in the source happens at compile time.
What characters do you see when you run your code on Windows 2003?

Mihai
Nov 16 '05 #7
Glad to see your post again.
Your tips are valuable.
I built ANSI and UNICODE versions of the executable under Windows XP; both work
well under Windows 2003. Then I rebuilt under Windows 2003, and only the ANSI
version works well.

My code looks like this:

case WM_PAINT:
    hdc = BeginPaint(hWnd, &ps);
    {
        LPCTSTR smsg = _T("AÖÐÎÄ");
        int nlen = _tcslen(smsg);
        TextOut(hdc, 0, 0, smsg, _tcslen(smsg));
        TCHAR buf[512];
        wsprintf(buf, _T("sizeof(TCHAR)=%d, strlen = %d,"), sizeof(TCHAR), nlen);
        TextOut(hdc, 0, 20, buf, _tcslen(buf));
        ZeroMemory(buf, sizeof(buf));
        TCHAR nbuf[16];
        for (int ci = 0; ci < nlen; ci++)
        {
            ZeroMemory(nbuf, sizeof(nbuf));
            TCHAR tci = smsg[ci];
            if (sizeof(TCHAR) == 1)
                wsprintf(nbuf, _T("%02X"), tci & 0xff);
            else
                wsprintf(nbuf, _T("%04X"), tci & 0xffff);
            _tcscat(buf, nbuf);
        }
        TextOut(hdc, 0, 40, buf, _tcslen(buf));
    }
    EndPaint(hWnd, &ps);
    break;

The version built under Windows 2003 gives the following output (I have only
run it under 2003):
ANSI: Chinese is fine, sizeof(TCHAR)=1, strlen=5, 41D6D0CEC4
UNICODE: Chinese isn't fine, sizeof(TCHAR)=2, strlen=5, 004100D600D000CE00C4

The version built under Windows XP gives the following output (run on both XP
and 2003):
UNICODE: Chinese is fine, sizeof(TCHAR)=2, strlen=3, 00414E2D6587
ANSI: Chinese is fine, sizeof(TCHAR)=1, strlen=5, 41D6D0CEC4

I think there is something wrong with Windows 2003 or VS.NET 2003.

Best Regards
Onega
Nov 16 '05 #8
> I build ANSI and UNICODE version executable under windows XP, both works
> well under windows 2003, then I rebuild under windows 2003, only ANSI
> version works well.

My guess: the XP you are using for building is Chinese Simplified, and
the 2003 is English (or something else using code page 1252).

> version build under win2003 gives the following output(I have only run
> it under 2003):
> ANSI : Chinese is fine, sizeof(TCHAR)=1, strlen=5, 41D6D0CEC4
> UNICODE: Chinese isn't fine, sizeof(TCHAR)=2, strlen=5,
> 004100D600D000CE00C4

This matches what I was saying in a previous email:
> If you compile on a US system, the result is the byte sequence
> D6 00 D0 00 CE 00 C4 00
> representing the Unicode characters U+00D6 U+00D0 U+00CE U+00C4

Note: COMPILE on a US system, not RUN on a US system.
_T is resolved at compile time.
This also points to the conclusion that you do compile on an English system.

Try to compile it on a Chinese Simplified system.
You can do it on your 2003 system, but you should set both the user
and the system locale to Chinese (PRC), then reboot.

There is nothing wrong with Windows 2003 or Dev. Studio (2003 or older).

But even if this solves the problem, please move the string into the resources.
This is "the right thing" to do.

Quoting Microsoft
"In fact, the C/C++ Language specification says that the source files are
to be written in 7-bit ANSI."
Quoting the standard:
1. The basic source character set consists of 96 characters: the space
character, the control characters representing
horizontal tab, vertical tab, form feed, and newline,
plus the following 91 graphical characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

2 The universal-character-name construct provides a way to name
other characters.
hex-quad:
hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
universal-character-name:
\u hex-quad
\U hex-quad hex-quad
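
As a rough illustration of the universal-character-name construct quoted above, the string from this thread can be written so that it no longer depends on the code page of the machine that compiles it. This is a sketch only: the names `kWide`, `kGbk` and `hex_wide` are invented for the example, and whether a particular older compiler accepts `\u` inside a wide literal is an assumption to verify.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <cwchar>
#include <string>

// "A" followed by U+4E2D and U+6587, spelled with universal-character-names.
// The literal contains only basic source characters, so the compiler has
// no code page to guess.
const wchar_t kWide[] = L"A\u4E2D\u6587";

// The same text as the code page 936 bytes discussed in this thread
// (D6 D0 CE C4), spelled with \x escapes for the ANSI build.
const char kGbk[] = "A\xD6\xD0\xCE\xC4";

// Hypothetical helper: print each wide character as at least four hex digits.
std::string hex_wide(const wchar_t* s) {
    std::string out;
    char buf[16];
    for (; *s != L'\0'; ++s) {
        std::snprintf(buf, sizeof buf, "%04lX", static_cast<unsigned long>(*s));
        out += buf;
    }
    return out;
}

// Small self-check: three wide characters, five narrow bytes.
inline void check_literals() {
    assert(std::wcslen(kWide) == 3);
    assert(std::strlen(kGbk) == 5);
}
```

Here `hex_wide(kWide)` gives `00414E2D6587` and `std::wcslen(kWide)` is 3, which matches the output of the working UNICODE build reported earlier in the thread, regardless of the locale of the build machine.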


--
Mihai
Nov 16 '05 #9
Hi Mihai N,
Both the Windows XP and the Windows 2003 systems I worked with are English versions.
At last I found a solution: putting #pragma setlocale("chs") in the .cpp file.
The idea is from Alexander Grigoriev. My respect to you for your
patience with this. I'll take your advice in future projects. Thanks again!

Best Regards
Onega
Nov 16 '05 #10