Umlaut letters in C++ 
July 22nd, 2005, 10:18 AM
Umlaut letters in C++
I am using Visual Studio C++ .NET and when I try to print words with
umlaut letters, for instance
printf("Pässinpää-ääliö");
letters with dots over them, äö, will not be printed correctly on the
screen. I tried the trick
#ifdef _UNICODE
int wmain(void)
#else
int main(void)
#endif
but it didn't help. How can I get printf to produce umlaut letters
correctly?
Pekka
July 22nd, 2005, 10:18 AM
Re: Umlaut letters in C++
Pekka Jarvela wrote:
> I am using Visual Studio C++ .NET and when I try to print words with
> umlaut letters, for instance
>
> printf("Pässinpää-ääliö");
>
> letters with dots over them, äö, will not be printed correctly on the
> screen.
<snip>
Assuming you mean it's printing 'different' characters, it's a character
set issue.
Windows uses the ANSI character set (with a few additions), at least
when it isn't using Unicode.
Your program is obviously running in a DOS window. DOS uses the IBM
character set (one of various versions thereof). So what you are
probably seeing is the IBM characters with the same codes as the ANSI
characters you're typing in your (presumably) Windows-based editor.
Look up the codes here: http://www.i18nguy.com/unicode/codepages.html#ibmdos
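One way to act on that lookup, as a rough sketch rather than a definitive fix: translate the ANSI (Windows-1252) bytes for the Finnish letters into their IBM code page equivalents before printing. The helper name ansi_to_oem is made up for this example, and it assumes the console is on code page 437 or 850, which agree on these six letters; every other byte is passed through unchanged. On Windows, the CharToOemA API does this conversion for the whole character set.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the thread): remap the Windows-1252 bytes
// for the Finnish/Swedish letters to the bytes used by code pages 437/850.
// Assumption: the console is on CP437 or CP850; other bytes pass through.
std::string ansi_to_oem(const std::string& ansi)
{
    std::string oem = ansi;
    for (std::string::size_type i = 0; i < oem.size(); ++i) {
        switch (static_cast<unsigned char>(oem[i])) {
            case 0xE4: oem[i] = static_cast<char>(0x84); break; // a-umlaut
            case 0xF6: oem[i] = static_cast<char>(0x94); break; // o-umlaut
            case 0xE5: oem[i] = static_cast<char>(0x86); break; // a-ring
            case 0xC4: oem[i] = static_cast<char>(0x8E); break; // A-umlaut
            case 0xD6: oem[i] = static_cast<char>(0x99); break; // O-umlaut
            case 0xC5: oem[i] = static_cast<char>(0x8F); break; // A-ring
            default: break;                                     // unchanged
        }
    }
    return oem;
}
```

Passing the converted string to printf("%s\n", ...) via c_str() should then show the umlauts in a console that really is on one of those code pages.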
Stewart.
--
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment. Please keep
replies on the 'group where everyone may benefit.
July 22nd, 2005, 10:19 AM
OFF TOPIC: UMLAUT
Pekka Jarvela posted:
> I am using Visual Studio C++ .NET and when I try to print words with
> umlaut letters, for instance
>
> printf("Pässinpää-ääliö");
>
> letters with dots over them, äö, will not be printed correctly on the
> screen. I tried the trick
>
> #ifdef _UNICODE
> int wmain(void)
> #else
> int main(void)
> #endif
>
> but it didn't help. How can I get printf to produce umlaut letters
> correctly?
>
> Pekka
Firstly,
Windows 95 -> Windows ME were all ANSI, i.e. 8-bit characters = 256 possible
different characters. If you wanted foreign characters, e.g. Arabic,
Chinese, then you had to install a different codepage. You had to switch
between codepages and could not display them both at once.
All versions of Windows NT, including Windows 2000, were Unicode, i.e. 16-bit
characters = 65,536 possible different characters.
With Windows XP came hope: all versions are Unicode, both Home and
Professional editions.
But still, here comes a bit of irony: On my system, WinXP Professional, the
following
MessageBoxA(blah,"€6.72",blah,blah); //ANSI version
works perfectly, ie. the euro sign _is_ displayed, but:
MessageBoxW(blah,L"€6.72",blah,blah); //Unicode version
does _not_ display the euro sign!!
--
Umlauted characters _are_ included in ANSI, so I presume your problem may
simply be that the umlauted characters are _not_ in the font you're using.
Try changing font.
July 22nd, 2005, 10:20 AM
Re: OFF TOPIC: UMLAUT
JKop <NULL@NULL.NULL> wrote in news:GFRjc.5944$qP2.13948@news.indigo.ie:
> Pekka Jarvela posted:
>> I am using Visual Studio C++ .NET and when I try to print words with
>> umlaut letters, for instance
>>
>> printf("Pässinpää-ääliö");
>>
>> letters with dots over them, äö, will not be printed correctly on the
>> screen. I tried the trick
[...]
> Windows 95 -> Windows ME were all ANSI, i.e. 8-bit characters = 256
> possible different characters. If you wanted foreign characters, e.g.
> Arabic, Chinese, then you had to install a different codepage. You had
> to switch between codepages and could not display them both at once.
Not quite. Win9x uses multi-byte character sets in some locales, certainly
Chinese. So you can have more than 256 characters, but each character can
take more than one char.
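As a rough illustration of what that means for code: in a double-byte code page, the number of characters is no longer the number of chars. The sketch below assumes a simplified rule modeled on the Simplified Chinese code page 936, where any byte in 0x81..0xFE starts a two-byte character. count_dbcs_chars and is_lead_byte are hypothetical helpers; real code on Windows would normally ask the system (for example via IsDBCSLeadByteEx) rather than hard-code a range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption for this sketch: bytes 0x81..0xFE are lead bytes, as in CP936.
bool is_lead_byte(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

// Count characters, treating a lead byte plus the byte after it as one.
std::size_t count_dbcs_chars(const std::string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_lead_byte(static_cast<unsigned char>(s[i])) && i + 1 < s.size())
            ++i; // skip the trail byte of a two-byte character
        ++n;
    }
    return n;
}
```

For example, the four-byte string "A\xD6\xD0" "B" (an ASCII letter, one two-byte character, another ASCII letter) holds three characters under this rule.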
Also, if you've got "Microsoft Layer for Unicode" installed, you can use
Unicode on Win9x.
> All versions of Windows NT, including Windows 2000, were Unicode, i.e.
> 16-bit characters = 65,536 possible different characters.
>
> With Windows XP came hope: all versions are Unicode, both Home and
> Professional editions.
XP is NT. Internally, everything is done with UCS-2, but applications
compiled for ANSI still get ANSI of some flavor.
Practically, it doesn't matter too much for the application.
> But still, here comes a bit of irony: On my system, WinXP
> Professional, the following
>
> MessageBoxA(blah,"€6.72",blah,blah); //ANSI version
>
> works perfectly, ie. the euro sign _is_ displayed, but:
>
> MessageBoxW(blah,L"€6.72",blah,blah); //Unicode version
>
> does _not_ display the euro sign!!
This is because your source code is ANSI, so you're entering the euro
symbol using the Microsoft-specific code 128. In ANSI mode, Windows maps
that to the appropriate Unicode codepoint 0x20AC before displaying it.
I'd guess that in Unicode mode, the compiler naively maps that to Unicode
codepoint 0x0080, which is not the euro symbol.
Try using the escape \x20AC in place of the euro sign in the Unicode version.
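The difference between the two mappings can be sketched in a few lines. This only illustrates the guess above and is not a statement of what the compiler actually does: widen_naive, widen_cp1252 and euro_price are names invented here, and only the euro sign's entry from the Windows-1252 table is handled. Note also that a hex escape swallows every hex digit that follows it, so L"\x20AC6.72" would not mean what it looks like; splitting the literal, or writing \u20AC instead, avoids that.

```cpp
#include <cassert>
#include <string>

// Zero-extend the byte: 0x80 becomes U+0080, a control character.
wchar_t widen_naive(unsigned char c)
{
    return static_cast<wchar_t>(c);
}

// Windows-1252 aware: byte 0x80 is the euro sign, U+20AC.
// Sketch only: the other special entries in 0x81..0x9F are not handled.
wchar_t widen_cp1252(unsigned char c)
{
    return c == 0x80 ? static_cast<wchar_t>(0x20AC) : static_cast<wchar_t>(c);
}

// The wide string spelled with an escape, so it does not depend on the
// source file's encoding. The literal is split so the escape ends at 20AC.
std::wstring euro_price()
{
    return std::wstring(L"\x20AC" L"6.72");
}
```

The result of euro_price() could then be handed to MessageBoxW through c_str() in place of the literal with the raw euro character.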
> Umlauted characters _are_ included in ANSI, so I presume your problem
> may simply be that the umlauted characters are _not_ in the font
> you're using. Try changing font.
Or not in the ANSI codepage you're using. Actually, in console windows,
it tends to use the OEM codepage, which will be distinct from any ANSI
codepage (in particular the one that the IDE is probably using).
I would recommend not using non-ASCII characters in source code, and in
console windows.
And if you just want to make it work now, look for a font that uses the
OEM codepage, like Terminal or Lucida ConsoleP (note the P), in your
editor. (or in charmap, since it may be hard to enter accented characters
in the OEM codepage)
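    #include <cstdio>

    int main()
    {
        // OEM (code page 437/850) bytes: \204 = a-umlaut, \224 = o-umlaut
        std::printf("P\204ssinp\204\204-\204\204li\224\n");
        // ANSI (Windows-1252) bytes: \344 = a-umlaut, \366 = o-umlaut
        std::printf("P\344ssinp\344\344-\344\344li\366\n");
        return 0;
    }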
At the console window, the first line will be correct. Piped to a file
and opened in Notepad, at least with the "Windows: Western" (almost
ISO-8859-1) codepage, the second line will be correct. (The second is
identical to what Pekka Jarvela posted.)
wprintf(L"P\344ssinp\344\344-\344\344li\366\n");
should look right when opened in Unicode-capable notepad, although you'd
need to make sure your stdout was Unicode. (I'm not really familiar with
wprintf...)
-josh