Wide character input/output

Ioannis Vranos

[The current message encoding is set to Unicode (UTF-8) because it
contains Greek]
The following code does not work as expected:
#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wchar_t input[50];

    if (!p)
        printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%s\n", input);

    return 0;
}
Under Linux:
[john@localhost src]$ ./foobar-cpp
Test
T
[john@localhost src]$
[john@localhost src]$ ./foobar-cpp
Î”Î¿ÎºÎ¹Î¼Î±ÏƒÏ„Î¹ÎºÏŒ
ï¿½
[john@localhost src]$

Under MS Visual C++ 2008 Express:

Test
Test

Press any key to continue . . .
Δοκιμαστικό
??????ε????

Press any key to continue . . .
Am I missing something?

Feb 23 '08 #1


Ben Bacarisse

Ioannis Vranos <iv*****@nospam.no.spamfreemail.gr> writes:

[The current message encoding is set to Unicode (UTF-8) because it
contains Greek]
The following code does not work as expected:
#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wchar_t input[50];

    if (!p)
        printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%s\n", input);

You need "%ls". This is very important with wprintf since without it
%s denotes a multi-byte character sequence. printf("%ls\n", input)
should also work. You need the w version if you want the multi-byte
conversion of %s or if the format has to be a wchar_t pointer.

>
    return 0;
}
Under Linux:
[john@localhost src]$ ./foobar-cpp
Test
T
[john@localhost src]$
[john@localhost src]$ ./foobar-cpp
Î”Î¿ÎºÎ¹Î¼Î±ÏƒÏ„Î¹ÎºÏŒ
ï¿½
[john@localhost src]$

The above may not be the only problem. In cases like this, you need to
say what encoding your terminal is using.

<snip>

--
Ben.

Feb 23 '08 #2

Ioannis Vranos

Ben Bacarisse wrote:

>
You need "%ls". This is very important with wprintf since without it
%s denotes a multi-byte character sequence. printf("%ls\n", input)
should also work. You need the w version if you want the multi-byte
conversion of %s or if the format has to be a wchar_t pointer.

Perhaps you may help me understand better. We have the usual char
encoding which is implementation defined (usually ASCII).

wchar_t is wide character encoding, which is the "largest character set
supported by the system", so I suppose Unicode under Linux and Windows.

What exactly is a multi-byte character?

I have to say that I am talking about C95 here, not C99.

>
> return 0;
}
Under Linux:
[john@localhost src]$ ./foobar-cpp
Test
T
[john@localhost src]$
[john@localhost src]$ ./foobar-cpp
Î”Î¿ÎºÎ¹Î¼Î±ÏƒÏ„Î¹ÎºÏŒ
ï¿½
[john@localhost src]$

The above may not be the only problem. In cases like this, you need to
say what encoding your terminal is using.

You are somehow correct on this. My terminal encoding was UTF-8 and I
added Greek(ISO-8859-7). Under the last, the following code works OK:
#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wprintf(L"Δοκιμαστικό\n");

    return 0;
}

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
[john@localhost src]$
Also the original, fixed according to your suggestion:
#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wchar_t input[50];

    if (!p)
        printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%ls", input);

    return 0;
}

works OK too:

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
Δοκιμαστικό
[john@localhost src]$
It works OK under Terminal UTF-8 default encoding too. So "%ls" is what
was really needed.
BTW, how can we define UTF-8 as the locale?
Thanks a lot.

Feb 23 '08 #3

Ioannis Vranos

Ioannis Vranos wrote:

>
It works OK under Terminal UTF-8 default encoding too. So "%ls" is what
was really needed.

Actually the code:

#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wprintf(L"Δοκιμαστικό\n");

    return 0;
}

works only when I set the Terminal encoding to Greek (ISO-8859-7).

>

BTW, how can we define UTF-8 as the locale?
Thanks a lot.

Feb 23 '08 #4

Ben Bacarisse

Ioannis Vranos <iv*****@nospam.no.spamfreemail.gr> writes:

Ben Bacarisse wrote:
>>
You need "%ls". This is very important with wprintf since without it
%s denotes a multi-byte character sequence. printf("%ls\n", input)
should also work. You need the w version if you want the multi-byte
conversion of %s or if the format has to be a wchar_t pointer.

Perhaps you may help me understand better. We have the usual char
encoding which is implementation defined (usually ASCII).

wchar_t is wide character encoding, which is the "largest character
set supported by the system", so I suppose Unicode under Linux and
Windows.

What exactly is a multi-byte character?

It is a confusing term. It means an encoding that uses sequences of
ordinary bytes (in the C sense -- chars) to encode a large character
set. The most common example is UTF-8.

I have to say that I am talking about C95 here, not C99.

>>
>> return 0;
}
Under Linux:
[john@localhost src]$ ./foobar-cpp
Test
T
[john@localhost src]$
[john@localhost src]$ ./foobar-cpp
Î”Î¿ÎºÎ¹Î¼Î±ÏƒÏ„Î¹ÎºÏŒ
ï¿½
[john@localhost src]$

The above may not be the only problem. In cases like this, you need to
say what encoding your terminal is using.

You are somehow correct on this.

Strange, I know!

My terminal encoding was UTF-8 and I
added Greek(ISO-8859-7). Under the last, the following code works OK:
#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wprintf(L"Δοκιμαστικό\n");

    return 0;
}

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
[john@localhost src]$
Also the original, fixed according to your suggestion:
#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wchar_t input[50];

    if (!p)
        printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%ls", input);

    return 0;
}

works OK too:

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
Δοκιμαστικό
[john@localhost src]$
It works OK under Terminal UTF-8 default encoding too. So "%ls" is
what was really needed.
BTW, how can we define UTF-8 as the locale?

I *think* this is now off-topic. I don't think C says anything about
what the locale string means...

The character encoding is usually specified after a '.'. I use, for
example, "en-GB.UTF-8". I suspect that if you only specify a part of
the locale (or one that does not make sense) your C library picks up
what to do from the execution environment. To me "Greek" looks like
an odd locale string. I would expect "el-GR.UTF-8" or
"el-GR.ISO8859-7".

--
Ben.

Feb 23 '08 #5

Ben Bacarisse

Ioannis Vranos <iv*****@nospam.no.spamfreemail.gr> writes:

Ioannis Vranos wrote:
>>
It works OK under Terminal UTF-8 default encoding too. So "%ls" is
what was really needed.

Actually the code:

#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wprintf(L"Δοκιμαστικό\n");

    return 0;
}

works only when I set the Terminal encoding to Greek (ISO-8859-7).

This sort of thing is almost impossible to investigate over Usenet.
Your news software will take your code and may or may not encode the
characters of the L"..." string in the encoding of your post (UTF-8).
It makes it very hard to know what the program text actually is.

Another complication is that the locale setting affects the run-time
behaviour, but your program also depends on what character encoding is
expected by the compiler that builds the string.

--
Ben.

Feb 23 '08 #6

Ioannis Vranos

Ben Bacarisse wrote:

>BTW, how can we define UTF-8 as the locale?

I *think* this is now off-topic. I don't think C says anything about
what the locale string means...

The character encoding is usually specified after a '.'. I use, for
example, "en-GB.UTF-8". I suspect that if you only specify a part of
the locale (or one that does not make sense) your C library picks up
what to do from the execution environment. To me "Greek" looks like
an odd locale string. I would expect "el-GR.UTF-8" or
"el-GR.ISO8859-7".

I got the idea from:

http://msdn2.microsoft.com/en-us/lib...1d(VS.80).aspx

http://msdn2.microsoft.com/en-us/lib...zf(VS.80).aspx

Feb 24 '08 #7

Ioannis Vranos

Ben Bacarisse wrote:

>BTW, how can we define UTF-8 as the locale?

I *think* this is now off-topic. I don't think C says anything about
what the locale string means...

The character encoding is usually specified after a '.'. I use, for
example, "en-GB.UTF-8". I suspect that if you only specify a part of
the locale (or one that does not make sense) your C library picks up
what to do from the execution environment. To me "Greek" looks like
an odd locale string. I would expect "el-GR.UTF-8" or
"el-GR.ISO8859-7".

This code works with gcc:

#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "greek" );

    wchar_t input[50];

    if (!p)
        printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%ls", input);

    return 0;
}
[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
Δοκιμαστικό
[john@localhost src]$
When I use "el-GR.UTF-8" or "el-GR.ISO8859-7" instead, I get:
[john@localhost src]$ ./foobar-cpp
NULL returned!

[john@localhost src]$

Feb 24 '08 #8

Ben Bacarisse

Ioannis Vranos <iv*****@nospam.no.spamfreemail.gr> writes:

Ben Bacarisse wrote:

>>BTW, how can we define UTF-8 as the locale?

I *think* this is now off-topic. I don't think C says anything about
what the locale string means...

The character encoding is usually specified after a '.'. I use, for
example, "en-GB.UTF-8". I suspect that if you only specify a part of
the locale (or one that does not make sense) your C library picks up
what to do from the execution environment. To me "Greek" looks like
an odd locale string. I would expect "el-GR.UTF-8" or
"el-GR.ISO8859-7".

I got the idea from:

http://msdn2.microsoft.com/en-us/lib...1d(VS.80).aspx

Ah, OK. Anyway, we are off-topic now. I think you'd have to post in
a Windows group to find out what locale strings mean there.

--
Ben.

Feb 24 '08 #9

Ioannis Vranos

Ben Bacarisse wrote:

Ioannis Vranos <iv*****@nospam.no.spamfreemail.gr> writes:

>Ben Bacarisse wrote:
>>>BTW, how can we define UTF-8 as the locale?
I *think* this is now off-topic. I don't think C says anything about
what the locale string means...

The character encoding is usually specified after a '.'. I use, for
example, "en-GB.UTF-8". I suspect that if you only specify a part of
the locale (or one that does not make sense) your C library picks up
what to do from the execution environment. To me "Greek" looks like
an odd locale string. I would expect "el-GR.UTF-8" or
"el-GR.ISO8859-7".
I got the idea from:

http://msdn2.microsoft.com/en-us/lib...1d(VS.80).aspx

Ah, OK. Anyway, we are off-topic now. I think you'd have to post in
a Windows group to find out what locale strings mean there.

I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you suggested
make setlocale() return NULL. The "greek" and "Greek" suggested by
MSDN work. So I supposed there is a portable way to do this. Aren't there
any portable locale encoding strings?

Feb 24 '08 #10

Ioannis Vranos

Clarified:

I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you suggested
make setlocale() return NULL

==under Linux.

The "greek" and "Greek" suggested by MSDN
work

==under Linux.

So I supposed there is a portable way to do this. Aren't there any
portable locale encoding strings?

Feb 24 '08 #11

Ioannis Vranos

Ioannis Vranos wrote:

Clarified:

>I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you
suggested make setlocale() return NULL

==under Linux.

>The "greek" and "Greek" suggested by MSDN work

==under Linux.

>So I supposed there is a portable way to do this. Aren't there any
portable locale encoding strings?

Also based on
http://gcc.gnu.org/onlinedocs/libstd...le/locale.html where it
mentions "locale -a" and provides a list of locales, in my system it
outputs among other things:
galego
galician
gd_GB
gd_GB.iso885915
gd_GB.utf8
german
gez_ER
gez_ER@abegede
gez_ER.utf8
gez_ER.utf8@abegede
gez_ET
gez_ET@abegede
gez_ET.utf8
gez_ET.utf8@abegede
gl_ES
gl_ES@euro
gl_ES.iso88591
gl_ES.iso885915@euro
gl_ES.utf8
==greek
gu_IN
gu_IN.utf8
gv_GB
gv_GB.iso88591
gv_GB.utf8
hebrew
he_IL
he_IL.iso88598
he_IL.utf8
hi_IN
hi_IN.utf8
hr_HR
hr_HR.iso88592
hr_HR.utf8
hrvatski
hsb_DE
hsb_DE.iso88592
hsb_DE.utf8
hu_HU
hu_HU.iso88592
hu_HU.utf8
hungarian
So "greek" is a valid locale for Linux too.

Feb 24 '08 #12

Ben Bacarisse

Ioannis Vranos <iv*****@nospam.no.spamfreemail.gr> writes:

Ioannis Vranos wrote:
>Clarified:

>>I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you
suggested make setlocale() return NULL

==under Linux.

>>The "greek" and "Greek" suggested by MSDN work

==under Linux.

>>So I supposed there is a portable way to do this. Aren't there any portable
locale encoding strings?

Also based on
http://gcc.gnu.org/onlinedocs/libstd...le/locale.html where it
mentions "locale -a" and provides a list of locales, in my system it
outputs among other things:

galego
galician
gd_GB

....

gl_ES.iso885915@euro
gl_ES.utf8
==greek

Post in comp.unix.programmer. I think you can define anything you
like under Linux, but what is and is not valid is not specified by C.
Other standards (like POSIX) probably specify much more.

So "greek" is a valid locale for Linux too.

--
Ben.

Feb 24 '08 #13

CBFalconer

Ioannis Vranos wrote:

>

.... snip ...

>
I have attached a screenshot.

According to which, I believe, you are using a c++ compiler.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

--
Posted via a free Usenet account from http://www.teranews.com

Feb 24 '08 #14

CBFalconer

Ioannis Vranos wrote:

>
[The current message encoding is set to Unicode (UTF-8) because
it contains Greek]

The following code does not work as expected:

#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main() {
    char *p= setlocale( LC_ALL, "Greek" );
    wchar_t input[50];

    if (!p)
        printf("NULL returned!\n");
    fgetws(input, 50, stdin);
    wprintf(L"%s\n", input);
    return 0;
}

.... snip ...

>
Am I missing something?

Yes. If setlocale fails, it returns NULL, which you detect, but do
not immediately exit the program. You also forgot to check for
errors in executing fgetws or wprintf.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

--
Posted via a free Usenet account from http://www.teranews.com

Feb 24 '08 #15
