wide characters

Bill Cunningham

I want to print out the Chinese character meaning water which is decimal
27750 I believe. Do I use wprintf to do this and just include wchar.h ? So
far I haven't gotten anything to work.

Bill

Oct 15 '08 #1

Antoninus Twink

On 15 Oct 2008 at 21:23, Bill Cunningham wrote:

I want to print out the Chinese character meaning water which is
decimal 27750 I believe. Do I use wprintf to do this and just include
wchar.h ? So far I haven't gotten anything to work.

To be honest, internationalization in "standard" C is a complete mess,
hacked on imperfectly to the language at the last possible minute. The
wchar_t representation of a string is platform *and locale* dependent,
so bad things can happen if the run-time locale of your program is
different from the compile-time locale.

The best advice is to take advantage of an existing Unicode library:
someone else has already made the mistakes you're likely to make,
debugged them, and put the resulting code in a library for you to use,
so why reinvent the wheel?

A good option could be the ICU library (http://www.icu-project.org)
developed at IBM.

Oct 15 '08 #2

Ben Bacarisse

Antoninus Twink <no****@nospam.invalid> writes:

On 15 Oct 2008 at 21:23, Bill Cunningham wrote:
I want to print out the Chinese character meaning water which is
decimal 27750 I believe. Do I use wprintf to do this and just include
wchar.h ? So far I haven't gotten anything to work.

To be honest, internationalization in "standard" C is a complete mess,
hacked on imperfectly to the language at the last possible minute. The
wchar_t representation of a string is platform *and locale* dependent,
so bad things can happen if the run-time locale of your program is
different from the compile-time locale.

I may regret this but I can't see what you mean by this. The only
meaning I can put on it applies equally to programs that use a library
like ICU.

The best advice is to take advantage of an existing Unicode library:
someone else has already made the mistakes you're likely to make,
debugged them, and put the resulting code in a library for you to use,
so why reinvent the wheel?

A good option could be the ICU library (http://www.icu-project.org)
developed at IBM.

Do you really think that is easier than either of the methods
illustrated here:

#include <wchar.h>
#include <locale.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    wchar_t water = 27750;
    setlocale(LC_ALL, "");
    printf("汦");
    printf("%lc\n", water);
    return 0;
}

Of course, there are numerous ways in which this can go wrong, but that
also applies to using ICU.

--
Ben.

Oct 16 '08 #3

Michael

Bill Cunningham wrote:

I want to print out the Chinese character meaning water which is decimal
27750 I believe. Do I use wprintf to do this and just include wchar.h ? So
far I haven't gotten anything to work.

Bill

If you use UTF-8, then the original C library is already enough.

Oct 16 '08 #4

lovecreatesbeauty

On Oct 16, 1:00 pm, Michael <mich...@michaeldadmum.no-ip.org> wrote:

Bill Cunningham wrote:
I want to print out the Chinese character meaning water which is decimal
27750 I believe. Do I use wprintf to do this and just include wchar.h ? So
far I haven't gotten anything to work.

If you use UTF-8, then the original C library is already enough.

Yes. I can print the Chinese word for water as I print ascii on my
machine.

(btw, the Chinese word for water is 水.
http://www.chinese-tools.com/tools/c...l?cn=%E6%B0%B4,
http://www.chinese-tools.com/tools/s...ml?q=%E6%B0%B4 )

$ cat a.c
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    if (!argv[1]) return EXIT_FAILURE;
    printf("%s\n", argv[1]);
    return EXIT_SUCCESS;
}

$ make && ./a.out "hello 水"
gcc -ansi -pedantic -Wall -W -c -o a.o a.c
a.c:4: warning: unused parameter 'argc'
gcc a.o -o a.out
hello 水
$

Oct 16 '08 #5

Bill Cunningham

Ben I am not seeing what you and Antonius are meaning by saying
"locale". I understand run-time and compile-time but I've never used the
term "locale".

Bill

Oct 16 '08 #6

Ben Bacarisse

"Bill Cunningham" <no****@nspam.invalid> writes:

Ben I am not seeing what you and Antonius are meaning by saying
"locale". I understand run-time and compile-time but I've never used the
term "locale".

I did not use the term and I claimed that I could not understand what
Antoninus Twink meant by his posting. Unless he comes back to explain
what he meant, I suggest you ignore the term (as he used it).

--
Ben.

Oct 16 '08 #7

Antoninus Twink

On 16 Oct 2008 at 20:40, Ben Bacarisse wrote:

"Bill Cunningham" <no****@nspam.invalid> writes:
Ben I am not seeing what you and Antonius are meaning by saying
"locale". I understand run-time and compile-time but I've never used
the term "locale".

I did not use the term and I claimed that I could not understand what
Antoninus Twink meant by his posting. Unless he comes back to explain
what he meant, I suggest you ignore the term (as he used it).

I have the impression (perhaps it's just an unfounded prejudice) that
trying to work portably with wide characters in raw C is fraught with
difficulty, and relying on intelligent library routines is a safer
option.

Here's a quote from the wprintf manpage:

glibc represents wide characters using their Unicode (ISO-10646)
code point, but other platforms don't do this. Also, the use of C99
universal character names of the form \unnnn does not solve this
problem. Therefore, in internationalized programs, the format string
should consist of ASCII wide characters only, or should be
constructed at run time in an internationalized way (e.g., using
gettext(3) or iconv(3), followed by mbstowcs(3)).

Oct 16 '08 #8

Bill Cunningham

"Antoninus Twink" <no****@nospam.invalid> wrote in message
news:sl*******************@nospam.invalid...

[snip]

Here's a quote from the wprintf manpage:

glibc represents wide characters using their Unicode (ISO-10646)
code point, but other platforms don't do this. Also, the use of C99
universal character names of the form \unnnn does not solve this
problem. Therefore, in internationalized programs, the format string
should consist of ASCII wide characters only, or should be
constructed at run time in an internationalized way (e.g., using
gettext(3) or iconv(3), followed by mbstowcs(3)).

I have gettext and FSF's libiconv on my system. I will have to find out
what mbstowcs is. Ok I see what you're trying to say. Basically stay away
from C's wchar.h functions and use something better.

Bill

Oct 16 '08 #9

Ben Bacarisse

"Bill Cunningham" <no****@nspam.invalid> writes:

"Antoninus Twink" <no****@nospam.invalid> wrote in message
news:sl*******************@nospam.invalid...

[snip]

Here's a quote from the wprintf manpage:

glibc represents wide characters using their Unicode (ISO-10646)
code point, but other platforms don't do this. Also, the use of C99
universal character names of the form \unnnn does not solve this
problem. Therefore, in internationalized programs, the format string
should consist of ASCII wide characters only, or should be
constructed at run time in an internationalized way (e.g., using
gettext(3) or iconv(3), followed by mbstowcs(3)).

I have gettext and FSF's libiconv on my system. I will have to find out
what mbstowcs is. Ok I see what you're trying to say. Basically stay away
from C's wchar.h functions and use something better.

That can't be what he is saying because mbstowcs is, roughly speaking,
one of "C's wchar.h functions".

I think, from the sort of programs I've seen you write, you will be
fine with standard C for a while yet.

There *is* a problem with wide character support but it is not fixed
by using other libraries. If there is going to be a mismatch
between the wide character representation used by your compiler and
that used by your run-time, then you will have trouble. The solution
is to use only run-time strings (this is what the quote is saying but
I have translated it from the system specific language of glibc,
gettext etc.). This applies to any program using any such facilities,
including the standard ones[1].

If you can assume that there is no such mismatch, then all is well.

[1] In fact it applies to all programs that use any character data, it
is just that we all assume that the execution and source character
sets are the same these days. In the old days, this problem occurred
even with printf("Hello world.\n");

--
Ben.

Oct 16 '08 #10
