Problem using wchar_t and wprintf

Rui Maciel

I've just started learning how to use the wchar_t data type as the basis for
Unicode strings and unfortunately I'm having quite a few problems, on both
the C front and the Unicode front.

In this case, it seems that the wprintf function isn't able to print a string
beyond the first character. I don't have a clue why this is happening. Here
is the test code:

<code>
#include <stdlib.h>
#include <wchar.h>
int main(int argc, char *argv[])
{
    wchar_t *snafu = L"notação";
    wprintf(L"%s\n",snafu);
    return EXIT_SUCCESS;
}
</code>

On a side note, I was amazed at how little information is available
regarding the whole Unicode in C issue. It's practically nonexistent. As a
sign, according to Google Groups, going as far back as 2000 this newsgroup only
saw about 18 threads where the word wprintf was mentioned. Is everyone
purposely ignoring Unicode or is there a better, standard way to handle it
besides using wchar_t and all those w* functions?

Rui Maciel
--
Running Kubuntu 6.10 with KDE 3.5.6 and proud of it.
jabber:ru********@jabber.org

Feb 27 '07 #1

Harald van Dĳk

Rui Maciel wrote:

> I've just started learning how to use the wchar_t data type as the basis for
> Unicode strings and unfortunately I'm having quite a few problems, on both
> the C front and the Unicode front.
>
> In this case, it seems that the wprintf function isn't able to print a string
> beyond the first character. I don't have a clue why this is happening. Here
> is the test code:
>
> <code>
> #include <stdlib.h>
> #include <wchar.h>
> int main(int argc, char *argv[])
> {
>     wchar_t *snafu = L"notação";
>     wprintf(L"%s\n",snafu);
>     return EXIT_SUCCESS;
> }
> </code>

%s is the format specifier for an ordinary character string (char *),
not a wide character string. Use %ls for that. You can print multibyte
character strings and wide character strings both, with both printf()
and wprintf().

> On a side note, I was amazed at how little information is available
> regarding the whole Unicode in C issue. It's practically nonexistent. As a
> sign, according to Google Groups, going as far back as 2000 this newsgroup only
> saw about 18 threads where the word wprintf was mentioned. Is everyone
> purposely ignoring Unicode or is there a better, standard way to handle it
> besides using wchar_t and all those w* functions?

You cannot use printf() and wprintf() on the same output stream, and
the standard does not guarantee (to the best of my knowledge) that the
file format used by the wchar_t-based I/O functions is the same as
that of the char-based I/O functions. It may be better to read in and
write out data as multibyte strings, and only treat them as wide
strings internally.

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

int main(int argc, char *argv[])
{
    wchar_t *snafu = L"notação";

    setlocale(LC_CTYPE, "");
    printf("%ls\n",snafu);
    return EXIT_SUCCESS;
}

Feb 27 '07 #2

Rui Maciel

Harald van Dĳk wrote:

> %s is the format specifier for an ordinary character string (char *),
> not a wide character string. Use %ls for that. You can print multibyte
> character strings and wide character strings both, with both printf()
> and wprintf().

Thanks! That did the trick.

> You cannot use printf() and wprintf() on the same output stream, and
> the standard does not guarantee (to the best of my knowledge) that the
> file format used by the wchar_t-based I/O functions is the same as
> that of the char-based I/O functions. It may be better to read in and
> write out data as multibyte strings, and only treat them as wide
> strings internally.

So it seems that the use of wchar_t and related functions is a nice source
of headaches and a hefty dose of PitA. If those problems weren't enough it
seems that there is a severe drought of information relating to that theme.
According to my experience, there isn't a single C tutorial that delves
into it. All those tutorials that mix C and C++ together were bad enough
but noticing that none of the decent ones even mentions wchar_t anywhere...
That's bad.

So, where can I get my hands on a nice document which explains the whole
Unicode through wchar_t thing?
Thanks for the help
Rui Maciel

Feb 28 '07 #3

Yevgen Muntyan

Rui Maciel wrote:
[snip]

> So it seems that the use of wchar_t and related functions is a nice source
> of headaches and a hefty dose of PitA.

People say it's not bad if you don't demand too much from it.
E.g. if you have some string processing program which pretends
everything is Latin, you may be able to replace char functions with
their wide-character equivalents and get a program which works with
Chinese, for free.

> If those problems weren't enough it
> seems that there is a severe drought of information relating to that theme.
> According to my experience, there isn't a single C tutorial that delves
> into it. All those tutorials that mix C and C++ together were bad enough
> but noticing that none of the decent ones even mentions wchar_t anywhere...
> That's bad.
>
> So, where can I get my hands on a nice document which explains the whole
> Unicode through wchar_t thing?

There is no "Unicode through wchar_t" thing. Wide character business in
C is *not* "Unicode in C". It depends on what you need. If you want to
get a list of words from the user at the console and count them, you use wchar_t.
If you want to save a file and read it later, you'd better use
non-standard stuff, convert your data to whatever encoding you like and
back, etc. If you are on Windows NT you can use wchar_t without worries
(it's fixed UTF-16), but then there are problems with the standard. So
you don't want to know how to handle Unicode in standard C, you want
to know what you have available on your platform(s) and pick
what's easier/better for you. (E.g. wchar_t if you only care
about Windows; glib or ICU if you want more portability; you may
use a library which handles everything internally and you may not
care about Unicode and such at all, etc.)

Yevgen

Feb 28 '07 #4

Ben Pfaff

Yevgen Muntyan <mu****************@tamu.edu> writes:

> There is no "Unicode through wchar_t" thing. Wide character business in
> C is *not* "Unicode in C".

Well, not in general. If the compiler defines
__STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
points, which is essentially Unicode.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuv wxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

Feb 28 '07 #5

Yevgen Muntyan

Ben Pfaff wrote:

> Yevgen Muntyan <mu****************@tamu.edu> writes:
>
>> There is no "Unicode through wchar_t" thing. Wide character business in
>> C is *not* "Unicode in C".
>
> Well, not in general. If the compiler defines
> __STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
> points, which is essentially Unicode.

But even if wchar_t sequences handled by the C library can represent
all of Unicode, you don't know *how* it does it. So if you actually need
to be able to transfer data somehow somewhere (like save and load, even
on the same machine), you get a problem. Which is why I said that; it
surely depends on the interpretation of the "Unicode in C thing" :)
And one could say that a conforming implementation is allowed to
ignore Unicode and use ASCII and a one-byte wchar_t (you know,
we are in comp.lang.c).

Yevgen

Feb 28 '07 #6

Yevgen Muntyan

Yevgen Muntyan wrote:

> Ben Pfaff wrote:
>> Yevgen Muntyan <mu****************@tamu.edu> writes:
>>
>>> There is no "Unicode through wchar_t" thing. Wide character business in
>>> C is *not* "Unicode in C".
>>
>> Well, not in general. If the compiler defines
>> __STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
>> points, which is essentially Unicode.
>
> But even if wchar_t sequences handled by the C library can represent
> all of Unicode, you don't know *how* it does it. So if you actually need
> to be able to transfer data somehow somewhere (like save and load, even
> on the same machine), you get a problem. Which is why I said that; it
> surely depends on the interpretation of the "Unicode in C thing" :)
> And one could say that a conforming implementation is allowed to
> ignore Unicode and use ASCII and a one-byte wchar_t (you know,
> we are in comp.lang.c).

And there is always that nice MS implementation. So if "standard C" is
C99, then we ignore MS, which may be impractical; if "standard C" is
C90, then there is no standard way to do anything with Unicode and
the like. So in the end what you do with wchar_t is really what you can do
and what you do in your particular implementation(s).

Yevgen

Mar 1 '07 #7

Harald van Dĳk

Yevgen Muntyan wrote:

> Ben Pfaff wrote:
>> Yevgen Muntyan <mu****************@tamu.edu> writes:
>>
>>> There is no "Unicode through wchar_t" thing. Wide character business in
>>> C is *not* "Unicode in C".
>>
>> Well, not in general. If the compiler defines
>> __STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
>> points, which is essentially Unicode.
>
> But even if wchar_t sequences handled by the C library can represent
> all of Unicode, you don't know *how* it does it.

Yes, you do. If __STDC_ISO_10646__ is defined (there might also be a
minimum value), then

wchar_t wc = 0x20AC;

is guaranteed to set wc to the Euro sign, and vice versa.

You don't know how this will be encoded when you convert it to a
multibyte character using the standard routines (or whether it can be
at all), but that's not a wchar_t sequence issue. (And of course, you
can portably write this to a file as UTF-8 using your own conversion
routines (or UTF-32 if you like simplicity, or anything else), and
read it back the same way.)

[...]

> And one could say that a conforming implementation is allowed to
> ignore Unicode and use ASCII and a one-byte wchar_t (you know,
> we are in comp.lang.c).

You seem to be saying that a one-byte eight-bit wchar_t is allowed but
useless. It's not. It's useful for making multibyte-aware programs
work without modifications even on systems that do not support
multibyte characters.

Mar 2 '07 #8

Yevgen Muntyan

Harald van Dĳk wrote:

> Yevgen Muntyan wrote:
>> Ben Pfaff wrote:
>>> Yevgen Muntyan <mu****************@tamu.edu> writes:
>>>
>>>> There is no "Unicode through wchar_t" thing. Wide character business in
>>>> C is *not* "Unicode in C".
>>>
>>> Well, not in general. If the compiler defines
>>> __STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
>>> points, which is essentially Unicode.
>>
>> But even if wchar_t sequences handled by the C library can represent
>> all of Unicode, you don't know *how* it does it.
>
> Yes, you do. If __STDC_ISO_10646__ is defined (there might also be a
> minimum value), then
>
> wchar_t wc = 0x20AC;
>
> is guaranteed to set wc to the Euro sign, and vice versa.
>
> You don't know how this will be encoded when you convert it to a
> multibyte character using the standard routines (or whether it can be
> at all), but that's not a wchar_t sequence issue. (And of course, you
> can portably write this to a file as UTF-8 using your own conversion
> routines (or UTF-32 if you like simplicity, or anything else), and
> read it back the same way.)

I take the "you don't know *how* it does it" part back; that was my
ignorance. If __STDC_ISO_10646__ is defined, you can actually
do all you need (after you write encoding/decoding routines,
with the fancy UTF-8 character layout or UTF-16/32 byte order
marks; I need to finally learn these two!).
Still, there are implementations with working (meaning you can
do Chinese and Russian) wchar_t business but with no C99 (or is it
C95?) compliance. For instance, MS doesn't care about C99
at all; the FreeBSD library doesn't have this macro defined (I don't
know if it actually can do Chinese in a Russian locale, I guess
it should); glibc does have the macro defined. So if the macro is
defined, you're good; but if it's not defined, you're back to either
writing portable code which doesn't use wchar_t at all (or using
third-party libs for that purpose), or studying what exactly you have on
your target platforms, without any standard support.
I wonder if __STDC_ISO_10646__ is considered nice by (the majority
of) implementors, or whether they tend to have "efficient" code.

> [...]
>
>> And one could say that a conforming implementation is allowed to
>> ignore Unicode and use ASCII and a one-byte wchar_t (you know,
>> we are in comp.lang.c).
>
> You seem to be saying that a one-byte eight-bit wchar_t is allowed but
> useless. It's not. It's useful for making multibyte-aware programs
> work without modifications even on systems that do not support
> multibyte characters.

Sure, wchar_t can certainly be useful, and it's certainly good that
wchar_t code won't break if the system doesn't support Unicode. But if
the system doesn't support Unicode, and you got a file from a Chinese
friend, it's useless. It's like a program which parses some text
and pretends everything is ASCII - it may be very useful, and
in many setups it's all you need.
I'd rather say the wchar_t facilities (as in the C standard) are useless,
but it'd be too strong a statement; I presume they are used in a lot of
software.

Best regards,
Yevgen

Mar 2 '07 #9
