Problem using wchar_t and wprintf

Rui Maciel

I've just started learning how to use the wchar_t data type as the basis for
Unicode strings and unfortunately I'm having quite a few problems, both
on the C front and the Unicode front.

In this case, it seems that the wprintf function isn't able to print a string
beyond the first character. I don't have a clue why this is happening. Here
is the test code:

<code>
#include <stdlib.h>
#include <wchar.h>
int main(int argc, char *argv[])
{
wchar_t *snafu = L"notação" ;
wprintf(L"%s\n" ,snafu);
return EXIT_SUCCESS;
}
</code>
On a side note, I was amazed at how little information is available
regarding the whole Unicode in C issue. It's practically nonexistent. As a
sign, according to Google Groups, going as far back as 2000 this newsgroup has
seen only about 18 threads in which the word wprintf was mentioned. Is everyone
purposely ignoring Unicode, or is there a better, standard way to handle it
besides using wchar_t and all those w* functions?
Rui Maciel
--
Running Kubuntu 6.10 with KDE 3.5.6 and proud of it.
jabber:ru****** **@jabber.org

Feb 27 '07 #1

Harald van Dĳk

Rui Maciel wrote:

I've just started learning how to use the wchar_t data type as the basis for
Unicode strings and unfortunately I'm having quite a few problems, both
on the C front and the Unicode front.

In this case, it seems that the wprintf function isn't able to print a string
beyond the first character. I don't have a clue why this is happening. Here
is the test code:

<code>
#include <stdlib.h>
#include <wchar.h>
int main(int argc, char *argv[])
{
wchar_t *snafu = L"notação" ;
wprintf(L"%s\n" ,snafu);
return EXIT_SUCCESS;
}
</code>

%s is the format specifier for an ordinary character string (char *),
not a wide character string. Use %ls for that. You can print multibyte
character strings and wide character strings both, with both printf()
and wprintf().

On a side note, I was amazed at how little information is available
regarding the whole Unicode in C issue. It's practically nonexistent. As a
sign, according to Google Groups, going as far back as 2000 this newsgroup has
seen only about 18 threads in which the word wprintf was mentioned. Is everyone
purposely ignoring Unicode, or is there a better, standard way to handle it
besides using wchar_t and all those w* functions?

You cannot use printf() and wprintf() on the same output stream, and
the standard does not guarantee (to the best of my knowledge) that the
file format used by the wchar_t-based I/O functions is the same as
that of the char-based I/O functions. It may be better to read in and
write out data as multibyte strings, and only treat them as wide
strings internally.

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

int main(int argc, char *argv[])
{
wchar_t *snafu = L"notação" ;

setlocale(LC_CTYPE, "");
printf("%ls\n", snafu);
return EXIT_SUCCESS;
}

Feb 27 '07 #2

Rui Maciel

Harald van Dĳk wrote:

%s is the format specifier for an ordinary character string (char *),
not a wide character string. Use %ls for that. You can print multibyte
character strings and wide character strings both, with both printf()
and wprintf().

Thanks! That did the trick.

You cannot use printf() and wprintf() on the same output stream, and
the standard does not guarantee (to the best of my knowledge) that the
file format used by the wchar_t-based I/O functions is the same as
that of the char-based I/O functions. It may be better to read in and
write out data as multibyte strings, and only treat them as wide
strings internally.

So it seems that the use of wchar_t and related functions is a nice source
of headaches and a hefty dose of PitA. As if those problems weren't enough, it
seems that there is a severe drought of information on the subject.
In my experience, there isn't a single C tutorial that delves
into it. All those tutorials that mix C and C++ together were bad enough,
but noticing that none of the decent ones even mentions wchar_t anywhere...
That's bad.

So, where can I get my hands on a nice document which explains the whole
Unicode through wchar_t thing?
Thanks for the help
Rui Maciel
--
Running Kubuntu 6.10 with KDE 3.5.6 and proud of it.
jabber:ru****** **@jabber.org

Feb 28 '07 #3

Yevgen Muntyan

Rui Maciel wrote:
[snip]

So it seems that the use of wchar_t and related functions is a nice source
of headaches and a hefty dose of PitA.

People say it's not bad if you don't demand too much from it.
E.g. if you have some string-processing program which pretends
everything is Latin, you may be able to replace the char functions with
their wide-character equivalents and get a program which works with
Chinese, for free.

As if those problems weren't enough, it
seems that there is a severe drought of information on the subject.
In my experience, there isn't a single C tutorial that delves
into it. All those tutorials that mix C and C++ together were bad enough,
but noticing that none of the decent ones even mentions wchar_t anywhere...
That's bad.

So, where can I get my hands on a nice document which explains the whole
Unicode through wchar_t thing?

There is no "Unicode through wchar_t" thing. Wide character business in
C is *not* "Unicode in C". It depends on what you need. If you want to
get a list of words from the user at the console and count them, you use wchar_t.
If you want to save a file and read it later, you had better use
non-standard stuff, convert your data to whatever encoding you like and
back, etc. If you are on Windows NT you can use wchar_t without worries
(it's fixed UTF-16), but then there are problems with the standard. So
you don't want to know how to handle Unicode in standard C, you want
to know what you have available on your platform(s) and pick
what's easier/better for you. (E.g. wchar_t if you only care
about Windows; glib or ICU if you want more portability; or you may
use a library which handles everything internally, so that you need not
care about Unicode and such at all, etc.)

Yevgen

Feb 28 '07 #4

Ben Pfaff

Yevgen Muntyan <mu****************@tamu.edu> writes:

There is no "Unicode through wchar_t" thing. Wide character business in
C is *not* "Unicode in C".

Well, not in general. If the compiler defines
__STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
points, which is essentially Unicode.
--
int main(void){char p[]="ABCDEFGHIJKLM NOPQRSTUVWXYZab cdefghijklmnopq rstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwC IxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+= strchr(p,*q++)-p;if(i>=(int)si zeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

Feb 28 '07 #5

Yevgen Muntyan

Ben Pfaff wrote:

Yevgen Muntyan <mu****************@tamu.edu> writes:

>There is no "Unicode through wchar_t" thing. Wide character business in
C is *not* "Unicode in C".

Well, not in general. If the compiler defines
__STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
points, which is essentially Unicode.

But even if wchar_t sequences handled by the C library can represent
all of Unicode, you don't know *how* it does it. So if you actually need
to transfer data somewhere (like save and load, even
on the same machine), you have a problem. Which is why I said that; it
surely depends on how you interpret the "Unicode in C" thing :)
And one could say that a conforming implementation is allowed to
ignore Unicode and use ASCII and a one-byte wchar_t (you know,
we are in comp.lang.c).

Yevgen

Feb 28 '07 #6

Yevgen Muntyan

Yevgen Muntyan wrote:

Ben Pfaff wrote:
>Yevgen Muntyan <mu****************@tamu.edu> writes:

>>There is no "Unicode through wchar_t" thing. Wide character business in
C is *not* "Unicode in C".

Well, not in general. If the compiler defines
__STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
points, which is essentially Unicode.

But even if wchar_t sequences handled by the C library can represent
all of Unicode, you don't know *how* it does it. So if you actually need
to transfer data somewhere (like save and load, even
on the same machine), you have a problem. Which is why I said that; it
surely depends on how you interpret the "Unicode in C" thing :)
And one could say that a conforming implementation is allowed to
ignore Unicode and use ASCII and a one-byte wchar_t (you know,
we are in comp.lang.c).

And there is always that nice MS implementation. So if "standard C" is
C99, then we ignore MS, which may be impractical; if "standard C" is
C90, then there is no standard way to do anything with Unicode and the
like. So in the end what you do with wchar_t is really what you can do,
and what you do, in your particular implementation(s).

Yevgen

Mar 1 '07 #7

Harald van Dĳk

Yevgen Muntyan wrote:

Ben Pfaff wrote:
Yevgen Muntyan <mu****************@tamu.edu> writes:

There is no "Unicode through wchar_t" thing. Wide character business in
C is *not* "Unicode in C".
Well, not in general. If the compiler defines
__STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
points, which is essentially Unicode.

But even if wchar_t sequences handled by the C library can represent
all of Unicode, you don't know *how* it does it.

Yes, you do. If __STDC_ISO_10646__ is defined (there might also be a
minimum value), then

wchar_t wc = 0x20AC;

is guaranteed to set wc to the Euro sign, and vice versa.

You don't know how this will be encoded when you convert it to a
multibyte character using the standard routines (or whether it can be
at all), but that's not a wchar_t sequence issue. (And of course, you
can portably write this to a file as UTF-8 using your own conversion
routines (or UTF-32 if you like simplicity, or anything else), and
read it back the same way.)

[...]

And one could say that a conforming implementation is allowed to
ignore Unicode and use ASCII and a one-byte wchar_t (you know,
we are in comp.lang.c).

You seem to be saying that a one-byte eight-bit wchar_t is allowed but
useless. It's not. It's useful for making multibyte-aware programs
work without modifications even on systems that do not support
multibyte characters.

Mar 2 '07 #8

Yevgen Muntyan

Harald van Dĳk wrote:

Yevgen Muntyan wrote:
>Ben Pfaff wrote:
>>Yevgen Muntyan <mu****************@tamu.edu> writes:

There is no "Unicode through wchar_t" thing. Wide character business in
C is *not* "Unicode in C".
Well, not in general. If the compiler defines
__STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
points, which is essentially Unicode.
But even if wchar_t sequences handled by the C library can represent
all of Unicode, you don't know *how* it does it.

Yes, you do. If __STDC_ISO_10646__ is defined (there might also be a
minimum value), then

wchar_t wc = 0x20AC;

is guaranteed to set wc to the Euro sign, and vice versa.
You don't know how this will be encoded when you convert it to a
multibyte character using the standard routines (or whether it can be
at all), but that's not a wchar_t sequence issue. (And of course, you
can portably write this to a file as UTF-8 using your own conversion
routines (or UTF-32 if you like simplicity, or anything else), and
read it back the same way.)

I take the "you don't know *how* it does it" part back; it was my
ignorance. If __STDC_ISO_10646__ is defined, you can actually
do all you need (after you write encoding/decoding routines,
with the fancy UTF-8 character layout or UTF-16/32 byte order
marks; I need to finally learn these two!).
Still, there are implementations with a working (meaning you can
do Chinese and Russian) wchar_t business but with no C99 (or is it
C95?) compliance. For instance, MS doesn't care about C99
at all; the FreeBSD library doesn't have this macro defined (I don't
know if it actually can do Chinese in a Russian locale, I guess
it should); glibc does have the macro defined. So if the macro is
defined, you're good; but if it's not defined, you're back to either
writing portable code which doesn't use wchar_t at all (or using
third-party libs for that purpose), or studying what exactly you have on
your target platforms, without any standard support.
I wonder if __STDC_ISO_10646__ is considered nice by (the majority
of) implementors, or whether they tend to prefer "efficient" code.

[...]

>And one could say that a conforming implementation is allowed to
ignore Unicode and use ASCII and a one-byte wchar_t (you know,
we are in comp.lang.c).

You seem to be saying that a one-byte eight-bit wchar_t is allowed but
useless. It's not. It's useful for making multibyte-aware programs
work without modifications even on systems that do not support
multibyte characters.

Sure, wchar_t can certainly be useful, and it's certainly good that
wchar_t code won't break if the system doesn't support Unicode. But if
the system doesn't support Unicode, and you got a file from a Chinese
friend, it's useless. It's like a program which parses some text
and pretends everything is ASCII - it may be very useful, and
in many setups it's all you need.
I'd rather say the wchar_t facilities (as in the C standard) are useless,
but that would be too strong a statement; I presume they are used in a lot of
software.

Best regards,
Yevgen

Mar 2 '07 #9
