Harald van Dijk wrote:
Quote:
Yevgen Muntyan wrote:
Quote:
>Ben Pfaff wrote:
Quote:
>>Yevgen Muntyan <muntyan.removethis@tamu.edu> writes:
>>>
>>>There is no "Unicode through wchar_t" thing. Wide character business in
>>>C is *not* "Unicode in C".
>>Well, not in general. If the compiler defines
>>__STDC_ISO_10646__, however, then wchar_t encodes ISO 10646 code
>>points, which is essentially Unicode.
>But even if wchar_t sequences handled by the C library can represent
>all of Unicode, you don't know *how* it does it.
>
Yes, you do. If __STDC_ISO_10646__ is defined (there might also be a
minimum value), then
>
wchar_t wc = 0x20AC;
>
is guaranteed to set wc to the Euro sign, and vice versa.
You don't know how this will be encoded when you convert it to a
multibyte character using the standard routines (or whether it can be
at all), but that's not a wchar_t sequence issue. (And of course, you
can portably write this to a file as UTF-8 using your own conversion
routines (or UTF-32 if you like simplicity, or anything else), and
read it back the same way.)
I take the "you don't know *how* it does it" part back; that was my
ignorance. If __STDC_ISO_10646__ is defined, you can actually
do all you need (after you write encoding/decoding routines
that handle the UTF-8 byte layout or UTF-16/32 byte order
marks; I need to finally learn these two!).
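For what it's worth, the encoding half of such a routine is short. Below is a minimal sketch, assuming __STDC_ISO_10646__ is defined so that a wchar_t value can be passed in as a plain code point. The name utf8_encode is made up for illustration and is not a standard library function.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the C library: encode one ISO 10646
   code point as UTF-8. Returns the number of bytes written to out
   (1 to 4), or 0 if cp is a surrogate or lies above 0x10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* outside the Unicode range */
}
```

With this, utf8_encode(0x20AC, buf) writes the three bytes E2 82 AC for the Euro sign. Decoding and the UTF-16/32 byte order handling are left out of the sketch.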
Still, there are implementations with working wchar_t support (meaning
you can do Chinese and Russian) but with no C99 (or is it
C95?) compliance. For instance, MS doesn't care about C99
at all; the FreeBSD library doesn't define this macro (I don't
know if it can actually do Chinese in a Russian locale, I guess
it should); glibc does define the macro. So if the macro is
defined, you're good; but if it's not defined, you're back to either
writing portable code that doesn't use wchar_t at all (or using
third-party libs for that purpose), or studying what exactly you have on
your target platforms, without any support from the standard.
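The "is the macro defined" decision can at least be made at compile time. A small sketch follows; the function name iso10646_version is invented for illustration, and only the macro itself comes from the standard.

```c
#include <wchar.h>

/* Hypothetical helper: report whether the implementation promises that
   wchar_t values are ISO 10646 code points. Returns the value of
   __STDC_ISO_10646__ if the macro is defined, or 0 if the implementation
   makes no such promise. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

A return value of 0 does not mean wchar_t is unusable on that platform, only that the standard gives you no guarantee about what its values mean.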
I wonder whether (the majority of) implementors consider
__STDC_ISO_10646__ worthwhile, or whether they tend to prefer "efficient" code.
Quote:
[...]
>
Quote:
>And one could say that a conforming implementation is allowed to
>ignore Unicode and use ASCII and a one-byte wchar_t (you know,
>we are in comp.lang.c).
>
You seem to be saying that a one-byte eight-bit wchar_t is allowed but
useless. It's not. It's useful for making multibyte-aware programs
work without modifications even on systems that do not support
multibyte characters.
Sure, wchar_t can certainly be useful, and it's certainly good that
wchar_t code won't break if the system doesn't support Unicode. But if
the system doesn't support Unicode, and you get a file from a Chinese
friend, it's useless. It's like a program that parses some text
and pretends everything is ASCII: it may be very useful, and
in many setups it's all you need.
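To make the "works without modifications" point concrete, here is a rough sketch of multibyte-aware code that counts characters rather than bytes. The name count_chars is made up for illustration; mbrtowc is the standard routine. In a single-byte locale it simply degrades to counting bytes.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters (not bytes) in a multibyte
   string under the current locale. Returns (size_t)-1 if the string
   contains an invalid or incomplete multibyte sequence. */
size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    size_t count = 0;

    memset(&st, 0, sizeof st);             /* start in the initial shift state */
    while (left > 0) {
        size_t n = mbrtowc(NULL, s, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;             /* invalid or truncated sequence */
        if (n == 0)
            break;                         /* null character; not expected here */
        s += n;
        left -= n;
        count++;
    }
    return count;
}
```

The same source compiles and runs whether or not the locale has multibyte characters, which is the portability benefit being discussed. It says nothing about whether the Chinese file can actually be read.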
I'd rather say the wchar_t facilities (as in the C standard) are useless,
but that would be too strong a statement; I presume they are used in a
lot of software.
Best regards,
Yevgen