This question pertains to both Standard C and platform-dependent
features, hence the crosspost.
I'm trying to understand exactly how the "execution character set"
works. On GNU/Linux, using GCC >= 3.4, if I compile a C source file
(any encoding), by default the execution character set is UTF-8, and
the wide execution character set is UTF-32.
What I want to understand is what the implications of this are on the
various operations I might want to perform on any of the strings. As
an example:
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int
main (void)
{
  setlocale (LC_ALL, "");

  printf("‘Name1’\n");
  printf("%ls\n", L"‘Name2’");

  fwide(stderr, 1);
  fwprintf(stderr, L"‘Name3’\n");
  fwprintf(stderr, L"%s\n", "‘Name4’");

  printf("‘Name5’\n");
  return 0;
}
If I run in a normal (UTF-8) locale:
$ ./test
‘Name1’
‘Name2’
‘Name3’
‘Name4’
‘Name5’
Now, running in a C locale:
$ ./test
'Name3'
‘Name1’
‘Name5’
"‘Name1’" and "‘Name5’" came out the same: the bytes passed through
unchanged, so no conversion took place, I think.
"‘Name2’" (wide→narrow) was lost. Why?
"‘Name4’" (narrow→wide) was lost. Why?
"‘Name3’" (wide→wide) was *not* lost. Moreover, it was
transliterated (UTF-32→US-ASCII) into a readable form for the locale.
Where does the conversion take place, and how does the C runtime know
what the source and destination charsets are? I can't replicate the
conversion with iconv(), so I'd like to know how to do it by hand.
I'd like to understand why each of these cases works the way it does.
Thanks,
Roger
--
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.