Bytes IT Community

System functions + wchar_t

Hi all,

I've been thinking about all the system functions which accept wchar_t.
The point is that they don't define what encoding the wchar_t has to
be. Let us assume that all the external input is UTF-8 and all the
output is also UTF-8 and your internal representation is using wchar_t
encoded using UTF-16. So when you call wcout or other system functions
which accept wide characters, what encoding do they assume?

Regards

Dec 23 '05 #1
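One way to avoid depending on whatever encoding the wide-character functions assume is to do the UTF-16 to UTF-8 step in your own code and hand the library only bytes. A minimal sketch, assuming the internal text is UTF-16 held in std::u16string (a C++11 type that postdates this thread; wchar_t itself is 16 bits on Windows but commonly 32 bits elsewhere). The helper names utf16_to_utf8 and append_utf8 are hypothetical:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Encode one Unicode code point as UTF-8 and append the bytes to `out`.
void append_utf8(char32_t cp, std::string& out) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 code units to UTF-8, combining surrogate pairs.
// Assumption (hypothetical setup): internal text is UTF-16 in std::u16string.
// Throws std::invalid_argument on an unpaired surrogate.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {            // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            char32_t lo = in[++i];
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        } else if (u >= 0xDC00 && u <= 0xDFFF) {     // stray low surrogate
            throw std::invalid_argument("unpaired low surrogate");
        }
        append_utf8(u, out);
    }
    return out;
}
```

The result can then be written through a narrow stream such as a std::ofstream opened in binary mode; the standard char-to-char codecvt facet performs no conversion, so the UTF-8 bytes should reach the file unchanged.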
9 Replies


gamehack wrote:
> I've been thinking about all the system functions which accept wchar_t.
What system functions are those? Do you mean platform-specific ones?
> The point is that they don't define what encoding the wchar_t has to
> be.
It's probably implementation-defined or platform-defined. Have you tried
reading the documentation?
> Let us assume that all the external input is UTF-8 and all the
> output is also UTF-8 and your internal representation is using wchar_t
> encoded using UTF-16. So when you call wcout or other system functions
> which accept wide characters, what encoding do they assume?


I would venture a guess that _locales_ have something to do with it.

V
Dec 23 '05 #2
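Victor's guess can be made concrete for the C++ streams: a std::wfilebuf (the buffer behind std::wofstream) converts wide characters to bytes with the std::codecvt<wchar_t, char, std::mbstate_t> facet of its imbued locale, so the external encoding is whatever that locale says. std::wcout, which is synchronized with C stdio by default, may instead go through the C library's own locale-dependent conversion; the details are implementation-specific. A minimal sketch that calls the facet directly; narrow_with_locale is a hypothetical helper name:

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Narrow a wide string with the codecvt<wchar_t, char, mbstate_t> facet of
// `loc`. Returns false if the facet does not report a complete, successful
// conversion (e.g. a character with no representation in that locale).
bool narrow_with_locale(const std::locale& loc, const std::wstring& in,
                        std::string& out) {
    using Cvt = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    std::mbstate_t state{};
    // Sketch-level buffer sizing: max_length() bytes per wide character.
    // Stateful encodings may need extra room for shift sequences.
    std::string buf(in.size() * cvt.max_length(), '\0');
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    Cvt::result r = cvt.out(state, in.data(), in.data() + in.size(), from_next,
                            &buf[0], &buf[0] + buf.size(), to_next);
    if (r != Cvt::ok) return false;
    buf.resize(static_cast<std::size_t>(to_next - &buf[0]));
    out = buf;
    return true;
}
```

Passing std::locale("") instead of std::locale::classic() selects the user's environment locale; that constructor throws std::runtime_error if the environment names a locale the system does not provide.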

gamehack wrote:
> Hi all,
>
> I've been thinking about all the system functions which accept wchar_t.
> The point is that they don't define what encoding the wchar_t has to
> be. Let us assume that all the external input is UTF-8 and all the
> output is also UTF-8 and your internal representation is using wchar_t
> encoded using UTF-16. So when you call wcout or other system functions
> which accept wide characters, what encoding do they assume?
>
> Regards

Welcome to the piss-poor implementation of internationalization in C++.
The implementation punts and assumes that you can always uniquely
convert from a wide stream to multibyte using the woefully inadequate
C library function.
Dec 23 '05 #3
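The C library route described above looks roughly like this. A minimal sketch, assuming C++17 (for std::optional) and using std::wcsrtombs, the restartable form of wcstombs, whose null-destination "measure first" call is standard; to_multibyte is a hypothetical name. Note that the only failure signal is a single (size_t)-1 for the whole string, which is the all-or-nothing behaviour the post complains about:

```cpp
#include <clocale>
#include <cwchar>
#include <optional>
#include <string>
#include <vector>

// Convert a wide string to the multibyte encoding of the current C locale
// (LC_CTYPE, as set by std::setlocale). Returns std::nullopt if any character
// has no representation in that encoding; the C library gives no finer detail.
std::optional<std::string> to_multibyte(const std::wstring& ws) {
    std::mbstate_t state{};
    const wchar_t* src = ws.c_str();
    // First pass: a null destination only measures the result (no terminator).
    std::size_t n = std::wcsrtombs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1)) return std::nullopt;

    // Second pass: convert for real, with room for the terminating null.
    std::vector<char> buf(n + 1);
    state = std::mbstate_t{};
    src = ws.c_str();
    std::wcsrtombs(buf.data(), &src, buf.size(), &state);
    return std::string(buf.data(), n);
}
```

A program normally calls std::setlocale(LC_ALL, "") first to pick up the user's environment; until then it runs in the "C" locale, whose narrow encoding may not cover anything beyond ASCII.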

Victor Bazarov wrote:
> gamehack wrote:
>> I've been thinking about all the system functions which accept wchar_t.
>
> What system functions are those? Do you mean platform-specific ones?

Anything in the standard that takes a filename, for one (fstreams,
etc.). The main args are another.
Dec 23 '05 #4
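The arguments to main arrive as char* with no encoding specified by the standard, so a portable program can only treat each argument as a sequence of bytes. A small sketch for looking at those bytes; hex_bytes and dump_args are hypothetical helper names:

```cpp
#include <cstdio>
#include <string>

// Render the bytes of a NUL-terminated narrow string as lowercase hex pairs.
// No encoding is assumed: each byte is printed as-is.
std::string hex_bytes(const char* s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(s); *p; ++p) {
        if (!out.empty()) out += ' ';
        out += digits[*p >> 4];
        out += digits[*p & 0x0F];
    }
    return out;
}

// Hypothetical helper meant to be called from main(argc, argv).
void dump_args(int argc, char* argv[]) {
    for (int i = 0; i < argc; ++i)
        std::printf("argv[%d]: %s\n", i, hex_bytes(argv[i]).c_str());
}
```

Calling dump_args(argc, argv) with the same visible argument typed in terminals configured for different charsets would be expected to print different byte sequences.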

Ron Natalie wrote:
> Victor Bazarov wrote:
>> gamehack wrote:
>>> I've been thinking about all the system functions which accept wchar_t.
>>
>> What system functions are those? Do you mean platform-specific ones?
>
> Anything in the standard that takes a filename, for one (fstreams,
> etc.). The main args are another.


I guess all those functions gamehack has in mind interpret strings
of chars according to the locale settings of the particular operating system.

Cheers
--
Mateusz Łoskot
http://mateusz.loskot.net
Dec 23 '05 #5

That's what I suspected :)

Dec 23 '05 #6

Mateusz Łoskot wrote:
> Ron Natalie wrote:
>> Victor Bazarov wrote:
>>> gamehack wrote:
>>>> I've been thinking about all the system functions which accept wchar_t.
>>> What system functions are those? Do you mean platform-specific ones?
>>
>> Anything in the standard that takes a filename, for one (fstreams,
>> etc.). The main args are another.
>
> I guess all those functions gamehack has in mind interpret strings
> of chars according to the locale settings of the particular operating system.

That is a nonsensical statement. There is no guarantee that there
exists a way to map wchar_t-based strings into a string of chars
in any locale.

Dec 23 '05 #7

Ron Natalie wrote:
> Mateusz Łoskot wrote:
>> Ron Natalie wrote:
>>> Victor Bazarov wrote:
>>>> gamehack wrote:
>>>>> I've been thinking about all the system functions which accept
>>>>> wchar_t.
>>>>
>>>> What system functions are those? Do you mean platform-specific ones?
>>>
>>> Anything in the standard that takes a filename, for one (fstreams,
>>> etc.). The main args are another.
>>
>> I guess all those functions gamehack has in mind interpret strings
>> of chars according to the locale settings of the particular operating system.
>
> That is a nonsensical statement. There is no guarantee that there
> exists a way to map wchar_t-based strings into a string of chars
> in any locale.


I said I was guessing. So please explain to me: how does a function like
fopen know what codepage the char string passed to it uses?
I think there must be some trick, because fopen is able to find paths
given in many charsets.

Cheers
--
Mateusz Łoskot
http://mateusz.loskot.net
Dec 23 '05 #8

Mateusz Łoskot wrote:
> I said I was guessing. So please explain to me: how does a function like
> fopen know what codepage the char string passed to it uses?
> I think there must be some trick, because fopen is able to find paths
> given in many charsets.

It works on UNIX because you effectively have an 8-bit-clean path.
Any character other than / and \0 is legitimate.
Dec 23 '05 #9
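The rule in the post is easy to state in code. A minimal sketch with a hypothetical name, is_valid_posix_component; it deliberately ignores the NAME_MAX length limit, the special names "." and "..", and any extra restrictions a particular filesystem might impose:

```cpp
#include <string>

// A path component is acceptable if it is non-empty and contains neither
// '/' nor '\0'. The bytes are never interpreted as any particular charset.
bool is_valid_posix_component(const std::string& name) {
    if (name.empty()) return false;
    for (char c : name)
        if (c == '/' || c == '\0') return false;
    return true;
}
```

Both the UTF-8 and the Latin-1 spelling of "café" pass this check, which is why the kernel can store either: it compares and stores bytes without decoding them.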

Ron Natalie wrote:
> Mateusz Łoskot wrote:
>> I said I was guessing. So please explain to me: how does a function like
>> fopen know what codepage the char string passed to it uses?
>> I think there must be some trick, because fopen is able to find paths
>> given in many charsets.
>
> It works on UNIX because you effectively have an 8-bit-clean path.
> Any character other than / and \0 is legitimate.


I'm not sure. There is still a possibility that the filesystem is
"incompatible", in terms of charset, with the given path, and then the
file cannot be found.

Cheers
--
Mateusz Łoskot
http://mateusz.loskot.net
Dec 24 '05 #10
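The mismatch Mateusz describes can be shown with plain fopen: the same visible name spelled in two encodings is two different byte strings, and on a POSIX system the kernel compares bytes. A sketch assuming the current directory is writable; file_exists, create_empty_file and latin1_spelling_misses_utf8_file are hypothetical helpers, and the file name is arbitrary:

```cpp
#include <cstdio>
#include <string>

// True if `name` can be opened for reading.
bool file_exists(const std::string& name) {
    if (std::FILE* f = std::fopen(name.c_str(), "rb")) {
        std::fclose(f);
        return true;
    }
    return false;
}

// Create (or truncate) an empty file; false if fopen fails.
bool create_empty_file(const std::string& name) {
    std::FILE* f = std::fopen(name.c_str(), "wb");
    if (!f) return false;
    std::fclose(f);
    return true;
}

// Create a file under the UTF-8 spelling of "café", then look it up under the
// Latin-1 spelling. Returns true if the second lookup misses. Removes the file.
bool latin1_spelling_misses_utf8_file() {
    const std::string utf8_name = "demo_caf\xC3\xA9.tmp";   // "café" as UTF-8 bytes
    const std::string latin1_name = "demo_caf\xE9.tmp";     // "café" as Latin-1 bytes
    if (!create_empty_file(utf8_name)) return false;
    bool missed = file_exists(utf8_name) && !file_exists(latin1_name);
    std::remove(utf8_name.c_str());
    return missed;
}
```

A program that receives the name in one charset while the files were created by another therefore sees "file not found" even though the names look identical on screen.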
