How do mbtowc() and wctomb() work?

Hi,

I have a question regarding how the mbtowc() and wctomb() functions
work. Given that some compilers (gcc, for example) allow the wide
execution character set to be specified at compile time, and that the
multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
the compiled program has the ability to convert character strings
between arbitrary character sets.

My question is, how is this conversion performed? As I understand it,
the C library does not have this facility. So what does...?

Thanks in advance

Jul 24 '06 #1

Ross wrote:
> I have a question regarding how the mbtowc() and wctomb() functions
> work. Given that some compilers (gcc, for example) allow the wide
> execution character set to be specified at compile time, and that the
> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
> the compiled program has the ability to convert character strings
> between arbitrary character sets.
>
> My question is, how is this conversion performed? As I understand it,
> the C library does not have this facility. So what does...?

The C library.

Jul 25 '06 #2

J. J. Farrell wrote:
> Ross wrote:
>> I have a question regarding how the mbtowc() and wctomb() functions
>> work. Given that some compilers (gcc, for example) allow the wide
>> execution character set to be specified at compile time, and that the
>> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
>> the compiled program has the ability to convert character strings
>> between arbitrary character sets.
>>
>> My question is, how is this conversion performed? As I understand it,
>> the C library does not have this facility. So what does...?
>
> The C library.

If the C library has this functionality, is it available through the
API? I can't find any mention of it in the C spec.

Jul 25 '06 #3
Ross wrote:
> J. J. Farrell wrote:
>> Ross wrote:
>>> I have a question regarding how the mbtowc() and wctomb() functions
>>> work. Given that some compilers (gcc, for example) allow the wide
>>> execution character set to be specified at compile time, and that the
>>> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
>>> the compiled program has the ability to convert character strings
>>> between arbitrary character sets.
>>>
>>> My question is, how is this conversion performed? As I understand it,
>>> the C library does not have this facility. So what does...?
>>
>> The C library.
>
> If the C library has this functionality, is it available through the
> API? I can't find any mention of it in the C spec.

The API essentially consists of the setlocale, mbtowc, wctomb, mbstowcs,
wcstombs, mbrtowc, wcrtomb, mbsrtowcs and wcsrtombs functions!

On a hosted C implementation, the C library is required to provide these
functions. They don't have to be particularly useful. For example, the
library may support only the "C" and "" locales, and the native locale
"" may be equivalent to the "C" locale. In this case there is not much
scope for converting character strings between arbitrary character sets.

If your C spec doesn't contain descriptions of those functions, you may
find that it does not conform to the latest C standard.
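
As a rough illustration of the functions listed above, here is a minimal sketch (not from the thread) that decodes the first character of a string with mbtowc() and re-encodes it with wctomb(). The helper name roundtrip_first_char is made up for the example, and the byte-for-byte comparison assumes a stateless encoding such as ASCII or UTF-8; a stateful encoding may legitimately re-encode to different bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the first multibyte character of s under
 * the current LC_CTYPE locale, encode it again, and compare the bytes.
 * Returns the length in bytes of that character, 0 if s is empty, or
 * -1 if s does not start with a valid character or the round trip
 * produced different bytes. */
int roundtrip_first_char(const char *s)
{
    wchar_t wc;
    char buf[MB_LEN_MAX];
    int in_len, out_len;

    assert(s != NULL);

    (void)mbtowc(NULL, NULL, 0);            /* reset the decoder's shift state */
    in_len = mbtowc(&wc, s, strlen(s) + 1); /* +1 so the terminator is visible */
    if (in_len <= 0)
        return in_len;                      /* 0: empty string, -1: invalid */

    (void)wctomb(NULL, 0);                  /* reset the encoder's shift state */
    out_len = wctomb(buf, wc);
    if (out_len != in_len || memcmp(buf, s, (size_t)in_len) != 0)
        return -1;
    return in_len;
}
```

Call setlocale(LC_CTYPE, "") before using it if you want the environment's encoding; without that call the program stays in the "C" locale, where only the basic character set is guaranteed to convert.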

--
Simon.
Jul 25 '06 #4

Simon Biber wrote:
> Ross wrote:
>> J. J. Farrell wrote:
>>> Ross wrote:
>>>> I have a question regarding how the mbtowc() and wctomb() functions
>>>> work. Given that some compilers (gcc, for example) allow the wide
>>>> execution character set to be specified at compile time, and that the
>>>> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
>>>> the compiled program has the ability to convert character strings
>>>> between arbitrary character sets.
>>>>
>>>> My question is, how is this conversion performed? As I understand it,
>>>> the C library does not have this facility. So what does...?
>>>
>>> The C library.
>>
>> If the C library has this functionality, is it available through the
>> API? I can't find any mention of it in the C spec.
>
> The API essentially consists of the setlocale, mbtowc, wctomb, mbstowcs,
> wcstombs, mbrtowc, wcrtomb, mbsrtowcs and wcsrtombs functions!
>
> On a hosted C implementation, the C library is required to provide these
> functions. They don't have to be particularly useful. For example, the
> library may support only the "C" and "" locales, and the native locale
> "" may be equivalent to the "C" locale. In this case there is not much
> scope for converting character strings between arbitrary character sets.
>
> If your C spec doesn't contain descriptions of those functions, you may
> find that it does not conform to the latest C standard.
>
> --
> Simon.

Thanks, I'm aware of those functions. However, given that both the
execution and native character sets are flexible, the existence of
these functions seems to suggest that the C library *should* have the
ability to convert between truly arbitrary character sets, not just the
encoding of 'mb' and 'wc'. I guess the existence of such a facility is
implied, rather than required, hence the reason the API doesn't provide
an iconv-esque interface.

Jul 25 '06 #5
Ross wrote:
> Simon Biber wrote:
>> The API essentially consists of the setlocale, mbtowc, wctomb, mbstowcs,
>> wcstombs, mbrtowc, wcrtomb, mbsrtowcs and wcsrtombs functions!
>>
>> On a hosted C implementation, the C library is required to provide these
>> functions. They don't have to be particularly useful. For example, the
>> library may support only the "C" and "" locales, and the native locale
>> "" may be equivalent to the "C" locale. In this case there is not much
>> scope for converting character strings between arbitrary character sets.
>>
>> If your C spec doesn't contain descriptions of those functions, you may
>> find that it does not conform to the latest C standard.
>
> Thanks, I'm aware of those functions. However, given that both the
> execution and native character sets are flexible, the existence of
> these functions seems to suggest that the C library *should* have the
> ability to convert between truly arbitrary character sets, not just the
> encoding of 'mb' and 'wc'. I guess the existence of such a facility is
> implied, rather than required, hence the reason the API doesn't provide
> an iconv-esque interface.

Yes. It's what I call "partial standardisation". The API is defined but
it's not useful in portable code since you can't tell whether there is
actually any useful functionality behind it. Some implementations
provide a useful implementation with many locales and many different
encodings (glibc) but some implementations don't bother (msvcrt).

By the way, please snip out signatures (anything following -- on its own
line) unless you are specifically commenting on someone's signature.

--
Simon.
Jul 25 '06 #6
"Ross" <ro************@yahoo.co.uk> wrote in message
news:11**********************@h48g2000cwc.googlegroups.com...
> Simon Biber wrote:
>> Ross wrote:
>>> J. J. Farrell wrote:
>>>> Ross wrote:
>>>>> I have a question regarding how the mbtowc() and wctomb() functions
>>>>> work. Given that some compilers (gcc, for example) allow the wide
>>>>> execution character set to be specified at compile time, and that the
>>>>> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
>>>>> the compiled program has the ability to convert character strings
>>>>> between arbitrary character sets.
>>>>>
>>>>> My question is, how is this conversion performed? As I understand it,
>>>>> the C library does not have this facility. So what does...?
>>>>
>>>> The C library.
>>>
>>> If the C library has this functionality, is it available through the
>>> API? I can't find any mention of it in the C spec.
>>
>> The API essentially consists of the setlocale, mbtowc, wctomb, mbstowcs,
>> wcstombs, mbrtowc, wcrtomb, mbsrtowcs and wcsrtombs functions!
>>
>> On a hosted C implementation, the C library is required to provide these
>> functions. They don't have to be particularly useful. For example, the
>> library may support only the "C" and "" locales, and the native locale
>> "" may be equivalent to the "C" locale. In this case there is not much
>> scope for converting character strings between arbitrary character sets.
>>
>> If your C spec doesn't contain descriptions of those functions, you may
>> find that it does not conform to the latest C standard.
>>
>> --
>> Simon.
>
> Thanks, I'm aware of those functions. However, given that both the
> execution and native character sets are flexible, the existence of
> these functions seems to suggest that the C library *should* have the
> ability to convert between truly arbitrary character sets, not just the
> encoding of 'mb' and 'wc'. I guess the existence of such a facility is
> implied, rather than required, hence the reason the API doesn't provide
> an iconv-esque interface.

Right. Support for multiple conversions can vary from the trivial,
as Biber described above, to the highly adaptive. See the essay
on multibyte encodings:

http://www.dinkumware.com/manuals/?m...multibyte.html

for an overview of the issues that arise when an implementation
permits various encodings to change.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Jul 25 '06 #7
P.J. Plauger wrote:
> Right. Support for multiple conversions can vary from the trivial,
> as Biber described above, to the highly adaptive. See the essay
> on multibyte encodings:
>
> http://www.dinkumware.com/manuals/?m...multibyte.html
>
> for an overview of the issues that arise when an implementation
> permits various encodings to change.

It's an interesting essay, and it introduced me to many facets (no pun
intended) of C++ that I never bothered learning.

Towards the end it says "some people are proposing the use of UTF-16 as
a wide-character encoding". This is no longer just a proposal; it was
introduced in Windows 2000 and now seems to be fully entrenched in
Microsoft products.

"UTF-16 is the native internal representation of text in the Microsoft
Windows NT/Windows 2000/Windows XP/Windows CE, Qualcomm BREW, and
Symbian operating systems; the Java and .NET bytecode environments; Mac
OS X's Cocoa and Core Foundation frameworks; and the Qt cross-platform
graphical widget toolkit."

It makes wchar_t handling much more tricky than it was intended to be.
Indeed, many programs don't bother considering or handling the surrogate
pairs.
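
To make the surrogate-pair point concrete, here is a small sketch (my own, not from the thread) of what a program has to do when its 16-bit wide characters hold UTF-16: combine a high and a low surrogate into one code point. The function name utf16_decode and its return convention are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from a UTF-16 sequence.
 * Returns the number of 16-bit units consumed (1 or 2), or 0 if the
 * input is empty or ill-formed (lone or mismatched surrogate). */
size_t utf16_decode(const uint16_t *units, size_t n, uint32_t *cp)
{
    assert(units != NULL && cp != NULL);

    if (n == 0)
        return 0;

    uint16_t hi = units[0];
    if (hi < 0xD800 || hi > 0xDFFF) {   /* ordinary BMP character */
        *cp = hi;
        return 1;
    }
    if (hi >= 0xDC00)                   /* low surrogate with no high one */
        return 0;
    if (n < 2)                          /* high surrogate at end of input */
        return 0;

    uint16_t lo = units[1];
    if (lo < 0xDC00 || lo > 0xDFFF)     /* high surrogate not followed by low */
        return 0;

    /* Each surrogate carries 10 bits; the pair encodes cp - 0x10000. */
    *cp = 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 2;
}
```

Code that treats each 16-bit unit as a complete character skips this step, which is why it misbehaves for anything outside the Basic Multilingual Plane.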

--
Simon.
Jul 25 '06 #8
"Ross" <ro************@yahoo.co.uk> wrote in message
news:11**********************@h48g2000cwc.googlegroups.com...
> Simon Biber wrote:
>> The API essentially consists of the setlocale, mbtowc, wctomb,
>> mbstowcs, wcstombs, mbrtowc, wcrtomb, mbsrtowcs and
>> wcsrtombs functions!
>>
>> On a hosted C implementation, the C library is required to provide
>> these functions. They don't have to be particularly useful. For
>> example, the library may support only the "C" and "" locales, and
>> the native locale "" may be equivalent to the "C" locale. In this
>> case there is not much scope for converting character strings
>> between arbitrary character sets.
>>
>> If your C spec doesn't contain descriptions of those functions, you
>> may find that it does not conform to the latest C standard.
>
> Thanks, I'm aware of those functions. However, given that both the
> execution and native character sets are flexible, the existence of
> these functions seems to suggest that the C library *should* have
> the ability to convert between truly arbitrary character sets, not
> just the encoding of 'mb' and 'wc'. I guess the existence of such a
> facility is implied, rather than required, hence the reason the API
> doesn't provide an iconv-esque interface.

Well, if you have a decent implementation, you can convert from any
interesting charset to wide chars, then change the locale appropriately
and convert them to any other interesting charset.
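
A minimal sketch of that two-step idea, using only ISO C calls, might look like the following. The function name convert_via_wide and its return convention are invented for the example, the locale names you pass are system-specific, and the approach only gives sensible results when wchar_t means the same thing in both locales. Note also that setlocale() changes process-wide state, so this is not safe to call while other threads depend on the locale.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode src under from_locale, then encode the
 * wide string under to_locale into dst. Returns the number of bytes
 * written (excluding the terminator) or (size_t)-1 on any failure. */
size_t convert_via_wide(const char *from_locale, const char *to_locale,
                        const char *src, char *dst, size_t dst_size)
{
    size_t result = (size_t)-1;
    size_t len, n;
    wchar_t *wide = NULL;
    const char *cur;
    char *saved;

    assert(from_locale != NULL && to_locale != NULL);
    assert(src != NULL && dst != NULL);

    /* Remember the current LC_CTYPE setting so it can be restored. */
    cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        return result;
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return result;
    strcpy(saved, cur);

    /* Step 1: multibyte -> wide under the source locale. A string never
     * decodes to more wide characters than it has bytes. */
    if (setlocale(LC_CTYPE, from_locale) == NULL)
        goto done;
    len = strlen(src);
    wide = malloc((len + 1) * sizeof *wide);
    if (wide == NULL)
        goto done;
    if (mbstowcs(wide, src, len + 1) == (size_t)-1)
        goto done;

    /* Step 2: wide -> multibyte under the destination locale. */
    if (setlocale(LC_CTYPE, to_locale) == NULL)
        goto done;
    n = wcstombs(dst, wide, dst_size);
    if (n == (size_t)-1 || n >= dst_size)   /* unconvertible or no room for NUL */
        goto done;
    result = n;

done:
    free(wide);
    setlocale(LC_CTYPE, saved);
    free(saved);
    return result;
}
```

A call such as convert_via_wide("C", "C", "abc", buf, sizeof buf) is the only case guaranteed to work everywhere; anything more interesting depends on which locale names the system accepts.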

Figuring out which locales are available (if any besides "C" and "")
is the stumbling block, since they vary from system to system.
Add-ons like iconv() tend to be more useful and more portable in
practice.
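
For comparison, a bare-bones iconv() version might look like this. It is only a sketch: iconv is a POSIX interface rather than part of ISO C, the wrapper name convert_with_iconv is made up, and the encoding names accepted by iconv_open() ("UTF-8", "ASCII" and so on) vary between implementations.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical wrapper: convert src_len bytes of src from encoding
 * `from` to encoding `to`, writing at most dst_size bytes to dst.
 * Returns the number of bytes written or (size_t)-1 on failure.
 * The output is not NUL-terminated. */
size_t convert_with_iconv(const char *to, const char *from,
                          const char *src, size_t src_len,
                          char *dst, size_t dst_size)
{
    iconv_t cd;
    char *in = (char *)src;     /* iconv() takes char **, but does not modify the input */
    char *out = dst;
    size_t in_left = src_len;
    size_t out_left = dst_size;
    size_t rc;

    assert(to != NULL && from != NULL && src != NULL && dst != NULL);

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;      /* this pair of encodings is not supported */

    rc = iconv(cd, &in, &in_left, &out, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out, &out_left);  /* flush any shift sequence */

    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;      /* invalid input, or dst too small */
    return dst_size - out_left;
}
```

Unlike the locale-switching approach, the source and destination encodings are named explicitly and no global state is changed. On some systems the program has to be linked with -liconv.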

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking


Jul 26 '06 #9
Stephen Sprunk <st*****@sprunk.org> wrote:
>
> Well, if you have a decent implementation, you can convert from any
> interesting charset to wide chars, then change the locale appropriately
> and convert them to any other interesting charset.

There's no guarantee that the wide character encoding isn't also locale-
specific, so that doesn't work in the general case. Of course, you're
free to define "decent implementation" as one where the wide character
encoding is independent of locale.
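
One way a program can ask whether it is on such an implementation is the C99 macro __STDC_ISO_10646__, which an implementation defines when wchar_t values are ISO/IEC 10646 (Unicode) code points in every locale. The sketch below is only an illustration; the function name iso10646_version is made up, and a return value of 0 means "no promise made", not "definitely locale-specific".

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: report the ISO/IEC 10646 version (yyyymm) that
 * wchar_t is promised to follow in all locales, or 0 if the
 * implementation makes no such promise. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    long version = __STDC_ISO_10646__;
    assert(version > 0);    /* the standard specifies a yyyymmL date value */
    return version;
#else
    return 0L;
#endif
}
```

When this returns nonzero, converting through wide characters between two locales is at least well defined; when it returns 0, the result depends on the implementation's documentation.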

-Larry Jones

I'm getting disillusioned with these New Years. -- Calvin
Jul 27 '06 #10
