How do mbtowc() and wctomb() work?

Ross

Hi,

I have a question regarding how the mbtowc() and wctomb() functions
work. Given that some compilers (gcc, for example) allow the wide
execution character set to be specified at compile time, and that the
multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
the compiled program has the ability to convert character strings
between arbitrary character sets.

My question is, how is this conversion performed? As I understand it,
the C library does not have this facility. So what does...?

Thanks in advance

Jul 24 '06 #1


J. J. Farrell

Ross wrote:

> I have a question regarding how the mbtowc() and wctomb() functions
> work. Given that some compilers (gcc, for example) allow the wide
> execution character set to be specified at compile time, and that the
> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
> the compiled program has the ability to convert character strings
> between arbitrary character sets.
>
> My question is, how is this conversion performed? As I understand it,
> the C library does not have this facility. So what does...?

The C library.

Jul 25 '06 #2

Ross

J. J. Farrell wrote:

> Ross wrote:
>> I have a question regarding how the mbtowc() and wctomb() functions
>> work. Given that some compilers (gcc, for example) allow the wide
>> execution character set to be specified at compile time, and that the
>> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
>> the compiled program has the ability to convert character strings
>> between arbitrary character sets.
>>
>> My question is, how is this conversion performed? As I understand it,
>> the C library does not have this facility. So what does...?
>
> The C library.

If the C library has this functionality, is it available through the
API? I can't find any mention of it in the C spec.

Jul 25 '06 #3

Simon Biber

Ross wrote:

> J. J. Farrell wrote:
>> Ross wrote:
>>> I have a question regarding how the mbtowc() and wctomb() functions
>>> work. Given that some compilers (gcc, for example) allow the wide
>>> execution character set to be specified at compile time, and that the
>>> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
>>> the compiled program has the ability to convert character strings
>>> between arbitrary character sets.
>>>
>>> My question is, how is this conversion performed? As I understand it,
>>> the C library does not have this facility. So what does...?
>> The C library.
>
> If the C library has this functionality, is it available through the
> API? I can't find any mention of it in the C spec.

The API essentially consists of the setlocale, mbtowc, wctomb, mbstowcs,
wcstombs, mbrtowc, wcrtomb, mbsrtowcs and wcsrtombs functions!

On a hosted C implementation, the C library is required to provide these
functions. They don't have to be particularly useful. For example, the
library may support only the "C" and "" locales, and the native locale
"" may be equivalent to the "C" locale. In this case there is not much
scope for converting character strings between arbitrary character sets.

If your C spec doesn't contain descriptions of those functions, you may
find that it does not conform to the latest C standard.
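
As a rough illustration of that API (not taken from any particular library's documentation), the sketch below pushes one multibyte character through mbtowc() and back through wctomb() in whatever locale is currently selected. The helper name roundtrip_char is made up for this example.

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Convert the first multibyte character of `mb` to a wide character and
 * back again, using the encoding selected by the current LC_CTYPE locale.
 * Returns 1 if the bytes survive the round trip, 0 if they come back
 * different, and -1 if either conversion fails.
 * (roundtrip_char is a hypothetical helper, not a library function.) */
int roundtrip_char(const char *mb)
{
    wchar_t wc;
    char out[MB_LEN_MAX];
    int in_len, out_len;

    mbtowc(NULL, NULL, 0);                 /* reset any shift state */
    in_len = mbtowc(&wc, mb, strlen(mb) + 1);
    if (in_len <= 0)
        return -1;                         /* invalid sequence or null character */

    wctomb(NULL, 0);                       /* reset shift state for the way back */
    out_len = wctomb(out, wc);
    if (out_len < 0)
        return -1;                         /* wc has no multibyte form here */

    return out_len == in_len && memcmp(mb, out, (size_t)in_len) == 0;
}
```

In the "C" locale, roundtrip_char("A") should report success; calling setlocale(LC_CTYPE, "") first makes it use the native encoding instead, if the implementation has one.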

--
Simon.

Jul 25 '06 #4

Ross

Simon Biber wrote:

> Ross wrote:
>> J. J. Farrell wrote:
>>> Ross wrote:
>>>> I have a question regarding how the mbtowc() and wctomb() functions
>>>> work. Given that some compilers (gcc, for example) allow the wide
>>>> execution character set to be specified at compile time, and that the
>>>> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
>>>> the compiled program has the ability to convert character strings
>>>> between arbitrary character sets.
>>>>
>>>> My question is, how is this conversion performed? As I understand it,
>>>> the C library does not have this facility. So what does...?
>>> The C library.
>> If the C library has this functionality, is it available through the
>> API? I can't find any mention of it in the C spec.
>
> The API essentially consists of the setlocale, mbtowc, wctomb, mbstowcs,
> wcstombs, mbrtowc, wcrtomb, mbsrtowcs and wcsrtombs functions!
>
> On a hosted C implementation, the C library is required to provide these
> functions. They don't have to be particularly useful. For example, the
> library may support only the "C" and "" locales, and the native locale
> "" may be equivalent to the "C" locale. In this case there is not much
> scope for converting character strings between arbitrary character sets.
>
> If your C spec doesn't contain descriptions of those functions, you may
> find that it does not conform to the latest C standard.
>
> --
> Simon.

Thanks, I'm aware of those functions. However, given that both the
execution and native character sets are flexible, the existence of
these functions seems to suggest that the C library *should* have the
ability to convert between truly arbitrary character sets, not just the
encoding of 'mb' and 'wc'. I guess the existence of such a facility is
implied, rather than required, hence the reason the API doesn't provide
an iconv-esque interface.

Jul 25 '06 #5

Simon Biber

Ross wrote:

> Simon Biber wrote:
>> The API essentially consists of the setlocale, mbtowc, wctomb, mbstowcs,
>> wcstombs, mbrtowc, wcrtomb, mbsrtowcs and wcsrtombs functions!
>>
>> On a hosted C implementation, the C library is required to provide these
>> functions. They don't have to be particularly useful. For example, the
>> library may support only the "C" and "" locales, and the native locale
>> "" may be equivalent to the "C" locale. In this case there is not much
>> scope for converting character strings between arbitrary character sets.
>>
>> If your C spec doesn't contain descriptions of those functions, you may
>> find that it does not conform to the latest C standard.
>
> Thanks, I'm aware of those functions. However, given that both the
> execution and native character sets are flexible, the existence of
> these functions seems to suggest that the C library *should* have the
> ability to convert between truly arbitrary character sets, not just the
> encoding of 'mb' and 'wc'. I guess the existence of such a facility is
> implied, rather than required, hence the reason the API doesn't provide
> an iconv-esque interface.

Yes. It's what I call "partial standardisation". The API is defined but
it's not useful in portable code since you can't tell whether there is
actually any useful functionality behind it. Some implementations
provide a useful implementation with many locales and many different
encodings (glibc) but some implementations don't bother (msvcrt).
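
About the only way to find out at run time whether a given locale has anything behind it is to ask for it and see what comes back. A minimal sketch, assuming a made-up helper called probe_locale; the locale names you would pass in (something like "en_US.UTF-8" on glibc systems) are implementation-defined, so treat them as assumptions too.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Try to select `name` as the LC_CTYPE locale and report MB_CUR_MAX for it:
 * 1 means a single-byte encoding, more than 1 means a multibyte encoding.
 * Returns 0 if the implementation does not accept the locale name.
 * The previously selected LC_CTYPE locale is restored before returning.
 * (probe_locale is a hypothetical helper, not a library function.) */
size_t probe_locale(const char *name)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);   /* query only */
    size_t max_bytes = 0;

    if (cur == NULL || strlen(cur) >= sizeof saved)
        return 0;
    strcpy(saved, cur);      /* a later setlocale call may overwrite *cur */

    if (setlocale(LC_CTYPE, name) != NULL)
        max_bytes = MB_CUR_MAX;

    setlocale(LC_CTYPE, saved);
    return max_bytes;
}
```

A result of 1 for every name you try is a fair hint that the implementation is one of the ones that didn't bother.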

By the way, please snip out signatures (anything following -- on its own
line) unless you are specifically commenting on someone's signature.

--
Simon.

Jul 25 '06 #6

P.J. Plauger

"Ross" <ro************@yahoo.co.uk> wrote in message
news:11**********************@h48g2000cwc.googlegroups.com...

> Simon Biber wrote:
>> Ross wrote:
>>> J. J. Farrell wrote:
>>>> Ross wrote:
>>>>> I have a question regarding how the mbtowc() and wctomb() functions
>>>>> work. Given that some compilers (gcc, for example) allow the wide
>>>>> execution character set to be specified at compile time, and that the
>>>>> multibyte encoding depends on LC_CTYPE, this suggests that (at runtime)
>>>>> the compiled program has the ability to convert character strings
>>>>> between arbitrary character sets.
>>>>>
>>>>> My question is, how is this conversion performed? As I understand it,
>>>>> the C library does not have this facility. So what does...?
>>>> The C library.
>>>
>>> If the C library has this functionality, is it available through the
>>> API? I can't find any mention of it in the C spec.
>>
>> The API essentially consists of the setlocale, mbtowc, wctomb, mbstowcs,
>> wcstombs, mbrtowc, wcrtomb, mbsrtowcs and wcsrtombs functions!
>>
>> On a hosted C implementation, the C library is required to provide these
>> functions. They don't have to be particularly useful. For example, the
>> library may support only the "C" and "" locales, and the native locale
>> "" may be equivalent to the "C" locale. In this case there is not much
>> scope for converting character strings between arbitrary character sets.
>>
>> If your C spec doesn't contain descriptions of those functions, you may
>> find that it does not conform to the latest C standard.
>>
>> --
>> Simon.
>
> Thanks, I'm aware of those functions. However, given that both the
> execution and native character sets are flexible, the existence of
> these functions seems to suggest that the C library *should* have the
> ability to convert between truly arbitrary character sets, not just the
> encoding of 'mb' and 'wc'. I guess the existence of such a facility is
> implied, rather than required, hence the reason the API doesn't provide
> an iconv-esque interface.

Right. Support for multiple conversions can vary from the trivial,
as Biber described above, to the highly adaptive. See the essay
on multibyte encodings:

http://www.dinkumware.com/manuals/?m...multibyte.html

for an overview of the issues that arise when an implementation
permits various encodings to change.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Jul 25 '06 #7

Simon Biber

P.J. Plauger wrote:

> Right. Support for multiple conversions can vary from the trivial,
> as Biber described above, to the highly adaptive. See the essay
> on multibyte encodings:
>
> http://www.dinkumware.com/manuals/?m...multibyte.html
>
> for an overview of the issues that arise when an implementation
> permits various encodings to change.

It's an interesting essay, and it introduced me to many facets (no pun
intended) of C++ that I never bothered learning.

Towards the end it says "some people are proposing the use of UTF-16 as
a wide-character encoding". This is no longer just a proposal; it was
introduced in Windows 2000 and now seems to be fully entrenched in
Microsoft products.

"UTF-16 is the native internal representation of text in the Microsoft
Windows NT/Windows 2000/Windows XP/Windows CE, Qualcomm BREW, and
Symbian operating systems; the Java and .NET bytecode environments; Mac
OS X's Cocoa and Core Foundation frameworks; and the Qt cross-platform
graphical widget toolkit."

It makes wchar_t handling much more tricky than it was intended to be.
Indeed, many programs don't bother considering or handling the surrogate
pairs.
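
For anyone who hasn't met them, handling a surrogate pair looks roughly like the sketch below. It works on plain 16-bit units rather than wchar_t so as not to assume anything about a particular platform, and utf16_decode is a made-up name for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the UTF-16 sequence units[0..len).
 * Stores the code point in *cp and returns the number of 16-bit units
 * consumed (1 or 2). Returns 0 if the sequence is empty or starts with an
 * unpaired surrogate.
 * (utf16_decode is a hypothetical helper, not a library function.) */
size_t utf16_decode(const uint16_t *units, size_t len, uint32_t *cp)
{
    uint16_t hi, lo;

    if (len == 0)
        return 0;
    hi = units[0];
    if (hi < 0xD800 || hi > 0xDFFF) {      /* ordinary BMP character */
        *cp = hi;
        return 1;
    }
    if (hi > 0xDBFF || len < 2)            /* lone low surrogate or cut-off pair */
        return 0;
    lo = units[1];
    if (lo < 0xDC00 || lo > 0xDFFF)        /* high surrogate not followed by low */
        return 0;

    /* Each surrogate carries 10 bits of the value above U+FFFF. */
    *cp = 0x10000 + (((uint32_t)hi - 0xD800) << 10) + ((uint32_t)lo - 0xDC00);
    return 2;
}
```

Code that treats every 16-bit unit as a complete character skips all of this, which is exactly where it goes wrong for anything outside the Basic Multilingual Plane.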

--
Simon.

Jul 25 '06 #8

Stephen Sprunk

"Ross" <ro************@yahoo.co.uk> wrote in message
news:11**********************@h48g2000cwc.googlegroups.com...

> Simon Biber wrote:
>> The API essentially consists of the setlocale, mbtowc, wctomb,
>> mbstowcs, wcstombs, mbrtowc, wcrtomb, mbsrtowcs and
>> wcsrtombs functions!
>>
>> On a hosted C implementation, the C library is required to provide
>> these functions. They don't have to be particularly useful. For
>> example, the library may support only the "C" and "" locales, and
>> the native locale "" may be equivalent to the "C" locale. In this
>> case there is not much scope for converting character strings
>> between arbitrary character sets.
>>
>> If your C spec doesn't contain descriptions of those functions, you
>> may find that it does not conform to the latest C standard.
>
> Thanks, I'm aware of those functions. However, given that both the
> execution and native character sets are flexible, the existence of
> these functions seems to suggest that the C library *should* have
> the ability to convert between truly arbitrary character sets, not
> just the encoding of 'mb' and 'wc'. I guess the existence of such a
> facility is implied, rather than required, hence the reason the API
> doesn't provide an iconv-esque interface.

Well, if you have a decent implementation, you can convert from any
interesting charset to wide chars, then change the locale appropriately
and convert them to any other interesting charset.
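
Something along these lines, as a sketch only. It assumes the wide character encoding is the same in both locales, that both locale names are ones your implementation actually accepts, and that the text fits in a small fixed buffer; convert_via_wide is a name invented for this example.

```c
#include <locale.h>
#include <stdlib.h>

/* Re-encode the multibyte string `src` from the encoding of locale `from`
 * into the encoding of locale `to`, using wchar_t as the pivot.
 * Returns the number of bytes written to dst (not counting the terminating
 * null), or (size_t)-1 on failure.
 * Note: LC_CTYPE is left set to whichever locale was selected last.
 * (convert_via_wide is a hypothetical helper, not a library function.) */
size_t convert_via_wide(const char *from, const char *to,
                        const char *src, char *dst, size_t dst_size)
{
    wchar_t wide[256];
    size_t n;

    if (setlocale(LC_CTYPE, from) == NULL)
        return (size_t)-1;               /* source locale not available */
    n = mbstowcs(wide, src, 256);
    if (n == (size_t)-1 || n == 256)
        return (size_t)-1;               /* invalid input, or too long for wide[] */

    if (setlocale(LC_CTYPE, to) == NULL)
        return (size_t)-1;               /* target locale not available */
    n = wcstombs(dst, wide, dst_size);
    if (n == (size_t)-1 || n == dst_size)
        return (size_t)-1;               /* unrepresentable char, or dst too small */

    return n;
}
```

Real code would want to save and restore the caller's locale and cope with arbitrarily long input, which is part of why this approach gets clumsy quickly.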

Figuring out which locales are available (if any besides "C" and "")
is the stumbling block, since they vary from system to system.
Add-ons like iconv() tend to be more useful and more portable in
practice.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking


Jul 26 '06 #9

lawrence.jones

Stephen Sprunk <st*****@sprunk.org> wrote:

> Well, if you have a decent implementation, you can convert from any
> interesting charset to wide chars, then change the locale appropriately
> and convert them to any other interesting charset.

There's no guarantee that the wide character encoding isn't also locale-
specific, so that doesn't work in the general case. Of course, you're
free to define "decent implementation" as one where the wide character
encoding is independent of locale.
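
C99 does give you one way to check: if the implementation defines __STDC_ISO_10646__, wchar_t values are ISO/IEC 10646 code points regardless of locale. A minimal sketch, with a function name made up for this example:

```c
/* Returns 1 if the implementation promises (by defining the C99 macro
 * __STDC_ISO_10646__) that wchar_t holds ISO/IEC 10646 code points in
 * every locale, and 0 if it makes no such promise.
 * (wide_is_iso10646 is a hypothetical helper, not a library function.) */
int wide_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

A result of 0 doesn't prove that the wide encoding varies with the locale, only that the implementation hasn't promised otherwise.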

-Larry Jones

I'm getting disillusioned with these New Years. -- Calvin

Jul 27 '06 #10
