Wide character to multi-byte

PEK

I need some code that converts a multi-byte string to a Unicode string,
and Unicode to multi-byte. I work mostly in Windows and know how to
solve it there, but I would like to have some platform independent
code too.

I have tried mbstowcs/wcstombs, but I'm not satisfied with the
result. If wcstombs finds a character that can't be converted, it
returns (size_t)-1 and stops. I would like to replace such characters
with some special character and convert as much as possible.

So I have written my own functions, based on mbtowc and wctomb. I have
successfully converted text from and to different codepages (I have
tried 437, 1252 and 949 [Korean, where some characters take two
bytes]). So I think the code is OK, but I would appreciate it if someone
else looked at it (so I have someone to blame ;-).

The code:

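#include <climits>   // MB_LEN_MAX
#include <cstdlib>   // mbtowc, wctomb, MB_CUR_MAX
#include <string>

// Multibyte (encoding of the current C locale) -> wide.
// Bytes that do not form a valid character are skipped.
void ConvertCharToWstring(const char* from, std::wstring& to)
{
    to.clear();
    std::mbtowc(NULL, NULL, 0);  // reset the conversion state

    std::size_t pos = 0;
    wchar_t temp;

    while (true)
    {
        // mbtowc returns an int: bytes consumed, 0 at the terminator, -1 on error
        int len = std::mbtowc(&temp, from + pos, MB_CUR_MAX);

        //Found end
        if (len == 0)
            return;
        else if (len < 0)
        {
            //Invalid byte sequence: skip one byte and reset the state
            std::mbtowc(NULL, NULL, 0);
            pos++;
        }
        else
        {
            to += temp;
            pos += len;
        }
    }
}

// Wide -> multibyte (encoding of the current C locale). Characters that
// cannot be converted become unknownchar; *datalost (if non-NULL) reports it.
void ConvertWcharToString(const wchar_t* from, std::string& to,
                          bool* datalost, char unknownchar)
{
    to.clear();
    if (datalost != NULL)
        *datalost = false;
    std::wctomb(NULL, 0);  // reset the conversion state

    char temp[MB_LEN_MAX];  // MB_CUR_MAX <= MB_LEN_MAX, so no new[] needed

    for (; *from != L'\0'; ++from)
    {
        int len = std::wctomb(temp, *from);

        if (len < 0)
        {
            //Replace with unknown character and reset the state
            std::wctomb(NULL, 0);
            to += unknownchar;
            if (datalost != NULL)
                *datalost = true;
        }
        else
        {
            //Copy all bytes of the converted character
            to.append(temp, len);
        }
    }
}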

/PEK

Jul 22 '05 #1


Unforgiven

"PEK" <pe*****@home.se> wrote in message
news:41***************@news.individual.net...

I need some code that converts a multi-byte string to a Unicode string,
and Unicode to multi-byte. I work mostly in Windows and know how to
solve it there, but I would like to have some platform independent
code too.
/PEK

// wide-char to narrow, one char per wchar_t:
#include <locale>
#include <string>
#include <vector>

std::string WideToNarrow(const std::wstring& source)
{
    typedef std::ctype<wchar_t> CT;
    CT const& ct = std::use_facet<CT>(std::locale());
    std::vector<char> result(source.size());
    // characters the locale cannot narrow become 'X'
    if (!source.empty())
        ct.narrow(source.data(), source.data() + source.size(), 'X', &result[0]);
    return std::string(result.begin(), result.end());
}

For the reverse, use ct.widen instead (and make source a string and dest a
wstring, of course). Note that widen takes no default-character argument.
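For the reverse direction, a minimal sketch along the same lines might look like this. `WidenString` is a hypothetical name; like the narrowing code, it uses the ctype<wchar_t> facet of the global locale and maps each char to exactly one wchar_t.

```cpp
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: narrow -> wide with the ctype<wchar_t> facet of the
// global locale. Each char becomes exactly one wchar_t, so multibyte
// sequences are NOT combined into single characters.
std::wstring WidenString(const std::string& source)
{
    typedef std::ctype<wchar_t> CT;
    CT const& ct = std::use_facet<CT>(std::locale());
    std::vector<wchar_t> result(source.size());
    if (!source.empty())  // &result[0] is invalid for an empty vector
        ct.widen(source.data(), source.data() + source.size(), &result[0]);
    return std::wstring(result.begin(), result.end());
}
```

Because the mapping is strictly one-to-one, the output always has the same length as the input, which is exactly why this approach cannot handle real multibyte encodings.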
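This uses the global C++ locale, which at program startup is the classic "C"
locale (essentially plain ASCII), *not* the system locale. To set a specific
locale, use (locale names are platform-specific; this one is a Windows name):
locale::global(locale("Dutch_Netherlands"));
An empty name selects the user's preferred locale from the environment; at
least on Windows with VC, this sets the global locale to the system locale:
locale::global(locale(""));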
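Since a locale name accepted on one platform may be rejected on another (the std::locale constructor throws std::runtime_error for an unknown name), it may be worth guarding the call. The sketch below, with the hypothetical helper `InstallGlobalLocale`, falls back to the classic locale. It also illustrates a detail relevant to the mbtowc/wctomb code earlier in the thread: when the new global locale has a name, std::locale::global also sets the C library locale, so the C conversion functions follow it.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: install the named locale as the global C++ locale,
// falling back to the classic "C" locale if the name is not recognized.
// Returns the resulting C library locale name.
std::string InstallGlobalLocale(const char* name)
{
    try {
        std::locale::global(std::locale(name));
    } catch (const std::runtime_error&) {
        std::locale::global(std::locale::classic());
    }
    // A named global C++ locale is also installed as the C locale, so
    // mbtowc/wctomb and friends follow it.
    return std::setlocale(LC_ALL, NULL);
}
```

For example, InstallGlobalLocale("Dutch_Netherlands") on a system that does not know that name leaves both the C++ and the C locale at "C".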

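Note that this won't handle actual multi-byte character sets, i.e. character
sets with more than 256 characters where some characters take more than one
byte (e.g. JIS): narrow and widen convert each character to exactly one char
or wchar_t, so those characters will not be converted properly. I know of no
standard way to handle them, just the Windows WideCharToMultiByte function.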

--
Unforgiven

Jul 22 '05 #2

Jonathan Turkanis

PEK wrote:

I need some code that converts a multi-byte string to a Unicode string,
and Unicode to multi-byte. I work mostly in Windows and know how to
solve it there, but I would like to have some platform independent
code too.

The standard C++ solution is to use codecvt facets. Currently these are a bit
hard to use, but there is a proposal to add several components which would make
it easier. See

http://www.open-std.org/jtc1/sc22/wg...004/n1683.html.
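To give a rough idea of what using a codecvt facet directly involves (this is the plain standard facet, not the interface proposed in n1683), here is a sketch of the wide-to-multibyte direction that substitutes a replacement character instead of stopping, similar in spirit to PEK's function. `WideToMultibyte` is a hypothetical name, the 64-byte chunk size is arbitrary, and the code assumes that after an error the facet leaves from_next pointing at the character it could not convert, which is how GNU libstdc++ behaves.

```cpp
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: wide -> multibyte with the codecvt facet of `loc`.
// Characters the facet cannot convert are replaced by `unknown`.
std::string WideToMultibyte(const std::wstring& in, const std::locale& loc,
                            char unknown, bool* datalost)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Cvt;
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    if (datalost != NULL)
        *datalost = false;

    std::string out;
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* from = in.data();
    const wchar_t* const last = from + in.size();
    char buf[64];  // arbitrary chunk size

    while (from != last) {
        const wchar_t* from_next = from;
        char* to_next = buf;
        Cvt::result r = cvt.out(state, from, last, from_next,
                                buf, buf + sizeof buf, to_next);
        // keep whatever was converted before the facet stopped
        out.append(buf, static_cast<std::string::size_type>(to_next - buf));

        if (r == Cvt::error) {
            // assumed: from_next points at the character that failed
            out += unknown;
            if (datalost != NULL)
                *datalost = true;
            from = from_next + 1;
            state = std::mbstate_t();  // state is unspecified after an error
        } else if (r == Cvt::noconv || from_next == from) {
            break;                     // no progress possible
        } else {
            from = from_next;          // ok, or partial because buf was full
        }
    }
    return out;
}
```

Passing the locale explicitly, rather than relying on the global one, keeps the conversion independent of whatever locale::global was last called with.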

In the meantime, both the Boost Serialization library and the soon-to-be-released
Boost Iostreams library

http://home.comcast.net/~jturkanis/i.../doc/?path=5.6

contain code conversion components. (The documentation for the Iostreams code
conversion component is temporarily out of sync with the source.)

You can also use the Dinkumware CoreX library, which is reasonably priced and is
the basis for n1683.

Jonathan

Jul 22 '05 #3

Jonathan Turkanis

Unforgiven wrote:

"PEK" <pe*****@home.se> wrote in message
news:41***************@news.individual.net...
I need some code that converts a multi-byte string to a Unicode
string, and Unicode to multi-byte. I work mostly in Windows and know
how to solve it there, but I would like to have some platform
independent code too.
/PEK
Note that this won't handle actual multi-byte character sets, i.e.
character sets with characters > 256 (e.g. JIS), those characters
will not get converted properly. I know of no standard way to handle
those, just the WideCharToMultiByte windows method.

Using mbstowcs/wcstombs *is* a standard way to handle multibyte characters. The
preferred C++ solution is to use a codecvt facet instead of a ctype facet.
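The multibyte-to-wide direction goes through codecvt::in. A minimal sketch under the same assumptions as any codecvt-based code here (hypothetical name `MultibyteToWide`, arbitrary chunk size), which skips a byte that cannot start a valid character, as PEK's ConvertCharToWstring does:

```cpp
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: multibyte -> wide with the codecvt facet of `loc`.
// A byte that does not start a valid character is skipped.
std::wstring MultibyteToWide(const std::string& in, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Cvt;
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    std::wstring out;
    std::mbstate_t state = std::mbstate_t();
    const char* from = in.data();
    const char* const last = from + in.size();
    wchar_t buf[64];  // arbitrary chunk size

    while (from != last) {
        const char* from_next = from;
        wchar_t* to_next = buf;
        Cvt::result r = cvt.in(state, from, last, from_next,
                               buf, buf + sizeof buf / sizeof buf[0], to_next);
        out.append(buf, static_cast<std::wstring::size_type>(to_next - buf));

        if (r == Cvt::error) {
            from = from_next + 1;      // skip one invalid byte
            state = std::mbstate_t();
        } else if (r == Cvt::noconv || from_next == from) {
            break;  // no progress, e.g. a truncated sequence at the end
        } else {
            from = from_next;
        }
    }
    return out;
}
```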

Jonathan

Jul 22 '05 #4

Unforgiven

"Jonathan Turkanis" <te******@kangaroologic.com> wrote in message
news:34*************@individual.net...

Unforgiven wrote:
"PEK" <pe*****@home.se> wrote in message
news:41***************@news.individual.net...
I need some code that converts a multi-byte string to a Unicode
string, and Unicode to multi-byte. I work mostly in Windows and know
how to solve it there, but I would like to have some platform
independent code too.
/PEK
Note that this won't handle actual multi-byte character sets, i.e.
character sets with characters > 256 (e.g. JIS), those characters
will not get converted properly. I know of no standard way to handle
those, just the WideCharToMultiByte windows method.

Using mbstowcs/wcstombs *is* a standard way to handle multibyte characters.

That I knew, but it has the drawback of balking at unrecognized characters
instead of replacing them with some predetermined character (like '?'), as
the OP mentioned.
The preferred C++ solution is to use a codecvt facet instead of a ctype facet.

That I didn't know.

--
Unforgiven

Jul 22 '05 #5

PEK

On Wed, 5 Jan 2005 23:38:00 +0100, "Unforgiven"
<ja*******@hotmail.com> wrote:

"Jonathan Turkanis" <te******@kangaroologic.com> wrote in message
news:34*************@individual.net...
Unforgiven wrote:
"PEK" <pe*****@home.se> wrote in message
news:41***************@news.individual.net...
I need some code that converts a multi-byte string to a Unicode
string, and Unicode to multi-byte. I work mostly in Windows and know
how to solve it there, but I would like to have some platform
independent code too.
/PEK

Note that this won't handle actual multi-byte character sets, i.e.
character sets with characters > 256 (e.g. JIS), those characters
will not get converted properly. I know of no standard way to handle
those, just the WideCharToMultiByte windows method.

Using mbstowcs/wcstombs *is* a standard way to handle multibyte characters.

That I knew, but it has the drawback of balking at unrecognized characters
instead of replacing them with some predetermined character (like '?'), as
the OP mentioned.

A workaround for this is to use mbtowc/wctomb instead and convert the
characters one at a time in a loop. That was my solution and it seems to
work, or are there some problems with it?
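One possible issue with a loop over mbtowc/wctomb, offered as a hedged suggestion: those functions keep their conversion state in hidden internal storage, and the C standard does not require them to be free of data races, so two threads converting at once can interfere. The restartable functions mbrtowc/wcrtomb (declared in <cwchar>) take an explicit std::mbstate_t instead. Below is a sketch of the wide-to-multibyte direction rewritten that way, under the hypothetical name `WideToMbRestartable`. It still depends on the global C locale, and on platforms with a 16-bit wchar_t (Windows) characters outside the Basic Multilingual Plane arrive as surrogate pairs, which a one-wchar_t-at-a-time loop may not convert.

```cpp
#include <climits>
#include <cwchar>
#include <string>

// Hypothetical helper: same idea as ConvertWcharToString, but using wcrtomb
// with an explicit mbstate_t, so no hidden static state is shared.
std::string WideToMbRestartable(const wchar_t* from, char unknownchar,
                                bool* datalost)
{
    std::string to;
    if (datalost != NULL)
        *datalost = false;

    std::mbstate_t state = std::mbstate_t();
    char temp[MB_LEN_MAX];

    for (; *from != L'\0'; ++from) {
        std::size_t len = std::wcrtomb(temp, *from, &state);
        if (len == static_cast<std::size_t>(-1)) {
            to += unknownchar;         // not representable in this locale
            if (datalost != NULL)
                *datalost = true;
            state = std::mbstate_t();  // state is unspecified after EILSEQ
        } else {
            to.append(temp, len);
        }
    }
    return to;
}
```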

The preferred C++ solution is to use a codecvt facet instead of a ctype facet.

That I didn't know.

Unforgiven's code is a bit obscure, but I think I understand most
of it. However, I also want to detect whether an unrecognized character
was replaced (I guess I didn't mention that in my earlier post). Another
problem with the code is that I suppose it's hard to calculate the
length of the result when multibyte characters are used.
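On the length question: if all-or-nothing behaviour is acceptable, the standard C function wcsrtombs accepts a null destination and then only counts the bytes needed, so the buffer can be sized exactly before converting. A minimal sketch with the hypothetical helper `WideToMultibyteExact`:

```cpp
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: all-or-nothing wide -> multibyte conversion that sizes
// its buffer exactly. wcsrtombs with a null destination only counts bytes.
// Returns false (leaving `out` empty) if a character is not representable.
bool WideToMultibyteExact(const wchar_t* src, std::string& out)
{
    out.clear();
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* p = src;
    std::size_t needed = std::wcsrtombs(NULL, &p, 0, &state);  // excludes '\0'
    if (needed == static_cast<std::size_t>(-1))
        return false;

    std::vector<char> buf(needed + 1);
    state = std::mbstate_t();
    p = src;  // restart the real conversion from the beginning
    std::wcsrtombs(&buf[0], &p, buf.size(), &state);
    out.assign(&buf[0], needed);
    return true;
}
```

For a per-character loop like PEK's, appending to a std::string sidesteps the problem, since the string grows as needed and no length has to be computed in advance.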
/PEK

Jul 22 '05 #6

Similar topics

by: Jonathan Mcdougall

I started using boost's filesystem library a couple of days ago. In its FAQ, it states "Wide-character names would provide an illusion of portability where portability does not in fact exist....

C / C++

warning: multi-character character constant...help me!

by: mimmo

Hi! I should convert the accented letters of a string in the correspondent letters not accented. But when I compile with -Wall it give me: warning: multi-character character constant Do the...

C / C++

Wide Characters and tchar

by: Anitha Adusumilli

Hi Can someone pls explain the usage of wide characters and tchar? Also, what should I be careful about, while coding in C, to make my code portable and suitable for internationalization? ( I...

C / C++

wchar_t and wide characters

by: jjf

Do Standard C's wide characters and wide strings require absolutely that each character be stored in a single wchar_t, or can characters be "multi-wchar_t" in the same way that they can be...

C / C++

A simple question - how to convert from UTF8 to wide char (wchar_t) on linux

by: uday.sen

Hi, I need to convert a string from UTF8 to wide character (wchar_t *). I perform the same in windows using: MultiByteToWideChar(CP_UTF8, 0, pInput, -1, pOutput, nLen); However, in linux...

C / C++

writing wide chars

by: Elie Roux

Hello, I would like to write a wide chars string with printf, but I do not really understand the behaviour I have with this basic test program for example : #include <stdlib.h> #include...

C / C++

how to convert narrow string to wide string and vice versa?

by: thinktwice

i'm using VC++6 IDE i know i could use macros like A2T, T2A, but is there any way more decent way to do this?

C / C++

wstring & wifstream

by: toton

Hi, I have my program using wstring everywhere instead of string. Similarly I need to process some file, which contains unicode or ascii character. I need to stream them. Thus I use wifstream etc....

C / C++

get wide character and multibyte character value

by: George2

Hello everyone, I need to know the wide character (unicode) and multibyte (UTF-8) values of a character string of czech. I personally know nothing about czech. Is the following approach...

C / C++

wide characters

by: Bill Cunningham

I want to print out the Chinese character meaning water which is decimal 27750 I believe. Do I use wprintf to do this and just include wchar.h ? So far I haven't gotten anything to work. Bill

C / C++
