Converting from UTF-16 to UTF-32

Jimmy Shaw
#1: Jul 31 '06
Hi everybody,

Is there any SIMPLE way to convert from UTF-16 to UTF-32? I may be
mixed up, but is it possible that all UTF-16 "code points" that are 16
bits long appear just the same in UTF-32, but with zero padding and
hence no real conversion is necessary?

If I am completely wrong and some intricate conversion operation needs
to take place, can anyone give me some primer on the subject?

Thanks!


Rolf Magnus
#2: Jul 31 '06

Jimmy Shaw wrote:
> Hi everybody,
>
> Is there any SIMPLE way to convert from UTF-16 to UTF-32?

Not in standard C++.

Clark S. Cox III
#3: Jul 31 '06

On 2006-07-31 10:06:35 -0400, Rolf Magnus <ramagnus@t-online.de> said:
> Jimmy Shaw wrote:
>> Hi everybody,
>>
>> Is there any SIMPLE way to convert from UTF-16 to UTF-32?
>
> Not in standard C++.

That's certainly not true.

Clark S. Cox III
#4: Jul 31 '06

On 2006-07-31 08:44:53 -0400, "Jimmy Shaw" <sivan.gal@gmail.com> said:
> Hi everybody,
>
> Is there any SIMPLE way to convert from UTF-16 to UTF-32? I may be
> mixed up, but is it possible that all UTF-16 "code points" that are 16
> bits long appear just the same in UTF-32, but with zero padding and
> hence no real conversion is necessary?

First, your question is off-topic here, as it isn't really a C++ question.

[offtopic]But there is indeed a conversion that is needed (otherwise,
UTF-32 would be a pointless waste of space or UTF-16 would be
incomplete)[/offtopic]

> If I am completely wrong and some intricate conversion operation needs
> to take place, can anyone give me some primer on the subject?
>
> Thanks!

[offtopic]The conversion isn't really intricate at all. See
http://www.zvon.org/tmRFC/RFC2781/Ou...ter2.html#sub2 for a
description of the algorithms used to convert to/from UTF-16.[/offtopic]
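
For readers who cannot follow the link, the decoding rule from RFC 2781 can be sketched roughly as below. This is a minimal sketch, assuming a C++11 compiler with `std::u16string` and `std::u32string`; the function name `utf16_to_utf32` is made up for illustration and comes from nobody in this thread. It also shows that the original poster's intuition holds for every 16-bit unit that is not a surrogate: those values carry over unchanged, and only surrogate pairs need arithmetic.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the thread): converts a whole UTF-16 string
// to UTF-32 using the surrogate-pair rule described in RFC 2781.
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t w1 = in[i];
        if (w1 < 0xD800 || w1 > 0xDFFF) {
            // Not a surrogate: the value is the code point (zero-extended).
            out.push_back(w1);
        } else if (w1 <= 0xDBFF && i + 1 < in.size()
                   && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Leading surrogate followed by a trailing surrogate:
            // each contributes 10 bits, offset by 0x10000.
            const char32_t w2 = in[++i];
            out.push_back(0x10000 + ((w1 - 0xD800) << 10) + (w2 - 0xDC00));
        } else {
            // Lone trailing surrogate, or leading surrogate with no partner.
            throw std::invalid_argument("ill-formed UTF-16: unpaired surrogate");
        }
    }
    return out;
}
```

Throwing on an unpaired surrogate is one design choice among several; substituting U+FFFD instead would be equally defensible, depending on the application.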

Julián Albo
#5: Jul 31 '06

Jimmy Shaw wrote:
> Is there any SIMPLE way to convert from UTF-16 to UTF-32? I may be
> mixed up, but is it possible that all UTF-16 "code points" that are 16
> bits long appear just the same in UTF-32, but with zero padding and
> hence no real conversion is necessary?
> If I am completely wrong and some intricate conversion operation needs
> to take place, can anyone give me some primer on the subject?

http://www.unicode.org/

--
Salu2
Kirit Sælensminde
#6: Aug 1 '06

Jimmy Shaw wrote:
> Hi everybody,
>
> Is there any SIMPLE way to convert from UTF-16 to UTF-32? I may be
> mixed up, but is it possible that all UTF-16 "code points" that are 16
> bits long appear just the same in UTF-32, but with zero padding and
> hence no real conversion is necessary?
>
> If I am completely wrong and some intricate conversion operation needs
> to take place, can anyone give me some primer on the subject?
>
> Thanks!

These are the important bits from the functions that I use. utf32 is a
typedef for a signed 32-bit integer (__int32 on MSVC). utf16 is the
same as wchar_t where wchar_t is sixteen bits wide (as on Windows);
where it isn't, utf16 needs to be some other sixteen-bit type. The
UTF-16 sequence is assumed to be in little-endian (Intel) byte order,
the same as Windows uses.

You do need this sort of belt-and-braces approach, though, as this kind
of decoding is a prime vector for security exploits. The checks are even
more important for UTF-8 sequences. I think there are a lot of
improvements that could be made, but the code does work.


std::size_t FSLib::utf::utf16length( const utf32 ch ) {
    // Code points below 0x10000 take one UTF-16 unit; all others take a surrogate pair.
    if ( ch < 0x10000 ) return 1;
    else return 2;
}

utf32 FSLib::utf::assertValid( const utf32 ch ) {
    try {
        if ( ch >= 0xD800 && ch <= 0xDBFF ) throw FSLib::Exceptions::UnicodeEncoding(
            L"UTF-32 character is in the UTF-16 leading surrogate pair range." );
        if ( ch >= 0xDC00 && ch <= 0xDFFF ) throw FSLib::Exceptions::UnicodeEncoding(
            L"UTF-32 character is in the UTF-16 trailing surrogate pair range." );
        if ( ch == 0xFFFE || ch == 0xFFFF ) throw FSLib::Exceptions::UnicodeEncoding(
            L"UTF-32 character is disallowed (0xFFFE/0xFFFF)" );
        if ( ch > 0x10FFFF ) throw FSLib::Exceptions::UnicodeEncoding(
            L"UTF-32 character is beyond the allowable range." );
        return ch;
    } catch ( FSLib::Exceptions::UnicodeEncoding &e ) {
        // Attach the offending value to the exception before re-throwing.
        e.info() << L"Character value is: " << ch << std::endl;
        throw;
    }
}

utf32 FSLib::utf::decode( const utf16 *seq ) {
    try {
        utf32 ch = *seq;
        if ( ch >= 0xD800 && ch <= 0xDBFF ) {
            // Leading surrogate: the next unit must be a trailing surrogate.
            if ( seq[ 1 ] == 0 ) throw FSLib::Exceptions::UnicodeEncoding(
                L"Trailing surrogate missing from UTF-16 sequence (it is ZERO)" );
            if ( seq[ 1 ] < 0xDC00 || seq[ 1 ] > 0xDFFF ) throw FSLib::Exceptions::UnicodeEncoding(
                L"Trailing character in a UTF-16 surrogate pair is missing (outside correct range)" );
            // Combine the two 10-bit halves and add the 0x10000 offset.
            return assertValid( ( ch << 10 ) + seq[ 1 ] + 0x10000 - ( 0xD800 << 10 ) - 0xDC00 );
        }
        return assertValid( ch );
    } catch ( FSLib::Exceptions::Exception &e ) {
        // Note: seq[ -1 ] and seq[ 1 ] are read here, so the caller must
        // guarantee that both neighbours of seq[ 0 ] are readable.
        e.info() << L"Decoding UTF-16 number: " << toString( static_cast< unsigned int >( seq[ 0 ] ) ) << std::endl;
        e.info() << L"Preceding UTF-16 number: " << toString( static_cast< unsigned int >( seq[ -1 ] ) ) << std::endl;
        e.info() << L"Following UTF-16 number: " << toString( static_cast< unsigned int >( seq[ 1 ] ) ) << std::endl;
        throw;
    }
}
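
The arithmetic inside decode can be checked by hand. The snippet below is only a sketch, assuming C++11 `constexpr`; `combine`, `lead_of` and `trail_of` are illustrative names, not part of FSLib. It restates the expression from decode and its inverse, using U+1D11E (encoded as D834 DD1E) as the worked case: 0xD834 - 0xD800 = 0x34, shifted left ten bits gives 0xD000; 0xDD1E - 0xDC00 = 0x11E; and 0x10000 + 0xD000 + 0x11E = 0x1D11E.

```cpp
#include <cstdint>

// Same expression as in decode above, written as a stand-alone function.
constexpr std::int32_t combine(std::int32_t lead, std::int32_t trail) {
    return (lead << 10) + trail + 0x10000 - (0xD800 << 10) - 0xDC00;
}

// Inverse direction: split a code point >= 0x10000 into its surrogate pair.
constexpr std::int32_t lead_of(std::int32_t cp) {
    return 0xD800 + ((cp - 0x10000) >> 10);
}
constexpr std::int32_t trail_of(std::int32_t cp) {
    return 0xDC00 + ((cp - 0x10000) & 0x3FF);
}

// Worked case from the text, plus the two ends of the supplementary range.
static_assert(combine(0xD834, 0xDD1E) == 0x1D11E, "worked example");
static_assert(combine(0xD800, 0xDC00) == 0x10000, "lowest pair");
static_assert(combine(0xDBFF, 0xDFFF) == 0x10FFFF, "highest pair");
static_assert(lead_of(0x1D11E) == 0xD834 && trail_of(0x1D11E) == 0xDD1E, "round trip");
```

The highest pair landing exactly on 0x10FFFF is why assertValid uses that value as its upper bound.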

dayton
#7: Aug 1 '06

Clark S. Cox III wrote:
> On 2006-07-31 10:06:35 -0400, Rolf Magnus <ramagnus@t-online.de> said:
>> Jimmy Shaw wrote:
>>> Hi everybody,
>>>
>>> Is there any SIMPLE way to convert from UTF-16 to UTF-32?
>>
>> Not in standard C++.
>
> That's certainly not true.

Concur. Most non-Windows platforms use a full integer for
wchar_t. Using locales and <codecvt> your iostreams probably
already provide the capability. Check your platform's wchar_t
to see if it already is in UTF32.
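
One way to make that check at compile time is sketched below. It is only a sketch, and the three function names are made up for illustration. Note that a wchar_t wide enough for a code point is not by itself proof that the platform stores UTF-32 in it; the `__STDC_ISO_10646__` macro, where an implementation defines it, is the stronger signal.

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits on this platform.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Unicode code points run up to 0x10FFFF, which needs 21 bits.
constexpr bool wchar_fits_code_point() {
    return wchar_bits() >= 21;
}

// True only when the implementation states that wchar_t values are
// ISO 10646 (Unicode) code points by defining __STDC_ISO_10646__.
constexpr bool wchar_is_iso10646() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}
```

If `wchar_fits_code_point()` is false, wide strings hold at best UTF-16, and a surrogate-pair conversion is still needed to get UTF-32.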

Dinkumware (http://www.dinkumware.com/) sells an extension
library that includes <codecvt> converters for UTF16.
P.J. Plauger
#8: Aug 1 '06

"dayton" <mvglen04-cnews@yahoo.com> wrote in message
news:3YKzg.196$%j7.70@newssvr29.news.prodigy.net...
> Clark S. Cox III wrote:
>> On 2006-07-31 10:06:35 -0400, Rolf Magnus <ramagnus@t-online.de> said:
>>> Jimmy Shaw wrote:
>>>> Hi everybody,
>>>>
>>>> Is there any SIMPLE way to convert from UTF-16 to UTF-32?
>>>
>>> Not in standard C++.
>>
>> That's certainly not true.
>
> Concur. Most non-Windows platforms use a full integer for wchar_t. Using
> locales and <codecvt> your iostreams probably already provide the
> capability. Check your platform's wchar_t to see if it already is in
> UTF32.
>
> Dinkumware (http://www.dinkumware.com/) sells an extension library that
> includes <codecvt> converters for UTF16.

Yep, except that they're now included as part of our standard
(Compleat) library product.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

