Universal String (4 Byte Canonical Encoding) and UTF-32

Hi All,

BMP Strings are a subset of Universal Strings. A BMP String uses
approximately 65,000 code points from the Universal String encoding.
BMP String: ISO/IEC 10646, 2-octet canonical form. Universal String:
ISO/IEC 10646, 4-octet canonical form.
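
To make the subset relationship concrete, here is a minimal Python sketch (Python rather than .NET, purely for illustration). The helper names `fits_in_bmp`, `encode_2_octet` and `encode_4_octet` are made up for this example, and big-endian byte order is an assumption.

```python
def fits_in_bmp(text: str) -> bool:
    """Return True if every code point is at most U+FFFF (inside the BMP)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def encode_2_octet(text: str) -> bytes:
    """Encode BMP-only text as two octets per character (big-endian assumed)."""
    if not fits_in_bmp(text):
        raise ValueError("text contains code points outside the BMP")
    # For BMP-only text, UTF-16 produces exactly one 16-bit unit per character.
    return text.encode("utf-16-be")


def encode_4_octet(text: str) -> bytes:
    """Encode any text as four octets per code point (big-endian assumed)."""
    return text.encode("utf-32-be")


print(fits_in_bmp("A\u20ac"))          # True: both characters are in the BMP
print(fits_in_bmp("\U0001D11E"))       # False: U+1D11E lies outside the BMP
print(encode_2_octet("A").hex())       # 0041
print(encode_4_octet("A").hex())       # 00000041
```

Anything that passes `fits_in_bmp` can be written in either form; the 4-octet form simply pads each value with two leading zero octets.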

An excellent discussion occurred with respect to BMP Strings and .Net
(see http://groups.google.com/group/micro...cb62156a1a0c/).
The discussion ended with the statement, "UTF-16 is a superset of
UCS2."

Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
in the same manner as was justified in the previously mentioned thread
(UTF-16/UCS2)?

Thanks,
Jeff
Jeffrey Walton

Nov 20 '07 #1
Jeffrey Walton <no******@gmail.com> wrote:
> BMP Strings are a subset of Universal Strings. A BMP String uses
> approximately 65,000 code points from the Universal String encoding.
> BMP String: ISO/IEC 10646, 2-octet canonical form. Universal String:
> ISO/IEC 10646, 4-octet canonical form.
>
> An excellent discussion occurred with respect to BMP Strings and .Net
> (see
> http://groups.google.com/group/micro...tnet.languages.csharp/browse_thread/thread/f18fcb62156a1a0c/).
> The discussion ended with the statement, "UTF-16 is a superset of
> UCS2."
>
> Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
> in the same manner as was justified in the previously mentioned thread
> (UTF-16/UCS2)?

It's not quite clear to me how you want to use UTF-32. I have a
Utf32String class which is probably full of bugs (I've never really
used it) but you're welcome to it - it's part of the library at
http://pobox.com/~skeet/csharp/miscutil

You can use UTF-16 to cover the same range of values, however, using
surrogate pairs. The System.String class doesn't have a *lot* of
support for this though - it's not exactly easy to work with things
outside the BMP.
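
As an illustration of the surrogate-pair point, here is a small Python sketch (not .NET code) that splits a code point above U+FFFF into its two UTF-16 code units. The function name `to_surrogate_pair` is hypothetical; the arithmetic is the standard UTF-16 rule.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1D11E)     # U+1D11E MUSICAL SYMBOL G CLEF
print(hex(high), hex(low))                 # 0xd834 0xdd1e

# Python's own UTF-16 encoder produces the same two 16-bit units.
print("\U0001D11E".encode("utf-16-be").hex())   # d834dd1e
```

This is why a single non-BMP character takes up two UTF-16 code units, and therefore two `char` positions in a UTF-16 string type such as System.String.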

Are you doing a lot of work requiring non-BMP characters?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Nov 20 '07 #2
> An excellent discussion occurred with respect to BMP Strings and .Net
> (see
> http://groups.google.com/group/micro...s.csharp/browse_thread/thread/f18fcb62156a1a0c/).
> The discussion ended with the statement, "UTF-16 is a superset of
> UCS2."

This part has not changed since July :-)

> Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
> in the same manner as was justified in the previously mentioned thread
> (UTF-16/UCS2)?

You can consider UTF-32 to be the same thing as UCS4
(while UTF-16 is a superset of UCS2).
There are no surrogates and nothing tricky in UTF-32.
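
A rough Python sketch of what "nothing tricky" means in practice: every code point occupies exactly four octets, so the n-th code point can be read directly by offset. The helper names `code_point_at` and `is_valid_utf32_be` are invented for this example, and big-endian byte order is assumed.

```python
import struct


def code_point_at(utf32_be: bytes, index: int) -> int:
    """Read the index-th code point from big-endian 4-octet data."""
    (value,) = struct.unpack_from(">I", utf32_be, 4 * index)
    return value


def is_valid_utf32_be(data: bytes) -> bool:
    """Return True if Python's UTF-32 decoder accepts the data."""
    try:
        data.decode("utf-32-be")
        return True
    except UnicodeDecodeError:
        return False


text = "A\u20ac\U0001D11E"                 # one ASCII, one BMP, one non-BMP character
data = text.encode("utf-32-be")

print(len(data) == 4 * len(text))          # True: fixed width, no surrogates
print(hex(code_point_at(data, 2)))         # 0x1d11e

# A 4-octet value above U+10FFFF is outside the Unicode range and is rejected.
print(is_valid_utf32_be(bytes.fromhex("00110000")))   # False
```

The last check shows the one practical restriction: the four octets can hold larger numbers, but only values up to U+10FFFF are accepted as UTF-32.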

In general, UCS is the term used by ISO/IEC 10646, while UTF is Unicode lingo.

My personal rule: when in doubt, I go to the official source:
http://www.unicode.org/versions/Unicode5.0.0/appC.pdf
"As a consequence, UCS-4 can now be taken effectively as an alias
for the Unicode encoding form UTF-32, except that UTF-32 has the
extra requirement that additional Unicode semantics be observed
for all characters."

And a little further down (C.6):
"In the framework of the Unicode Standard, character semantics
are indicated via character properties, functional specifications,
usage annotations, and name aliases;"

In fact, the whole C.4-C.7 range is interesting for this topic.
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
Nov 22 '07 #3
