An excellent discussion occurred with respect to BMP Strings and .NET
(see
http://groups.google.com/group/micro...s.csharp/browse_thread/thread/f18fcb62156a1a0c/).
The discussion ended with the statement, "UTF-16 is a superset of
UCS2."
This part has not changed since July :-)
Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
in the same manner as was justified in the previously mentioned thread
(UTF-16/UCS2)?
You can consider UTF-32 to be the same thing as UCS4
(while UTF-16 is a superset of UCS2).
There are no surrogates and nothing tricky in UTF-32: every character
is a single 32-bit code unit.
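
To make the surrogate point concrete, here is a small sketch in Python (not .NET; it is used here only because the codecs are built in). The helper names utf16_units and utf32_units are made up for this illustration. It prints the code units for one BMP character and one character outside the BMP.

```python
import struct

def utf16_units(text):
    # Encode as big-endian UTF-16 (no BOM) and split into 16-bit code units.
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def utf32_units(text):
    # Encode as big-endian UTF-32 (no BOM) and split into 32-bit code units.
    data = text.encode("utf-32-be")
    return list(struct.unpack(">%dI" % (len(data) // 4), data))

bmp = "\u20ac"               # EURO SIGN, inside the BMP
supplementary = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the BMP

for label, ch in (("BMP", bmp), ("supplementary", supplementary)):
    print(label)
    print("  UTF-16:", [hex(u) for u in utf16_units(ch)])
    print("  UTF-32:", [hex(u) for u in utf32_units(ch)])
```

For the BMP character both forms give one unit, 0x20ac. For the supplementary character UTF-16 needs the surrogate pair 0xd834 0xdd1e, while UTF-32 still gives a single unit, 0x1d11e.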
In general, UCS is the term used by ISO/IEC 10646, while UTF is Unicode lingo.
My personal rule: when in doubt, I go to the official source:
http://www.unicode.org/versions/Unicode5.0.0/appC.pdf
"As a consequence, UCS-4 can now be taken effectively as an alias
for the Unicode encoding form UTF-32, except that UTF-32 has the
extra requirement that additional Unicode semantics be observed
for all characters."
And somewhere below (C.6):
"In the framework of the Unicode Standard, character semantics
are indicated via character properties, functional specifications,
usage annotations, and name aliases;"
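
As a rough illustration of what "character properties" means in practice, here is a sketch using Python's standard unicodedata module (my choice for the example, the appendix does not mention it). The describe helper is a made-up name. The point is that these properties are attached to the code point itself, independent of whether it is stored as UTF-16 or UTF-32.

```python
import unicodedata

def describe(ch):
    # Return the code point, its Unicode name, and its general category.
    return (
        "U+%04X" % ord(ch),
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
    )

for ch in ("A", "\u20ac", "\U0001d11e"):
    print(describe(ch))
```

For "A" this reports LATIN CAPITAL LETTER A with category Lu (uppercase letter), and for U+20AC it reports EURO SIGN with category Sc (currency symbol).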
In fact, the whole C.4-C.7 range is interesting for this topic.
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email