Universal String (4 Byte Canonical Encoding) and UTF-32

Hi All,

BMP Strings are a subset of Universal Strings. A BMP String uses only
the approximately 65,000 code points of the Basic Multilingual Plane
out of the Universal String repertoire. BMP String: ISO/IEC 10646,
2-octet canonical form. Universal String: ISO/IEC 10646, 4-octet
canonical form.
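
As a rough illustration of the subset relationship, the following Python sketch widens a 2-octet string to the 4-octet form by zero-padding each code unit. It assumes big-endian octet order and input that really is plain UCS-2 with no surrogate code units; `bmp_to_universal` is a hypothetical helper name, not part of any library.

```python
def bmp_to_universal(bmp_octets: bytes) -> bytes:
    """Widen 2-octet (UCS-2, big-endian) data to 4-octet (UCS-4, big-endian).

    Assumption: the input contains no surrogate code units, i.e. it is
    genuine UCS-2 rather than UTF-16.
    """
    if len(bmp_octets) % 2 != 0:
        raise ValueError("2-octet form requires an even number of octets")
    out = bytearray()
    for i in range(0, len(bmp_octets), 2):
        # Every BMP code point keeps its value; only the width changes.
        out += b"\x00\x00" + bmp_octets[i:i + 2]
    return bytes(out)


two_octet = "A\u20AC".encode("utf-16-be")   # 'A' and the euro sign, both in the BMP
four_octet = bmp_to_universal(two_octet)
print(two_octet.hex(), "->", four_octet.hex())
```

The reverse direction only works when every 4-octet value is at most 0xFFFF, which is what "subset" means here.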

An excellent discussion occurred with respect to BMP Strings and .NET
(see http://groups.google.com/group/micro...cb62156a1a0c/).
The discussion ended with the statement, "UTF-16 is a superset of
UCS2."

Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
in the same manner as was justified in the previously mentioned thread
(UTF-16/UCS2)?

Thanks,
Jeff
Jeffrey Walton

Nov 20 '07 #1
Jeffrey Walton <no******@gmail.com> wrote:
> BMP Strings are a subset of Universal Strings. The BMP string uses
> approximately 65,000 code points from Universal String encoding. BMP
> Strings: ISO/IEC 10646, 2-octet canonical form, Universal String:
> ISO/IEC 10646, 4-octet canonical form.
>
> An excellent discussion occurred with respect to BMP Strings and .Net
> (see http://groups.google.com/group/micro...tnet.languages.csharp/browse_thread/thread/f18fcb62156a1a0c/).
> The discussion ended with the statement, "UTF-16 is a superset of
> UCS2."
>
> Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
> in the same manner as was justified in the previously mentioned thread
> (UTF-16/UCS2)?

It's not quite clear to me how you want to use UTF-32. I have a
Utf32String class which is probably full of bugs (I've never really
used it) but you're welcome to it - it's part of the library at
http://pobox.com/~skeet/csharp/miscutil

You can use UTF-16 to cover the same range of values, however, using
surrogate pairs. The System.String class doesn't have a *lot* of
support for this though - it's not exactly easy to work with things
outside the BMP.
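
To make the surrogate-pair point concrete, here is a minimal sketch in Python (not .NET, purely for illustration) of how a code point outside the BMP is split into two UTF-16 code units. The helper names `to_surrogate_pair` and `utf16_code_units` are made up for this example; the arithmetic is the standard UTF-16 algorithm.

```python
from typing import List, Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return high, low


def utf16_code_units(text: str) -> List[int]:
    """Return the UTF-16 code units of text, using Python's built-in codec."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
print([hex(u) for u in utf16_code_units(clef)])
print([hex(u) for u in to_surrogate_pair(ord(clef))])
```

Both lines print the same pair of code units (0xd834, 0xdd1e), which is why one character outside the BMP takes up two positions in a UTF-16 string.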

Are you doing a lot of work requiring non-BMP characters?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Nov 20 '07 #2
> An excellent discussion occurred with respect to BMP Strings and .Net
> (see http://groups.google.com/group/micro...s.csharp/browse_thread/thread/f18fcb62156a1a0c/).
> The discussion ended with the statement, "UTF-16 is a superset of
> UCS2."

This part has not changed since July :-)

> Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
> in the same manner as was justified in the previously mentioned thread
> (UTF-16/UCS2)?

You can consider UTF-32 to be the same thing as UCS4
(while UTF-16 is a superset of UCS2).
There are no surrogates and nothing tricky in UTF-32.
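
A small Python sketch of what "nothing tricky" means in practice: in UTF-32 every code point is exactly one 4-byte unit whose value is the code point itself. The helper names are hypothetical, and the range check relies on Python's own `utf-32-be` codec, which rejects values above U+10FFFF.

```python
import struct
from typing import List


def utf32_code_units(text: str) -> List[int]:
    """Return the UTF-32 code units of text (big-endian, no BOM)."""
    data = text.encode("utf-32-be")
    return list(struct.unpack(">%dI" % (len(data) // 4), data))


def is_valid_utf32_be(data: bytes) -> bool:
    """True if data decodes as UTF-32BE, i.e. every unit is a valid code point."""
    try:
        data.decode("utf-32-be")
        return True
    except UnicodeDecodeError:
        return False


sample = "A\u20AC\U0001D11E"             # one ASCII, one BMP, one non-BMP character
print([hex(u) for u in utf32_code_units(sample)])
print(is_valid_utf32_be(b"\x00\x01\xD1\x1E"))   # U+1D11E
print(is_valid_utf32_be(b"\x00\x11\x00\x00"))   # 0x110000, beyond the Unicode range
```

Each code unit equals the code point of the corresponding character, so indexing and length need no surrogate handling.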

In general, UCS is the term used by ISO/IEC 10646, while UTF is Unicode lingo.

My personal rule: when in doubt, I go to the official source:
http://www.unicode.org/versions/Unicode5.0.0/appC.pdf
"As a consequence, UCS-4 can now be taken effectively as an alias
for the Unicode encoding form UTF-32, except that UTF-32 has the
extra requirement that additional Unicode semantics be observed
for all characters."

And somewhere below (C.6)
"In the framework of the Unicode Standard, character semantics
are indicated via character properties, functional specifications,
usage annotations, and name aliases;"

In fact, the whole C.4-C.7 range is interesting for this topic.
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
Nov 22 '07 #3
