Universal String (4 Byte Canonical Encoding) and UTF-32

Hi All,

BMP Strings are a subset of Universal Strings. A BMP String uses
approximately 65,000 code points from the Universal String encoding.
BMP String: ISO/IEC 10646, 2-octet canonical form. Universal String:
ISO/IEC 10646, 4-octet canonical form.
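
To make the subset relationship concrete, here is a minimal Python sketch (Python rather than .NET, purely for illustration). The helper names is_bmp, two_octet_form and four_octet_form are hypothetical, and big-endian byte order is an assumption. A string fits the 2-octet form only if every code point is at most U+FFFF, while the 4-octet form holds any code point.

```python
def is_bmp(text: str) -> bool:
    """Return True if every code point is in the BMP (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def two_octet_form(text: str) -> bytes:
    """Encode as 2 octets per code point (big-endian assumed); BMP only."""
    if not is_bmp(text):
        raise ValueError("text contains code points outside the BMP")
    return b"".join(ord(ch).to_bytes(2, "big") for ch in text)


def four_octet_form(text: str) -> bytes:
    """Encode as 4 octets per code point (big-endian assumed)."""
    return b"".join(ord(ch).to_bytes(4, "big") for ch in text)
```

Every string accepted by two_octet_form is also accepted by four_octet_form, but not the other way round, which is the subset relationship described above.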

An excellent discussion occurred with respect to BMP Strings and .Net
(see http://groups.google.com/group/micro...cb62156a1a0c/).
The discussion ended with the statement, "UTF-16 is a superset of
UCS2."

Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
in the same manner as was justified in the previously mentioned thread
(UTF-16/UCS2)?

Thanks,
Jeff
Jeffrey Walton

Nov 20 '07 #1
Jeffrey Walton <no******@gmail.com> wrote:
> BMP Strings are a subset of Universal Strings. The BMP string uses
> approximately 65,000 code points from Universal String encoding. BMP
> Strings: ISO/IEC 10646, 2-octet canonical form, Universal String: ISO/
> IEC 10646, 4-octet canonical form.
>
> An excellent discussion occurred with respect to BMP Strings and .Net
> (see http://groups.google.com/group/micro...tnet.languages.csharp/browse_thread/thread/f18fcb62156a1a0c/).
> The discussion ended with the statement, "UTF-16 is a superset of
> UCS2."
>
> Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
> in the same manner as was justified in the previously mentioned thread
> (UTF-16/UCS2)?

It's not quite clear to me how you want to use UTF-32. I have a
Utf32String class which is probably full of bugs (I've never really
used it) but you're welcome to it - it's part of the library at
http://pobox.com/~skeet/csharp/miscutil

You can use UTF-16 to cover the same range of values, however, using
surrogate pairs. The System.String class doesn't have a *lot* of
support for this though - it's not exactly easy to work with things
outside the BMP.
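
To illustrate the surrogate-pair point, here is a minimal Python sketch. It is not .NET code and not taken from the MiscUtil library; the helper names to_surrogate_pair and utf16_code_unit_count are made up for this example. It splits a code point above U+FFFF into its two UTF-16 code units and counts how many 16-bit units a string needs.

```python
def to_surrogate_pair(code_point: int):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def utf16_code_unit_count(text: str) -> int:
    """Number of 16-bit code units needed to hold text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
high, low = to_surrogate_pair(0x1D11E)
print(hex(high), hex(low))                     # 0xd834 0xdd1e
print(utf16_code_unit_count("A\U0001D11E"))    # 3
```

A UTF-16 string type such as System.String counts code units rather than code points in its length, which is one reason text outside the BMP takes extra care.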

Are you doing a lot of work requiring non-BMP characters?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Nov 20 '07 #2
> An excellent discussion occurred with respect to BMP Strings and .Net
> (see
> http://groups.google.com/group/micro...s.csharp/browse_thread/thread/f18fcb62156a1a0c/).
> The discussion ended with the statement, "UTF-16 is a superset of
> UCS2."

This part has not changed since July :-)

> Can we use UTF-32 for UCS4 [Universal String, 4-octet canonical form]
> in the same manner as was justified in the previously mentioned thread
> (UTF-16/UCS2)?

You can consider UTF-32 to be the same thing as UCS4
(while UTF-16 is a superset of UCS2).
There are no surrogates and nothing tricky in UTF-32.
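
As a rough illustration of "nothing tricky", here is a small Python sketch (Python only for demonstration; the helper name code_point_at is hypothetical and big-endian order is assumed). Because every code point occupies exactly 4 octets, the n-th code point sits at byte offset 4*n and can be read directly, with no surrogate handling.

```python
def code_point_at(utf32_be: bytes, index: int) -> int:
    """Return the code point at position index in big-endian UTF-32 data."""
    if index < 0:
        raise IndexError("index must be non-negative")
    start = 4 * index                      # fixed width: 4 octets per code point
    chunk = utf32_be[start:start + 4]
    if len(chunk) != 4:
        raise IndexError("index out of range")
    return int.from_bytes(chunk, "big")


data = "A\U0001D11Ez".encode("utf-32-be")   # "-be" variant writes no BOM
print(len(data))                            # 12 (3 code points * 4 octets)
print(hex(code_point_at(data, 1)))          # 0x1d11e
```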

In general, UCS is the term used by ISO/IEC 10646, while UTF is Unicode lingo.

My personal rule: when in doubt, I go to the official source:
http://www.unicode.org/versions/Unicode5.0.0/appC.pdf
"As a consequence, UCS-4 can now be taken effectively as an alias
for the Unicode encoding form UTF-32, except that UTF-32 has the
extra requirement that additional Unicode semantics be observed
for all characters."

And somewhere below (C.6):
"In the framework of the Unicode Standard, character semantics
are indicated via character properties, functional specifications,
usage annotations, and name aliases;"

In fact, the whole C.4-C.7 range is interesting for this topic.
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
Nov 22 '07 #3
