C# does not support Unicode characters

Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?

I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages support code points > 0x10FFFF?

I appreciate any comments.
Thanks,
Johannes
Nov 15 '05 #1
Unicode is defined as 16-bits (max of 0xFFFF).

Nov 15 '05 #2
And, yes, C# (natively) supports Unicode.

"The string type represents a string of Unicode characters. string is an alias
for System.String in the .NET Framework."

Nov 15 '05 #3
Julie <ju***@aol.com> wrote:
Unicode is defined as 16-bits (max of 0xFFFF).


No, it's not. The Basic Multilingual Plane (plane 0) is 64K, but
Unicode is more than that. This is unfortunate, as it means we need
surrogate characters etc. to cope with systems designed around the 64K
limit.
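To make the surrogate mechanism concrete, here is a minimal sketch, assuming C# 9 or later (top-level statements). It splits a supplementary code point into its high and low surrogates by hand and compares the result with the framework's char.ConvertFromUtf32. The helper name ToSurrogatePair is hypothetical, made up for illustration; U+20000 is the first character of CJK Unified Ideographs Extension B.

```csharp
using System;

// Hypothetical helper: split a supplementary code point (0x10000..0x10FFFF)
// into a UTF-16 surrogate pair using the standard UTF-16 arithmetic.
static (char High, char Low) ToSurrogatePair(int codePoint)
{
    if (codePoint < 0x10000 || codePoint > 0x10FFFF)
        throw new ArgumentOutOfRangeException(nameof(codePoint));

    int offset = codePoint - 0x10000;             // a 20-bit value
    char high = (char)(0xD800 + (offset >> 10));  // top 10 bits
    char low = (char)(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
    return (high, low);
}

int cp = 0x20000; // U+20000, CJK Unified Ideographs Extension B
var (hi, lo) = ToSurrogatePair(cp);

// The framework's built-in conversion should yield the same two chars.
string viaFramework = char.ConvertFromUtf32(cp);

Console.WriteLine($"U+{cp:X}: {(int)hi:X4} {(int)lo:X4}");
Console.WriteLine(viaFramework == new string(new[] { hi, lo }));
```

It should print "U+20000: D840 DC00" followed by "True".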

See http://www.cl.cam.ac.uk/~mgk25/unicode.html for more information.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #4
> Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?


There are no Unicode characters above 0x10FFFF.

C# may have a problem with characters above 0xFFFF, since the internal
representation is UTF-16. Characters between 0x10000 and 0x10FFFF are
represented using surrogate pairs, and some .NET APIs may be inaccurate
for them (string length, iteration "by char", and others in the same class).
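A small sketch of the pitfall described above, again assuming C# 9 or later: string.Length and foreach over a string count UTF-16 chars, so a single supplementary character shows up as two. The CountCodePoints helper is hypothetical; char.IsSurrogatePair and System.Globalization.StringInfo are existing framework APIs.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: count code points rather than UTF-16 chars.
// A well-formed surrogate pair counts once; a lone surrogate counts as one.
static int CountCodePoints(string s)
{
    int count = 0;
    for (int i = 0; i < s.Length; i++)
    {
        if (char.IsSurrogatePair(s, i))
            i++; // skip the low surrogate of the pair
        count++;
    }
    return count;
}

// "a" + U+20000 + "b": three characters, but four UTF-16 chars.
string text = "a\U00020000b";

Console.WriteLine(text.Length);                              // UTF-16 chars
Console.WriteLine(CountCodePoints(text));                    // code points
Console.WriteLine(new StringInfo(text).LengthInTextElements); // text elements

// Iterating "by char" splits U+20000 into its two surrogate halves.
foreach (char c in text)
    Console.Write($"{(int)c:X4} ");
Console.WriteLine();
```

For this input the first three lines should print 4, 3 and 3, and the loop shows U+20000 as the two halves D840 and DC00.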

--
Mihai
-------------------------
Replace _year_ with _ to get the real email
Nov 15 '05 #5
You are correct, sir. I wasn't aware of the change in Unicode v3.

Nov 15 '05 #6
In light of the Unicode v3 changes that I just became aware of, I retract all
that I've said on the subject in this thread.

Nov 15 '05 #7
Thanks for all your responses. It's all clear to me now.

UTF-16, the internal representation of strings in the .NET Framework, permits code points up to 0x10FFFF, which covers all languages, including Asian languages.
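A quick check of those limits, as a sketch assuming C# 9 or later: U+10FFFF round-trips through char.ConvertFromUtf32 and char.ConvertToUtf32 as a two-char string, while 0x110000 is rejected with ArgumentOutOfRangeException because it is not a Unicode code point.

```csharp
using System;

// The highest code point, U+10FFFF, is encoded as two UTF-16 chars.
string max = char.ConvertFromUtf32(0x10FFFF);
Console.WriteLine(max.Length);
Console.WriteLine(char.ConvertToUtf32(max, 0) == 0x10FFFF);

// Anything above U+10FFFF is outside the Unicode code space,
// so the framework refuses to convert it.
try
{
    char.ConvertFromUtf32(0x110000);
}
catch (ArgumentOutOfRangeException)
{
    Console.WriteLine("0x110000 is outside the Unicode code space");
}
```

Expected output: 2, True, then the message from the catch block.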

The misunderstanding was caused by a syntax error in my code. I was using [\u000000-\u10FFFF] to indicate a range in a regular-expression character class, which is simply the wrong notation: \u takes exactly four hex digits, so \u10FFFF is read as \u10FF followed by "FF", and the range never gets past the BMP. The correct notation uses an uppercase U with eight hex digits, as in [\U00000000-\U0010FFFF]. The C# Language Specification is very clear about this (section Grammar, C1.5). Maybe I will read it after all.
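The difference between the two escapes can be seen directly in string literals; this sketch assumes C# 9 or later. It also checks one thing worth knowing before relying on the \U form inside a pattern: System.Text.RegularExpressions works on UTF-16 chars, not code points, so a character class whose end point is a surrogate pair does not treat a supplementary character as a single unit.

```csharp
using System;
using System.Text.RegularExpressions;

// \u takes exactly four hex digits; further digits are ordinary text.
string wrong = "\u10FFFF";   // U+10FF followed by the letters "FF"
// \U takes exactly eight hex digits and can name any code point.
string right = "\U0010FFFF"; // the compiler emits a surrogate pair

Console.WriteLine(wrong.Length);             // 3
Console.WriteLine(wrong == "\u10FF" + "FF"); // True
Console.WriteLine(right.Length);             // 2
Console.WriteLine(right == "\uDBFF\uDFFF");  // True

// Caveat: after escape processing this class is really [\0-\uDBFF] plus
// \uDFFF, so the low surrogate of U+20000 (DC00) is not in it.
bool matchesAll = Regex.IsMatch("\U00020000", "^[\U00000000-\U0010FFFF]+$");
Console.WriteLine(matchesAll);               // False
```

If whole supplementary characters need to be matched, one common approach is to match the surrogate pair explicitly, for example (?:[\uD800-\uDBFF][\uDC00-\uDFFF]), rather than placing \U escapes inside a character class.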

Johannes

Nov 15 '05 #8
