Bytes | Software Development & Data Engineering Community

C# does not support Unicode?

Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?

I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages support code points > 0x10FFFF?

I appreciate any comments.
Thanks,
Johannes
Nov 15 '05 #1
Check the documentation for the char keyword and the Char struct. The range is
0x0000 to 0xffff (a 16-bit number).

--
William Stacey, MVP

"Johannes" <an*******@discussions.microsoft.com> wrote in message
news:38**********************************@microsoft.com...
> Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?
> I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages
> support code points > 0x10FFFF?
> I appreciate any comments.
> Thanks,
> Johannes


Nov 15 '05 #2
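For readers following along, here is a minimal sketch of the 16-bit range mentioned in the reply above. The class and method names (`CharRangeDemo`, `BitsPerChar`) are made up for illustration; `sizeof(char)`, `char.MinValue` and `char.MaxValue` are standard C#. It only shows the size of a single `char`, not the limit on which Unicode characters a string can hold.

```csharp
using System;

static class CharRangeDemo
{
    // Number of bits in one C# char (a single UTF-16 code unit).
    public static int BitsPerChar()
    {
        return sizeof(char) * 8;
    }

    static void Main()
    {
        Console.WriteLine("bits per char: {0}", BitsPerChar());
        // Smallest and largest values a single char can hold.
        Console.WriteLine("min: 0x{0:X4}", (int)char.MinValue);
        Console.WriteLine("max: 0x{0:X4}", (int)char.MaxValue);
    }
}
```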
See http://www.yoda.arachsys.com/csharp/faq/#escapes for how to embed
special characters in a string.

Austin

On Mon, 1 Mar 2004 20:36:06 -0800, "Johannes"
<an*******@discussions.microsoft.com> wrote:
> Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?
>
> I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages support code points > 0x10FFFF?
>
> I appreciate any comments.
> Thanks,
> Johannes


Nov 15 '05 #3
I believe .NET Framework 1.0 and 1.1 are limited to UTF-16 (Unicode
version 2.0).
However, the next version looks like it will support up to UTF-32.

From "2.4.1 Unicode character escape sequences"

"A Unicode escape sequence represents the single Unicode character formed
by the hexadecimal number following the "\u" or "\U" characters. Since C#
uses a 16-bit encoding of Unicode code points in characters and string
values, a Unicode character in the range U+10000 to U+10FFFF is not
permitted in a character literal and is represented using a Unicode
surrogate pair in a string literal. Unicode characters with code points
above 0x10FFFF are not supported."

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Nov 15 '05 #4
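The spec passage quoted above can be illustrated with a short sketch. It assumes .NET Framework 2.0 or later, because `char.ConvertToUtf32`, `char.ConvertFromUtf32`, `char.IsHighSurrogate` and `char.IsLowSurrogate` are not available in 1.0/1.1. The class name `EscapeDemo` and the choice of U+1D11E as the sample character are only for illustration.

```csharp
using System;

static class EscapeDemo
{
    // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside plane 0, so it needs \U in a string.
    // Writing it as a char literal ('\U0001D11E') would be a compile-time error.
    public const string Clef = "\U0001D11E";

    static void Main()
    {
        // One code point, but stored as two 16-bit chars (a surrogate pair).
        Console.WriteLine(Clef.Length);
        Console.WriteLine(char.IsHighSurrogate(Clef[0]));
        Console.WriteLine(char.IsLowSurrogate(Clef[1]));

        // Recover the code point from the pair.
        Console.WriteLine("0x{0:X}", char.ConvertToUtf32(Clef, 0));

        // Build the same string from the numeric code point.
        Console.WriteLine(char.ConvertFromUtf32(0x1D11E) == Clef);

        // Values above 0x10FFFF are not Unicode code points and are rejected.
        try
        {
            char.ConvertFromUtf32(0x110000);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("0x110000 is not a valid code point");
        }
    }
}
```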
Johannes <an*******@discussions.microsoft.com> wrote:
> Is it correct that Unicode characters with code points above 0x10FFFF
> are not supported by C#?

Which code points are those? You'll have a harder time supporting
characters over 0xffff in .NET as you need surrogate pairs, etc, but I
*thought* everything was within 0-0x10ffff still. (That does, after
all, give a pretty huge scope.) Has that situation changed?

> I have a hard time believing this since it would eliminate some Asian
> languages. If it is true, is there a workaround? Do other .NET
> languages support code points > 0x10FFFF?


It's not really a language issue - .NET itself represents the character
type as a 16-bit entity, so to display Unicode characters outside plane
0 you need to use surrogates and check that whatever you're using to
display them (etc) supports surrogates properly. C# has the \U (as
opposed to \u) escaping for characters above 0xffff, within strings -
and those are then represented as a surrogate pair. That's the only
specific language support I know of in C# for characters outside plane
0, but I would imagine it's probably enough. Most of the work needs to
be done by .NET itself.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #5
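Since strings hold surrogate pairs for characters outside plane 0, code that cares about whole characters has to combine the pairs itself. The following is a rough sketch of one way to do that, assuming .NET Framework 2.0 or later (for generics and the `char` surrogate helpers). `CodePointDemo` and `CodePoints` are hypothetical names, not a framework API.

```csharp
using System;
using System.Collections.Generic;

static class CodePointDemo
{
    // Returns the code points of a UTF-16 string, combining surrogate pairs.
    public static List<int> CodePoints(string s)
    {
        List<int> result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                result.Add(char.ConvertToUtf32(s[i], s[i + 1]));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                result.Add(s[i]); // plane 0 character (or an unpaired surrogate, kept as-is)
            }
        }
        return result;
    }

    static void Main()
    {
        // U+20000 is a CJK ideograph outside plane 0.
        string text = "A\U00020000B";
        Console.WriteLine("chars: {0}", text.Length);
        foreach (int cp in CodePoints(text))
        {
            Console.WriteLine("U+{0:X4}", cp);
        }
    }
}
```

Whether such characters actually display correctly still depends on the font and the rendering component, as noted in the post above.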
Thanks for all your responses. It's all clear to me now.

UTF-16 - the internal representation of Unicode in the .NET Framework - permits code points up to 10FFFF, which does cover all languages, including Asian languages.

The misunderstanding was caused by a syntax error in my code. I was using [\u000000-\u10FFFF] to indicate a range in a regular expression character class, which is simply the wrong notation: \u takes exactly four hex digits, so the trailing digits are read as literal characters and the range never reaches beyond plane 0. The correct notation uses upper-case U, as in [\U00000000-\U0010FFFF]. The C# Language Specification is very clear about this (section Grammar, C.1.5). Maybe I will read it after all...

Johannes

Nov 15 '05 #6
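The difference between the two notations can be checked directly on the string literals, without involving the regex engine at all. This is only a sketch with made-up names (`RangeNotationDemo`, `LowerU`, `UpperU`), and it assumes .NET Framework 2.0 or later for `char.ConvertToUtf32`.

```csharp
using System;

static class RangeNotationDemo
{
    // \u reads exactly four hex digits: U+10FF followed by the letters 'F' and 'F'.
    public const string LowerU = "\u10FFFF";

    // \U reads exactly eight hex digits: the single code point U+10FFFF.
    public const string UpperU = "\U0010FFFF";

    static void Main()
    {
        Console.WriteLine(LowerU.Length);               // three chars: U+10FF, 'F', 'F'
        Console.WriteLine(LowerU[0] == '\u10FF');
        Console.WriteLine(LowerU.Substring(1));

        Console.WriteLine(UpperU.Length);               // two chars: one surrogate pair
        Console.WriteLine("0x{0:X}", char.ConvertToUtf32(UpperU, 0));
    }
}
```

Note that the upper endpoint written with \U is stored as a surrogate pair, and the .NET `Regex` class works on 16-bit chars, so it is worth testing that a character class written this way really matches the intended range before relying on it.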
