C# does not support Unicode?

Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?

I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages support code points > 0x10FFFF?

I appreciate any comments.
Thanks,
Johannes
Nov 15 '05 #1
Check the doco on the char keyword and Char struct. The range is 0x0000 to
0xffff (16-bit number).
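
To make the 16-bit range concrete, here is a minimal sketch. The class and method names are made up for illustration, and U+20000 is just one example of a CJK ideograph that lies above 0xffff. A single char cannot hold it, but a string can, using two chars.

```csharp
using System;

static class CharRangeSketch
{
    // Number of UTF-16 code units (char values) in a string.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    static void Main()
    {
        // char is one 16-bit UTF-16 code unit.
        Console.WriteLine((int)char.MinValue); // 0
        Console.WriteLine((int)char.MaxValue); // 65535

        // U+20000 does not fit in one char, so the string holds two.
        string supplementary = char.ConvertFromUtf32(0x20000);
        Console.WriteLine(CodeUnits(supplementary)); // 2
    }
}
```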

--
William Stacey, MVP

"Johannes" <an*******@discussions.microsoft.com> wrote in message
news:38**********************************@microsoft.com...
> Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?
> I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages
> support code points > 0x10FFFF?
> I appreciate any comments.
> Thanks,
> Johannes


Nov 15 '05 #2
See http://www.yoda.arachsys.com/csharp/faq/#escapes for how to embed
special characters in a string.

Austin

On Mon, 1 Mar 2004 20:36:06 -0800, "Johannes"
<an*******@discussions.microsoft.com> wrote:
> Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?
>
> I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages support code points > 0x10FFFF?
>
> I appreciate any comments.
> Thanks,
> Johannes


Nov 15 '05 #3
I believe .NET Framework 1.0 and 1.1 are limited to UTF-16 at most (Unicode
version 2.0).
However, the next version looks like it will support up to UTF-32.

From "2.4.1 Unicode character escape sequences"

"A Unicode escape sequence represents the single Unicode character formed
by the hexadecimal number following the "\u" or "\U" characters. Since C#
uses a 16-bit encoding of Unicode code points in characters and string
values, a Unicode character in the range U+10000 to U+10FFFF is not
permitted in a character literal and is represented using a Unicode
surrogate pair in a string literal. Unicode characters with code points
above 0x10FFFF are not supported."
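
As a rough illustration of the quoted rule, the sketch below prints the UTF-16 code units behind a `\u` escape and a `\U` escape. `EscapeSketch` and `CodeUnitsHex` are hypothetical names used only for this example.

```csharp
using System;

static class EscapeSketch
{
    // Formats each UTF-16 code unit of s as four hex digits.
    public static string CodeUnitsHex(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            parts[i] = ((int)s[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    static void Main()
    {
        // \u takes exactly four hex digits and yields one code unit.
        string eAcute = "\u00E9";

        // \U takes exactly eight hex digits; above U+FFFF the string
        // literal stores a surrogate pair.
        string ideograph = "\U00020000";

        Console.WriteLine(CodeUnitsHex(eAcute));    // 00E9
        Console.WriteLine(CodeUnitsHex(ideograph)); // D840 DC00

        // The pair converts back to the original code point.
        Console.WriteLine(char.ConvertToUtf32(ideograph, 0).ToString("X")); // 20000
    }
}
```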

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Nov 15 '05 #4
Johannes <an*******@discussions.microsoft.com> wrote:
> Is it correct that Unicode characters with code points above 0x10FFFF
> are not supported by C#?

Which code points are those? You'll have a harder time supporting
characters over 0xffff in .NET as you need surrogate pairs, etc, but I
*thought* everything was within 0-0x10ffff still. (That does, after
all, give a pretty huge scope.) Has that situation changed?

> I have a hard time believing this since it would eliminate some Asian
> languages. If it is true, is there a workaround? Do other .NET
> languages support code points > 0x10FFFF?


It's not really a language issue - .NET itself represents the character
type as a 16-bit entity, so to display Unicode characters outside plane
0 you need to use surrogates and check that whatever you're using to
display them (etc) supports surrogates properly. C# has the \U (as
opposed to \u) escaping for characters above 0xffff, within strings -
and those are then represented as a surrogate pair. That's the only
specific language support I know of in C# for characters outside plane
0, but I would imagine it's probably enough. Most of the work needs to
be done by .NET itself.
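
One way to do that work in your own code is to walk the string by code point instead of by char. This is only a sketch: `CodePointSketch` and `CodePoints` are made-up names, and it assumes well-formed input, since `char.ConvertToUtf32` throws on an unpaired surrogate.

```csharp
using System;
using System.Collections.Generic;

static class CodePointSketch
{
    // Walks a UTF-16 string and returns whole code points,
    // combining each surrogate pair into a single value.
    public static List<int> CodePoints(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            result.Add(char.ConvertToUtf32(s, i));
            if (char.IsSurrogatePair(s, i))
            {
                i++; // skip the low surrogate that was just consumed
            }
        }
        return result;
    }

    static void Main()
    {
        // Three code points stored in four chars.
        foreach (int cp in CodePoints("A\U00020000B"))
        {
            Console.WriteLine("U+{0:X4}", cp);
        }
    }
}
```

Run as written, this should print U+0041, U+20000 and U+0042 on separate lines.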

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #5
Thanks for all your responses. It's all clear to me now.

UTF-16, the internal representation of Unicode in the .NET Framework, permits code points up to 10FFFF, which does cover all languages, including Asian languages.

The misunderstanding was caused by a syntax error in my code. I was using [\u000000-\u10FFFF] to indicate a range in a character class of a regular expression, which is simply the wrong notation (matches 0-FFFF). The correct notation uses upper-case U, as in [\U00000000-\U0010FFFF]. The C# Language Specification is very clear about this (section Grammar, C1.5). Maybe I will read it after all...
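
For anyone who wants to see what the two escape forms actually put into a string, here is a small sketch. The names `RegexRangeSketch` and `IsSingleSupplementary` are hypothetical. The surrogate-range pattern is an assumption about one common way to match a character above U+FFFF, based on the Regex class working on UTF-16 code units as far as I understand it. It is not the notation from the post above.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexRangeSketch
{
    // True if the whole input is exactly one supplementary character,
    // i.e. a high surrogate followed by a low surrogate.
    public static bool IsSingleSupplementary(string input)
    {
        return Regex.IsMatch(input, @"\A[\uD800-\uDBFF][\uDC00-\uDFFF]\z");
    }

    static void Main()
    {
        // \u consumes exactly four hex digits; the trailing "FF" is plain text.
        Console.WriteLine("\u10FFFF".Length);   // 3: U+10FF, 'F', 'F'

        // \U consumes eight hex digits and yields a surrogate pair.
        Console.WriteLine("\U0010FFFF".Length); // 2: U+DBFF, U+DFFF

        Console.WriteLine(IsSingleSupplementary("\U00020000")); // True
        Console.WriteLine(IsSingleSupplementary("A"));          // False
    }
}
```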

Johannes

Nov 15 '05 #6
