Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?
I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages support code points > 0x10FFFF?
I appreciate any comments.
Thanks,
Johannes
Check the doco on the char keyword and Char struct. The range is 0x0000 to
0xffff (16-bit number).
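A minimal sketch of that range, assuming nothing beyond the built-in `char` type (the class name `CharRangeDemo` and its helper method are made up for illustration):

```csharp
using System;

public static class CharRangeDemo
{
    // Largest value a single C# char can hold, as an integer.
    public static int MaxCharValue()
    {
        return (int)char.MaxValue;
    }

    public static void Main()
    {
        // A char is one 16-bit UTF-16 code unit, not a full Unicode code point.
        Console.WriteLine("char.MinValue = 0x{0:X4}", (int)char.MinValue); // 0x0000
        Console.WriteLine("char.MaxValue = 0x{0:X4}", MaxCharValue());     // 0xFFFF
        Console.WriteLine("sizeof(char)  = {0} bytes", sizeof(char));      // 2 bytes
    }
}
```

So a single `char` cannot hold anything above 0xFFFF; whether a `string` can is a separate question.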
--
William Stacey, MVP
See http://www.yoda.arachsys.com/csharp/faq/#escapes for how to embed
special characters in a string.
Austin
On Mon, 1 Mar 2004 20:36:06 -0800, "Johannes" <an*******@discussions.microsoft.com> wrote:
> Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?
> I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages support code points > 0x10FFFF?
> I appreciate any comments. Thanks, Johannes
I believe .NET Framework 1.0 and 1.1 are limited to UTF-16 (Unicode
version 2.0).
However, the next version looks like it will support UTF-32.
From "2.4.1 Unicode character escape sequences" in the C# Language Specification:
"A Unicode escape sequence represents the single Unicode character formed
by the hexadecimal number following the "\u" or "\U" characters. Since C#
uses a 16-bit encoding of Unicode code points in characters and string
values, a Unicode character in the range U+10000 to U+10FFFF is not
permitted in a character literal and is represented using a Unicode
surrogate pair in a string literal. Unicode characters with code points
above 0x10FFFF are not supported."
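A small sketch of what the quoted passage means in practice. The code point U+1D11E (MUSICAL SYMBOL G CLEF) is only an example picked from outside plane 0, and the class name `SurrogateLiteralDemo` is made up for illustration:

```csharp
using System;

public static class SurrogateLiteralDemo
{
    // U+1D11E lies outside plane 0, so the \U escape stores it in the
    // string as two 16-bit chars (a surrogate pair).
    public const string GClef = "\U0001D11E";

    public static void Main()
    {
        Console.WriteLine(GClef.Length);               // 2
        Console.WriteLine("0x{0:X4}", (int)GClef[0]);  // 0xD834 (high surrogate)
        Console.WriteLine("0x{0:X4}", (int)GClef[1]);  // 0xDD1E (low surrogate)

        // The same escape in a character literal is rejected by the compiler,
        // because one char cannot hold a surrogate pair:
        // char c = '\U0001D11E';  // compile-time error
    }
}
```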
Johannes <an*******@discussions.microsoft.com> wrote:
> Is it correct that Unicode characters with code points above 0x10FFFF are not supported by C#?
Which code points are those? You'll have a harder time supporting
characters over 0xffff in .NET as you need surrogate pairs, etc, but I
*thought* everything was within 0-0x10ffff still. (That does, after
all, give a pretty huge scope.) Has that situation changed?
> I have a hard time believing this since it would eliminate some Asian languages. If it is true, is there a workaround? Do other .NET languages support code points > 0x10FFFF?
It's not really a language issue - .NET itself represents the character
type as a 16-bit entity, so to display Unicode characters outside plane
0 you need to use surrogates and check that whatever you're using to
display them (etc) supports surrogates properly. C# has the \U (as
opposed to \u) escaping for characters above 0xffff, within strings -
and those are then represented as a surrogate pair. That's the only
specific language support I know of in C# for characters outside plane
0, but I would imagine it's probably enough. Most of the work needs to
be done by .NET itself.
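To illustrate the surrogate handling described above, here is a rough sketch that decodes pairs by hand using the standard UTF-16 arithmetic, so it does not depend on any particular framework version. The class and method names (`CodePointDemo`, `Combine`, `CountCodePoints`) are invented for this example, and unpaired surrogates are simply counted as one unit each:

```csharp
using System;

public static class CodePointDemo
{
    // Combine a high/low surrogate pair into the code point it encodes.
    public static int Combine(char high, char low)
    {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Count Unicode code points rather than 16-bit chars.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            bool isPair = s[i] >= 0xD800 && s[i] <= 0xDBFF
                && i + 1 < s.Length
                && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
            if (isPair)
            {
                i++; // skip the low surrogate; the pair is one code point
            }
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string s = "a\U0001D11Eb";
        Console.WriteLine(s.Length);                        // 4 chars
        Console.WriteLine(CountCodePoints(s));              // 3 code points
        Console.WriteLine("U+{0:X}", Combine(s[1], s[2]));  // U+1D11E
    }
}
```

Anything that indexes or measures strings by `char` sees four units here, which is why the display and processing code has to be surrogate-aware.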
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Thanks for all your responses. It's all clear to me now.
UTF-16 - the internal representation of Unicode in the .NET Framework - permits code points up to 0x10FFFF, which does cover all languages, including Asian languages.
The misunderstanding was caused by a syntax error in my code. I was using [\u000000-\u10FFFF] to indicate a range in the character class of a regular expression, which is simply the wrong notation (matches 0-FFFF). The correct notation uses upper-case U, as in [\U00000000-\U0010FFFF]. The C# Language Specification is very clear about this (section Grammar, C1.5). Maybe I will read it after all...
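A short sketch of the difference between the two escapes in an ordinary C# string literal. It only looks at the literals themselves, not at how the regular expression engine treats the resulting range, and the class name `EscapeLengthDemo` is made up for illustration:

```csharp
using System;

public static class EscapeLengthDemo
{
    // \u takes exactly four hex digits, so this is U+10FF followed by
    // the two ordinary letters "FF".
    public const string LowerCase = "\u10FFFF";

    // \U takes exactly eight hex digits: the single code point U+10FFFF,
    // stored as a surrogate pair.
    public const string UpperCase = "\U0010FFFF";

    public static void Main()
    {
        Console.WriteLine(LowerCase.Length);          // 3
        Console.WriteLine(LowerCase[0] == '\u10FF');  // True
        Console.WriteLine(UpperCase.Length);          // 2
        Console.WriteLine("0x{0:X4} 0x{1:X4}",
            (int)UpperCase[0], (int)UpperCase[1]);    // 0xDBFF 0xDFFF
    }
}
```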
Johannes