surrogate characters and chars

guy
If a string contains surrogate chars (i.e. Unicode characters that consist
of more than one Char), do functions that index into the string or return its
length, e.g. Mid and Len, work correctly?

guy
Dec 20 '05 #1
All strings in .NET are Unicode. Indexing is per-character, not per byte.
I assume you are coming from a C/C++ background? :)))

Dec 20 '05 #2
From help:

The LenB function in earlier versions of Visual Basic returns the number
of bytes in a string rather than characters. It is used primarily for
converting strings in double-byte character set (DBCS) applications. All
Visual Basic .NET strings are in Unicode, and LenB is no longer supported.

So the short answer is yes: Len, Mid etc. work fine. If you try this in VB6,
for example, you would (normally) get Len("A") = 1 but LenB("A") = 2,
thereby demonstrating that strings are handled as Unicode.
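
Since LenB is gone in VB.NET, the Char count and the byte count of "A" can be compared roughly as follows. This is only a sketch: using Encoding.Unicode.GetByteCount as a stand-in for LenB is an assumption, not something taken from the help text quoted above.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module ByteCountDemo
    Sub Main()
        ' Len counts 16-bit Chars.
        Console.WriteLine(Len("A"))                            ' 1

        ' Assumed stand-in for LenB: size of the string when encoded as UTF-16.
        Console.WriteLine(Encoding.Unicode.GetByteCount("A"))  ' 2
    End Sub
End Module
```

Note that this only shows that each Char occupies two bytes; it says nothing yet about characters that need two Chars.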

Dec 20 '05 #3
Guy,

In addition to the others:
A String is an immutable sequence of Char.
A StringBuilder is a mutable sequence of Char.

http://msdn2.microsoft.com/en-us/lib...stem.char.aspx

I hope this helps,

Cor

Dec 20 '05 #4
What is correct to you? They work the same way the String class works,
i.e. Len (just like String.Length) returns the number of 16-bit Chars
in the string. If you want to treat a surrogate pair as a single
element you should have a look at the StringInfo class in .NET 2.0.
Mattias

--
Mattias Sjögren [C# MVP] mattias @ mvps.org
http://www.msjogren.net/dotnet/ | http://www.dotnetinterop.com
Please reply only to the newsgroup.
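
The difference between counting Chars and counting text elements with StringInfo might look like the sketch below. The sample string is an assumption chosen for illustration: U+20000 is a CJK Extension B ideograph that lies outside the Basic Multilingual Plane and is therefore stored as a surrogate pair.

```vbnet
Imports System
Imports System.Globalization

Module TextElementDemo
    Sub Main()
        ' Hypothetical sample: "A", U+20000 (one surrogate pair), "B".
        Dim s As String = "A" & Char.ConvertFromUtf32(&H20000) & "B"

        Console.WriteLine(s.Length)                                ' 4 Chars
        Console.WriteLine(New StringInfo(s).LengthInTextElements)  ' 3 text elements

        ' Walk the string one text element at a time.
        Dim e As TextElementEnumerator = StringInfo.GetTextElementEnumerator(s)
        While e.MoveNext()
            Console.WriteLine("index {0}: {1} Char(s)", e.ElementIndex, e.GetTextElement().Length)
        End While
    End Sub
End Module
```

The loop reports elements starting at Char indexes 0, 1 and 3, with the middle element spanning two Chars.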
Dec 20 '05 #5
guy
Sorry for not making myself clear. I am interested in the impact of surrogate
Char pairs; these are used to extend Unicode and are in effect 32-bit chars.
What I need to know is this: if I have a string of Chinese text made up of
3 graphical symbols, stored in the string as Char, surrogate Char pair, Char,
then I am using 4 .NET Chars to represent 3 graphical characters.
So would Len(myString) return 3 or 4?
Would Left(myString, 2) give me the first 2 graphical characters (3 Chars), or
a string consisting of 1 graphical character and a Char that is the first half
of the second (surrogate) character?
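
Both questions can be checked with a small experiment along these lines. The three code points are hypothetical stand-ins for the Chinese text described above (U+4E2D, U+20000, U+6587), with only the middle one requiring a surrogate pair.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LeftSplitsPairDemo
    Sub Main()
        ' Char, surrogate pair, Char: 4 Chars for 3 graphical characters.
        Dim myString As String = ChrW(&H4E2D) & Char.ConvertFromUtf32(&H20000) & ChrW(&H6587)

        Console.WriteLine(Len(myString))                       ' 4

        ' Left counts Chars, so it stops in the middle of the pair.
        Dim firstTwo As String = Left(myString, 2)
        Console.WriteLine(firstTwo.Length)                     ' 2
        Console.WriteLine(Char.IsHighSurrogate(firstTwo(1)))   ' True
    End Sub
End Module
```

In other words, Len reports 4, and Left(myString, 2) returns the first graphical character followed by an unpaired high surrogate.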

BTW, I am not a C person :-) I have been using BASIC since 1976!

Dec 21 '05 #6
guy
Looks like the normal VB and String functions don't differentiate surrogate
pairs, so I will need to use StringInfo.ParseCombiningCharacters.

Dec 21 '05 #7
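
A Left-style function built on StringInfo.ParseCombiningCharacters could be sketched as below. LeftTextElements is a hypothetical helper name, not part of VB or the .NET Framework, and the sample string is the same assumed Char, surrogate pair, Char sequence discussed in the thread. Note that text elements also group a base character with its combining marks, so this cuts on slightly broader boundaries than surrogate pairs alone.

```vbnet
Imports System
Imports System.Globalization
Imports Microsoft.VisualBasic

Module TextElementLeft
    ' Hypothetical helper: returns the first "count" text elements of s.
    Function LeftTextElements(ByVal s As String, ByVal count As Integer) As String
        If String.IsNullOrEmpty(s) OrElse count <= 0 Then Return String.Empty

        ' starts(i) is the Char index at which text element i begins.
        Dim starts As Integer() = StringInfo.ParseCombiningCharacters(s)
        If count >= starts.Length Then Return s

        ' Cut at the start of the first element that is not wanted.
        Return s.Substring(0, starts(count))
    End Function

    Sub Main()
        Dim myString As String = ChrW(&H4E2D) & Char.ConvertFromUtf32(&H20000) & ChrW(&H6587)
        Dim firstTwo As String = LeftTextElements(myString, 2)

        Console.WriteLine(firstTwo.Length)                                ' 3 Chars
        Console.WriteLine(New StringInfo(firstTwo).LengthInTextElements)  ' 2 text elements
    End Sub
End Module
```

Unlike Left(myString, 2), this keeps the surrogate pair intact and returns two complete graphical characters.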