Bytes | Software Development & Data Engineering Community

surrogate characters and chars

guy
If a string contains surrogate chars (i.e. Unicode characters that consist
of more than one Char), do functions that index into the string or use its
length, e.g. Mid and Len, work correctly?

guy
Dec 20 '05 #1
All strings in .NET are Unicode. Indexing is per Char (a 16-bit UTF-16 code unit), not per byte.
I assume you are coming from a C/C++ background? :)))

Dec 20 '05 #2
From help:

The LenB function in earlier versions of Visual Basic returns the number
of bytes in a string rather than characters. It is used primarily for
converting strings in double-byte character set (DBCS) applications. All
Visual Basic .NET strings are in Unicode, and LenB is no longer supported.

So the short answer is yes: Len, Mid, etc. work fine. In VB6, for
example, you would (normally) get Len("A") = 1 but LenB("A") = 2,
which demonstrates that strings are stored as Unicode (two bytes per Char).

Dec 20 '05 #3
Guy,

In addition to the others:
A String is an immutable array of Char.
A StringBuilder is a mutable array of Char.

http://msdn2.microsoft.com/en-us/lib...stem.char.aspx

I hope this helps,

Cor

Dec 20 '05 #4
guy wrote:
If a string contains surrogate chars (i.e. Unicode characters that consist
of more than one Char), do functions that index into the string or use its
length, e.g. Mid and Len, work correctly?

What is correct to you? They work the same way the String class works,
i.e. Len (just like String.Length) returns the number of 16-bit Chars
in the string. If you want to treat a surrogate pair as a single
element you should have a look at the StringInfo class in .NET 2.0.
Mattias
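
A minimal VB.NET sketch of the difference described above, assuming .NET 2.0 or later. The module name and the sample string are made up for illustration; U+20000 is used only because it lies outside the 16-bit range and therefore needs a surrogate pair.

```vbnet
Imports System
Imports System.Globalization
Imports Microsoft.VisualBasic

Module SurrogateLengthDemo
    Sub Main()
        ' Hypothetical sample: U+4E2D, U+20000 (stored as a surrogate pair), U+6587.
        Dim s As String = ChrW(&H4E2D) & Char.ConvertFromUtf32(&H20000) & ChrW(&H6587)

        ' Len and String.Length both count 16-bit Chars.
        Console.WriteLine(Len(s))      ' prints 4
        Console.WriteLine(s.Length)    ' prints 4

        ' StringInfo counts text elements, so the surrogate pair counts once.
        Dim info As New StringInfo(s)
        Console.WriteLine(info.LengthInTextElements)   ' prints 3
    End Sub
End Module
```

So Len is "correct" if you want the number of Chars, and StringInfo is the place to look if you want the number of displayed characters.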

--
Mattias Sjögren [C# MVP] mattias @ mvps.org
http://www.msjogren.net/dotnet/ | http://www.dotnetinterop.com
Please reply only to the newsgroup.
Dec 20 '05 #5
guy
Sorry for not making myself clear. I am interested in the impact of surrogate
pairs (char/char pairs); these are used to extend Unicode and are in effect
32-bit chars.
What I need to know is this: if I have a string of Chinese text made up of
3 graphical symbols, stored in the string as a char, a surrogate pair
(two chars) and a char, then I am using 4 .NET chars to represent 3
graphical characters.
So would Len(myString) return 3 or 4?
Would Left(myString, 2) give me the first 2 graphical characters (3 chars), or
a string consisting of 1 graphical char plus a char that is only the first half
of the second (surrogate) character?
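
To make the Left question concrete, here is a small sketch that builds a hypothetical string of the shape described above (char, surrogate pair, char) and inspects what Left returns. The module name and sample code points are assumptions for illustration; Strings.Left is simply the fully qualified name of the VB Left function, and Char.IsHighSurrogate requires .NET 2.0 or later.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SurrogateLeftDemo
    Sub Main()
        ' Hypothetical sample: U+4E2D, U+20000 (stored as a surrogate pair), U+6587.
        Dim s As String = ChrW(&H4E2D) & Char.ConvertFromUtf32(&H20000) & ChrW(&H6587)

        ' Left cuts after two Chars, not after two graphical characters.
        Dim firstTwo As String = Strings.Left(s, 2)
        Console.WriteLine(firstTwo.Length)                          ' prints 2

        ' The second Char of the result is only the first half of the pair.
        Console.WriteLine(Char.IsHighSurrogate(firstTwo.Chars(1)))  ' prints True

        ' The matching low surrogate was left behind in the original string.
        Console.WriteLine(Char.IsLowSurrogate(s.Chars(2)))          ' prints True
    End Sub
End Module
```

In other words, Left(myString, 2) here yields 1 graphical character followed by a dangling high surrogate.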

BTW, I am not a C person :-) I have been using BASIC since 1976!

Dec 21 '05 #6
guy
It looks like the normal VB and String functions don't differentiate surrogate
pairs, so I will need to use StringInfo.ParseCombiningCharacters.
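
A rough sketch of how StringInfo.ParseCombiningCharacters could be used for a surrogate-aware Left. LeftTextElements is a hypothetical helper written for this example, not a framework function, and the sample string is again made up. ParseCombiningCharacters returns the Char index at which each text element starts, so the helper cuts at one of those indexes.

```vbnet
Imports System
Imports System.Globalization

Module TextElementLeftDemo
    ' Hypothetical helper: returns the first 'count' text elements of s,
    ' never cutting a surrogate pair in half.
    Function LeftTextElements(ByVal s As String, ByVal count As Integer) As String
        If String.IsNullOrEmpty(s) OrElse count <= 0 Then Return String.Empty

        ' Start index (in Chars) of every text element in s.
        Dim starts As Integer() = StringInfo.ParseCombiningCharacters(s)
        If count >= starts.Length Then Return s

        ' The element at position 'count' begins where the result must end.
        Return s.Substring(0, starts(count))
    End Function

    Sub Main()
        ' Hypothetical sample: U+4E2D, U+20000 (stored as a surrogate pair), U+6587.
        Dim s As String = ChrW(&H4E2D) & Char.ConvertFromUtf32(&H20000) & ChrW(&H6587)

        ' Element start indexes: prints 0, 1 and 3 on separate lines.
        For Each index As Integer In StringInfo.ParseCombiningCharacters(s)
            Console.WriteLine(index)
        Next

        ' Two graphical characters occupy three Chars.
        Console.WriteLine(LeftTextElements(s, 2).Length)   ' prints 3
    End Sub
End Module
```

Note that text elements also group base characters with their combining marks, so this helper treats those as single units too, which may or may not be what you want.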

Dec 21 '05 #7
