
UTF32 CodePoints, UTF8 Combining Chars / Surrogate Pairs, and .NET

I've spent a bit of time over the last year trying to implement RFC 3454
(Preparation of Internationalized Strings, aka 'StringPrep').

This RFC is also a dependency of RFC 3491 (Nameprep, the stringprep profile
used by Internationalized Domain Names / IDNA), which I also need to support.

The problem that I've been struggling with in .NET is that of Unicode code
points above 0xFFFF. In .NET strings, which are UTF-16, these code points are
represented as surrogate pairs, the encoding scheme defined in section 3.7 of
the Unicode Spec (http://www.unicode.org/book/ch03.pdf).
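
For readers unfamiliar with the scheme, here is a minimal sketch of the surrogate pair arithmetic. It is written in Python purely for illustration and is not the poster's code; the function names are hypothetical. The .NET analogue would be walking the 16-bit chars of a string and combining each high/low pair into one code point.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (above 0xFFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def from_surrogate_pair(high, low):
    """Combine a high and a low surrogate back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def code_points(utf16_units):
    """Yield one code point per character from a list of UTF-16 code units.

    Unpaired surrogates are passed through unchanged; a stricter
    implementation would probably reject them.
    """
    i = 0
    while i < len(utf16_units):
        unit = utf16_units[i]
        has_next = i + 1 < len(utf16_units)
        if (0xD800 <= unit <= 0xDBFF and has_next
                and 0xDC00 <= utf16_units[i + 1] <= 0xDFFF):
            yield from_surrogate_pair(unit, utf16_units[i + 1])
            i += 2
        else:
            yield unit
            i += 1
```

With this arithmetic, U+10FF1 becomes the pair D803 DFF1, and decoding the units 0041 D803 DFF1 0301 yields the code points U+0041, U+10FF1, U+0301. A combining mark such as U+0301 is never merged into another code point at this level.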

Related to surrogate pairs is the whole set of Unicode combining
characters.

The problem, then, is this:

When I iterate over a string using the .NET StringInfo class, I get a
sequence of text elements (graphemes). These text elements handle combining
characters and surrogate pairs correctly, and appear to give me a single
UTF-32 code point for each grapheme.
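
As a point of comparison, the following Python sketch groups code points into simplified text elements: a base character plus any combining marks that follow it. It is an illustration under stated assumptions, not the StringInfo algorithm. The helper name `text_elements` is hypothetical, and "combining mark" here means any character whose general category starts with "M".

```python
import unicodedata


def text_elements(text):
    """Group code points into lists: one base character plus trailing marks.

    Simplified on purpose: a character joins the previous element only if
    its Unicode general category is a mark (Mn, Mc, or Me).
    """
    elements = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if elements and is_mark:
            elements[-1].append(ord(ch))   # attach mark to the current element
        else:
            elements.append([ord(ch)])     # start a new element
    return elements


# A combining sequence followed by a supplementary code point.
sample = "\u0301\u0302" + chr(0x10FF1)
for element in text_elements(sample):
    print(" ".join("U+%04X" % cp for cp in element))
```

In this sketch every element keeps its list of code points, so a two-mark sequence and a single supplementary code point stay distinguishable. In .NET, StringInfo text elements are likewise returned as substrings, so the individual code points inside each element should still be recoverable from that substring.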

BUT, let's say the original string had U+10FF1 encoded as a UTF-16 surrogate
pair. This character is illegal in a particular stringprep profile.

The original string also had a combining character sequence U+0301 + U+0302
(for example), and the grapheme that the StringInfo class reports for this is
also U+10FF1.

The problem is that each of the combining characters IS legal in the
stringprep profile, but I have no way of telling if the original data was
the (illegal) UTF-32 code point, or the (legal) combining characters.
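
One way to sidestep the ambiguity is to run the prohibited-character check per code point rather than per grapheme. The sketch below uses Python's standard `stringprep` module, which exposes the RFC 3454 tables. The choice of tables is a hypothetical profile made up for illustration, and treating U+10FF1 as prohibited because it is unassigned in Unicode 3.2 (table A.1) is an assumption about why the poster's profile rejects it.

```python
import stringprep

# Hypothetical profile for illustration only: reject unassigned code points
# (table A.1) plus a few of the RFC 3454 "C" prohibition tables.
PROHIBITED_TABLES = (
    stringprep.in_table_a1,       # unassigned in Unicode 3.2
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c8,       # change display properties / deprecated
)


def find_prohibited(text):
    """Return (index, code point) for each code point the sketch profile rejects."""
    return [
        (index, ord(ch))
        for index, ch in enumerate(text)
        if any(in_table(ch) for in_table in PROHIBITED_TABLES)
    ]


legal = "a\u0301\u0302"        # base letter plus two combining marks
illegal = "a" + chr(0x10FF1)   # base letter plus a supplementary code point

print(find_prohibited(legal))
print(find_prohibited(illegal))
```

Because the check looks at code points, the combining marks U+0301 and U+0302 pass while U+10FF1 is flagged, regardless of how either is grouped into graphemes for display.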

Has anyone implemented any of this stuff in .NET?

--

Chris Mullins
Jul 21 '05 #1


Hello Chris,

I noticed that you also posted this question in
microsoft.public.dotnet.framework. I have replied to you there and will
follow up in that thread. When you have time, please check that thread for
this question.

Thanks very much.

Best regards,
Yanhong Huang
Microsoft Community Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.

Jul 21 '05 #2

