I ran into a situation at work involving Unicode character encodings
and .NET cultures that left me a bit confused.
I was trying to instantiate a CultureInfo object for a locale
identifying the Hmong, a people of southern China. It's usually
represented as hm-HMN. I read the NativeName property to get the
name of the culture and display it in my application. When I do:
CultureInfo ci = new CultureInfo("hm-HMN");
ci.NativeName prints something that looks like this:
H'mong
However, between the letters 'o' and 'n' I see what the Unicode
Consortium calls the replacement character
(http://en.wikipedia.org/wiki/Replacement_character), which is
basically a diamond with a question mark inside it. Reading the
Wikipedia article on the replacement character, it appears that this
character shows up whenever an application cannot decode the original
byte stream correctly; when decoding fails, the offending bytes are
replaced with U+FFFD (0xFFFD).
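To see the mechanism the Wikipedia article describes, here is a minimal C# sketch (assuming C# 9 or later for top-level statements, on .NET 5 or later). It decodes a well-formed UTF-8 byte sequence and a deliberately broken one; the byte values are made up for illustration and are not what CultureInfo returns. With the default Encoding.UTF8 the bad byte silently becomes U+FFFD, while a UTF8Encoding constructed to throw on invalid bytes reports the problem instead.

```csharp
// Sketch: C# 9+ top-level statements, .NET 5 or later (assumption).
using System;
using System.Text;

// Well-formed UTF-8: 'H', 'm', 'o', then U+00E9 encoded as the two bytes C3 A9.
// Illustrative bytes only; they are not taken from CultureInfo.
byte[] valid = { 0x48, 0x6D, 0x6F, 0xC3, 0xA9 };
// Same text with the lead byte C3 dropped: a lone A9 is not valid UTF-8.
byte[] broken = { 0x48, 0x6D, 0x6F, 0xA9 };

// Encoding.UTF8 uses a replacement fallback, so undecodable bytes become U+FFFD.
string ok = Encoding.UTF8.GetString(valid);
string bad = Encoding.UTF8.GetString(broken);
Console.WriteLine(ok == "Hmo\u00E9");  // True
Console.WriteLine(bad == "Hmo\uFFFD"); // True

// A strict decoder (second argument: throwOnInvalidBytes = true)
// throws instead of substituting U+FFFD.
var strict = new UTF8Encoding(false, true);
try
{
    strict.GetString(broken);
}
catch (DecoderFallbackException)
{
    Console.WriteLine("invalid UTF-8 detected");
}
```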
What I would like to know is what exactly is causing this problem.
1) Does the native Windows API (or whatever gets called when I
instantiate a new CultureInfo object; I haven't had a chance to look
at it in Reflector yet) encode that character differently, so that
.NET cannot display it because it is trying to decode it using UTF-16
rules?
2) Or can the character not be displayed because the default code
page is set to 1252?
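One way to tell these two possibilities apart, sketched under the same assumptions (C# 9+, .NET 5+), is to inspect the string's UTF-16 code units instead of trusting what appears on screen. DumpCodeUnits is a hypothetical helper, and the sample string is only a stand-in for ci.NativeName; in the real program you would pass ci.NativeName instead. If the dump contains U+FFFD, the replacement character is already in the string returned by the culture data; if it shows some other code point, the string itself is intact and the problem lies in how it is rendered or converted for output.

```csharp
// Sketch: C# 9+ top-level statements, .NET 5 or later (assumption).
using System;
using System.Text;

// Hypothetical helper: lists every UTF-16 code unit of a string as U+XXXX.
static string DumpCodeUnits(string s)
{
    var sb = new StringBuilder();
    foreach (char c in s)
    {
        if (sb.Length > 0) sb.Append(' ');
        sb.Append("U+").Append(((int)c).ToString("X4"));
    }
    return sb.ToString();
}

// Stand-in for ci.NativeName: "H'mo", a U+FFFD, then "ng".
string nativeName = "H'mo\uFFFDng";
string dump = DumpCodeUnits(nativeName);
Console.WriteLine(dump); // U+0048 U+0027 U+006D U+006F U+FFFD U+006E U+0067

// True means the replacement character is really stored in the string,
// rather than being substituted by the font or console at display time.
bool containsReplacement = nativeName.IndexOf('\uFFFD') >= 0;
Console.WriteLine(containsReplacement); // True
```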
Can anyone offer some insight into how to get the characters to
display correctly, and also clue me in on the differences between
encodings and code pages?
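On the encodings versus code pages question, a short illustration may help. A .NET string is always UTF-16 in memory; an encoding only comes into play when text is turned into bytes or read back from bytes. In Windows terms a code page is a numbered encoding: 1252 is a single-byte Latin encoding with room for at most 256 characters, while UTF-8 (code page 65001) and UTF-16 (code page 1200) can represent every Unicode character. The sketch below assumes C# 9+ on .NET 5 or later, where code page 1252 is not built in and has to be registered through CodePagesEncodingProvider; the sample string is arbitrary. It shows the same two characters producing different bytes under each encoding, and a character that code page 1252 cannot represent being replaced by '?' when encoding, not by U+FFFD.

```csharp
// Sketch: C# 9+ top-level statements, .NET 5 or later (assumption).
using System;
using System.Text;

// .NET 5+ ships only a few encodings by default;
// this provider makes Windows code pages such as 1252 available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Code page 1252 with an explicit "?" replacement for unmappable characters
// (by default, Windows code pages may use best-fit mappings instead).
Encoding cp1252 = Encoding.GetEncoding(1252,
    EncoderFallback.ReplacementFallback,
    DecoderFallback.ReplacementFallback);

static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

// U+00E9 (e with acute accent) exists in code page 1252; U+4E2D (a CJK character) does not.
string text = "\u00E9\u4E2D";

Console.WriteLine(Hex(Encoding.UTF8.GetBytes(text)));    // C3-A9-E4-B8-AD
Console.WriteLine(Hex(Encoding.Unicode.GetBytes(text))); // E9-00-2D-4E (UTF-16 little-endian)
Console.WriteLine(Hex(cp1252.GetBytes(text)));           // E9-3F ('?' for the CJK character)

// The round trip through code page 1252 loses the CJK character for good.
string roundTrip = cp1252.GetString(cp1252.GetBytes(text));
Console.WriteLine(roundTrip == "\u00E9?"); // True
```

With this setup, '?' (0x3F) marks a character lost while encoding into a code page, whereas U+FFFD marks bytes that could not be decoded, so the two symptoms point at different steps.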
Thanks!