
asp.net chinese encoding

Hello all,

I am having a few issues with encoding Chinese characters, and
perhaps someone might be able to assist.

At the moment I can only see Chinese characters when they are displayed
as part of a DataGrid. When an input TextBox is displayed, it does not
show Chinese characters, but rather the Unicode characters stored
in the MSSQL 2000 server backend.

To get this setting to work I have the following web.config file
setting:
<globalization requestEncoding="utf-8" responseEncoding="utf-8"
fileEncoding="utf-8" culture="zh-CN"/>
and on page load I programmatically set the following:
Response.Charset = "gb2312";
these are the settings I use to get the above results.

I have tried setting the web.config to:
<globalization requestEncoding="gb2312" responseEncoding="gb2312"
fileEncoding="gb2312" culture="zh-CN"/>
but this provides little help.

Additionally I have used some of the suggestions in the following post
http://www.asp.net/Forums/ShowPost.a...&PostID=518209
by overriding the TextBox Text property where in set I do the
following:
set
{
base.Text = ToSCUnicode(value);
}
which displays the correct Chinese characters on the initial page load.
However, when the page performs a post-back, these Chinese characters
are converted back to, I assume, Unicode (I could be wrong).

I am hoping someone may have some experience/suggestions to lead me on
to the right track to solve this problem, any feedback would be
appreciated.

Thanks
Nov 16 '05 #1
8 Replies


pabv <pa***********@gmail.com> wrote:
I am having a few issues with encoding to chinese characters and
perhaps someone might be able to assist.

At the moment I am only able to see chinese characters when displayed
as part of a datagrid. When an input textbox is displayed it does not
display chinese characters, but rather the unicode characters stored
in the mssql 2000 server backend.


What *exactly* do you mean by this? All the Chinese characters *are*
Unicode characters.

I suspect you may find that this is a limitation in the browser's
handling of textboxes more than anything else. Have you tried it in
multiple browsers? Have you looked to see exactly what's being sent
back (as opposed to what the browser's displaying)?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #2

Jon,

I am using Internet Explorer.

What I meant is that when the data is displayed in a DataGrid, the
Unicode characters retrieved from the MSSQL backend are converted via
the codepage to Chinese symbols.

In this example the html page source text has the following
中国石油乌鲁木齐石化分公?新 ...etc (Note: this actually came via
access to the databinder.eval function, what I mean is not set via any
html server control as below) This correctly displays chinese symbols
on the browser.

However, when a textbox control is displayed the html page source
text has the following ÖйúÊ ...etc. On the
browser this displays the html equivalent of
ÖйúÊ ...etc which is
中国石油乌鲁木齐石化分公?新 ...etc (I believe this is the correct
process)

I checked what was being sent back from mssql by both the steps I
mentioned above and it is the following 中国石油乌鲁木齐石化分公?新
Which matches the values stored in the mssql backend, this unicode is
then transformed into the chinese symbols via the codepage.

I understand that I may not have used all the correct terminology
in my explanation. If I am incorrect or unclear on anything please let
me know, as any help would be useful to solve this problem.

Thanks again


Nov 16 '05 #3

pabv <pa***********@gmail.com> wrote:
I am using Internet Explorer.

What I am meant is that when the data is displayed on a data grid, the
unicode characters retrieved from the mssql backend are converted via
the codepage to chinese symbols.

In this example the html page source text has the following
中国石油乌鲁木齐石化分公?新 ...etc
Unfortunately including characters like that doesn't help much in a
post - the HTML source just has *bytes*, and an encoding associated
with it, which may well not be the one you used to look at the HTML
source, and is almost certainly not the one your newsreader or mine
uses.
(Note: this actually came via
access to the databinder.eval function, what I mean is not set via any
html server control as below) This correctly displays chinese symbols
on the browser.

However, when an textbox control is displayed the html page source
text has the following ÖйúÊ ...etc. On the
browser this displays the html equivalent of
ÖйúÊ ...etc which is
中国石油乌鲁木齐石化分公?新 ...etc (I believe this is the correct
process)

I checked what was being sent back from mssql by both the steps I
mentioned above and it is the following 中国石油乌鲁木齐石化分公?新
Which matches the values stored in the mssql backend, this unicode is
then transformed into the chinese symbols via the codepage.

I understand that I am may not have used all the correct terminology
in my explanation, if I am incorrect/un-clear on anything please let
me know, as any help would be useful to solve this problem.


It sounds like you're not sending back the correct content encoding
with the web page.

Have a look at http://www.pobox.com/~skeet/csharp/d...ngunicode.html
for the next step - you need to break the connection between SQL and
your web page, so you can check where things are going wrong.
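
For what it's worth, that kind of check boils down to printing the numeric value of each character rather than trusting how anything renders. A minimal C# sketch, assuming a console program; the names `UnicodeDump` and `Describe` are made up for illustration and are not taken from the linked page:

```csharp
using System;
using System.Text;

static class UnicodeDump
{
    // Returns every UTF-16 code unit of the string as "U+XXXX", space-separated.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Escapes keep the sample independent of the source file's encoding.
        Console.WriteLine(Describe("\u4E2D\u56FD")); // genuine Chinese characters
        Console.WriteLine(Describe("\u00D6\u00D0")); // characters from the Latin-1 range
    }
}
```

If a string fetched from the database prints as U+4E2D U+56FD, it already holds real Chinese characters. Values such as U+00D6 U+00D0 would suggest the bytes were decoded as a Latin code page somewhere upstream.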

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4

I think I have made some progress in finding the error to this
problem.

I realised that the error seemed to be happening with the ASP.NET
TextBox control. I then investigated further and found that the error,
where the Chinese characters are not displayed correctly
in the browser, occurs with multiline textboxes.

A multiline TextBox (a TextBox with TextMode = TextBoxMode.MultiLine) is
rendered by ASP.NET as a textarea. This results in
Unicode characters (i.e. characters not translated by the codepage
encoding process) being displayed on both page-load and any subsequent
page-postbacks, before the final page submission that saves the data to the
MSSQL backend.

So I populated a single-line textbox, and the browser
displayed the Chinese characters correctly, on
both page-load and page-postbacks.

Hence, the problem seems to occur when I populate a multiline textbox, which
is rendered as a textarea in HTML.

I went a step further and this time I created an HTML server textarea
control (HtmlTextArea). I populated the control, and on page-load the
Chinese characters are displayed correctly in the browser.
However, when I change the text inside the HtmlTextArea control, on
page-postback the Chinese characters are no longer displayed
correctly.

So one question I have is what encodings/processes are occurring on
page-postbacks that are causing the HtmlTextArea control to have this
issue? And most importantly, why is the multiline textbox not
displaying Chinese characters correctly when the HtmlTextArea
control partially is?

I will investigate further. Perhaps you or others may have some ideas
as to whether this has any importance to the issue.

I will also take your suggestion and follow the steps in your
debugging unicode page.

Thanks again.


Nov 16 '05 #5

Hi,

I realised (finally !) that I am connecting to a mssql backend with
collation 1252.
So I have updated my configuration file to the following:

<globalization requestEncoding="windows-1252"
responseEncoding="windows-1252"
fileEncoding="windows-1252"
culture="zh-CN"/>
Also on page load I programmatically set the following:

Response.Charset = "gb2312";

This allows DataGrids and TextBoxes to display Chinese
characters correctly in the browser. In the HTML source, the text
is written as codepage 1252 characters, which I
assume are then rendered correctly by the browser because I have set the charset
above.
These values are retrieved from the database correctly, and
page-load, post-backs and saving to the database all work fine.

However, when displaying converted date strings in the browser I am
getting some strange results. When I format a date (using the
culture's DateTimeFormat patterns) to be displayed with a month
abbreviation, the result with an English culture might be, for
example, 'Jan 2004'.
When the page is displayed with the zh-CN (Simplified Chinese) culture
it displays '?? 2004'; that is, the Chinese characters for the abbreviated
month
are not rendered correctly in the browser. In the HTML source,
the text also shows '??'.
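
One possible explanation for the '??' (an assumption, since nothing in the page output has been verified): the month abbreviation produced by the zh-CN culture consists of genuine Chinese characters, and responseEncoding="windows-1252" has no byte for them, so the encoder substitutes '?' before the browser ever sees the response. The sketch below shows that substitution in a console program. The sample text 一月 and the names `QuestionMarkDemo` and `EncodeToHex` are illustrative only.

```csharp
using System;
using System.Text;

static class QuestionMarkDemo
{
    // Needed on .NET Core / .NET 5+, where code pages 1252 and 936 are not
    // registered by default. On the classic .NET Framework these code pages
    // are built in and this constructor can be removed.
    static QuestionMarkDemo()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes the text with the given code page and returns the bytes as hex.
    public static string EncodeToHex(string text, int codePage)
    {
        byte[] bytes = Encoding.GetEncoding(codePage).GetBytes(text);
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        string month = "\u4E00\u6708"; // sample Chinese month text (assumed, not read from the culture)
        Console.WriteLine("1252: " + EncodeToHex(month, 1252)); // no mapping exists in this code page
        Console.WriteLine(" 936: " + EncodeToHex(month, 936));  // two bytes per character
    }
}
```

With code page 1252 each Chinese character comes out as the single byte 3F, which is '?'. That loss happens on the server, so no charset setting sent to the browser can bring the characters back.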

However, when I remove the previous command

Response.Charset = "gb2312";

and replace it with

Session.CodePage = 936;

the formatted date string is displayed correctly, with the month
abbreviation showing the corresponding Chinese characters.
In the HTML source, the text is written as codepage 1252
characters, which I assume are then rendered correctly by the browser
because I
have set the code page above. However, this now stops the earlier
DataGrids and TextBoxes from showing the corresponding Chinese
characters.

When I set both
Response.Charset = "gb2312";
Session.CodePage = 936;

this does not give the desired results. Why can't I set both of these
values at the same time to make it work?

I had a look at the data being returned from the MSSQL backend and it's
fine.

I performed the steps detailed in
http://www.pobox.com/~skeet/csharp/d...ngunicode.html and I used
the returned hex values below. As the data is stored as windows-1252,
I then used the double-byte code page conversions
(http://www.microsoft.com/globaldev/reference/WinCP.mspx) to see if
the displayed chinese symbols on the browser correspond to the
database values and they match.

I am a little confused as to what is happening, perhaps someone has
some ideas.

Thanks.

Nov 16 '05 #6

pabv <pa***********@gmail.com> wrote:
I realised (finally !) that I am connecting to a mssql backend with
collation 1252.
Hmm... I don't know details about what the collation does, but it can't
be restricting it to CP 1252, otherwise you wouldn't have any Chinese
characters at all.
So I have updated my configuration file to the following:

<globalization requestEncoding="windows-1252"
responseEncoding="windows-1252"
fileEncoding="windows-1252"
culture="zh-CN"/>
Also on page load I set programatically the following:

Response.Charset = "gb2312";

This will allow datagrid, textboxes to correctly display chinese
symbol characters on the browser. On the html source file, the text
written to the file is as codepage 1252 characters
Written to which file?
which I then assume if rendered correctly by the browser as I have
set the charset above. These values are retrieved from the database
correctly, and on page-load, post-backs and saving to the database
all is fine.
The trouble is, there are far too many things going on here. There are
lots of steps in the pipeline, and you need to work out what's
happening at each individual stage. Check that you're getting the
correct data to start with, then use a network analyser (or something
similar) to find out *exactly* what response your ASP.NET page is
giving.

Certainly mixing two character encodings in the same response is a bad
idea, IMO.

<snip>
However, when displaying converted date strings to the
I performed the steps detailed in
http://www.pobox.com/~skeet/csharp/d...ngunicode.html and I used
the returned hex values below. As the data is stored as windows-1252,
I then used the double-byte code page conversions
(http://www.microsoft.com/globaldev/reference/WinCP.mspx) to see if
the displayed chinese symbols on the browser correspond to the
database values and they match.


I don't think the data is actually stored as Windows-1252, as otherwise
you wouldn't be able to get any Chinese characters at all.

If you can make sure that you're getting the correct data back from the
database as Unicode characters, you can then ignore the database
character set entirely, and concentrate just on the ASP.NET part,
without even using a database.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #7

Jon Skeet [C# MVP] <sk***@pobox.com> wrote in message news:<MP************************@msnews.microsoft. com>...
pabv <pa***********@gmail.com> wrote:
I realised (finally !) that I am connecting to a mssql backend with
collation 1252.
Hmm... I don't know details about what the collation does, but it can't
be restricting it to CP 1252, otherwise you wouldn't have any Chinese
characters at all.


I use collation 1252 (SQL_Latin1_General_CP1_CI_AS). I think a
collation specifies the character set and sort order used by MSSQL
Server.

I may be incorrect but from my understanding the collation does
restrict the characters stored in mssql to be codepage 1252.

As a small example, for two characters stored in the database at the
moment it stores the codepage 1252 symbols for U+00D5 (D5 = U+00D5 :
LATIN CAPITAL LETTER O WITH TILDE) and U+00CA (CA = U+00CA : LATIN
CAPITAL LETTER E WITH CIRCUMFLEX) which for reference can be seen at
http://www.microsoft.com/globaldev/r.../sbcs/1252.htm

Then to render the chinese characters on to the browser, by sending
the charset gb2312 the browser does a codepage lookup of codepage 936.
Combining the two characters above it displays the chinese symbol
(D5CA = U+5E10 : CJK UNIFIED IDEOGRAPH) which can be seen at
http://www.microsoft.com/globaldev/r...936/936_D5.htm
This is what I understand the process to be. Is this correct? Perhaps
you may be able to correct me if I have mis-understood this process of
displaying the chinese characters onto the browser.
So I have updated my configuration file to the following:

<globalization requestEncoding="windows-1252"
responseEncoding="windows-1252"
fileEncoding="windows-1252"
culture="zh-CN"/>
Also on page load I set programatically the following:

Response.Charset = "gb2312";

This will allow datagrid, textboxes to correctly display chinese
symbol characters on the browser. On the html source file, the text
written to the file is as codepage 1252 characters
Written to which file?


Sorry, I meant the html source file, the current web page I am
viewing. The html file has codepage 1252 characters and the browser
then displays the correct chinese symbol characters.
which I then assume if rendered correctly by the browser as I have
set the charset above. These values are retrieved from the database
correctly, and on page-load, post-backs and saving to the database
all is fine.
The trouble is, there are far too many things going on here. There are
lots of steps in the pipeline, and you need to work out what's
happening at each individual stage. Check that you're getting the
correct data to start with, then use a network analyser (or something
similar) to find out *exactly* what response your ASP.NET page is
giving.

Certainly mixing two character encodings in the same response is a bad
idea, IMO.

<snip>


I am unsure as to what I can do. The MSSQL backend is codepage
1252, but the characters should be displayed using codepage 936. I
need to store both English and Chinese characters for the site.

However, when displaying converted date strings to the
I performed the steps detailed in http://www.pobox.com/~skeet/csharp/d...ngunicode.html and I used
the returned hex values below. As the data is stored as windows-1252,
I then used the double-byte code page conversions
(http://www.microsoft.com/globaldev/reference/WinCP.mspx) to see if
the displayed chinese symbols on the browser correspond to the
database values and they match.


I don't think the data is actually stored as Windows-1252, as otherwise
you wouldn't be able to get any Chinese characters at all.

If you can make sure that you're getting the correct data back from the
database as Unicode characters, you can then ignore the database
character set entirely, and concentrate just on the ASP.NET part,
without even using a database.


What should the data actually be stored as? Should it be stored as
codepage 936? Should the Chinese characters themselves be stored in the db?

The site needs to display both english and chinese characters.

Thanks again for your help/suggestions as I have spent a while looking
into the matter.
Nov 16 '05 #8

pabv <pa***********@gmail.com> wrote:
pabv <pa***********@gmail.com> wrote:
I realised (finally !) that I am connecting to a mssql backend with
collation 1252.
Hmm... I don't know details about what the collation does, but it can't
be restricting it to CP 1252, otherwise you wouldn't have any Chinese
characters at all.


I use collation 1252 (SQL_Latin1_General_CP1_CI_AS). I think
collations specifies the character set and sort-order used by mssql
server.

I may be incorrect but from my understanding the collation does
restrict the characters stored in mssql to be codepage 1252.

As a small example, for two characters stored in the database at the
moment it stores the codepage 1252 symbols for U+00D5 (D5 = U+00D5 :
LATIN CAPITAL LETTER O WITH TILDE) and U+00CA (CA = U+00CA : LATIN
CAPITAL LETTER E WITH CIRCUMFLEX) which for reference can be seen at
http://www.microsoft.com/globaldev/r.../sbcs/1252.htm


If it can only store those characters, then it can't possibly store any
Chinese characters, can it?
Then to render the chinese characters on to the browser, by sending
the charset gb2312 the browser does a codepage lookup of codepage 936.
Combining the two characters above it displays the chinese symbol
(D5CA = U+5E10 : CJK UNIFIED IDEOGRAPH) which can be seen at
http://www.microsoft.com/globaldev/r...936/936_D5.htm
No - you don't combine two CP-1252 characters to get a double-byte
character - you combine two *bytes*.
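
To make the bytes-versus-characters point concrete, here is a small sketch (console C#, with hypothetical names `BytesVsChars` and `Decode`) that decodes the same two bytes, D5 CA, under each code page. The byte values and U+5E10 come from the code page tables linked above.

```csharp
using System;
using System.Text;

static class BytesVsChars
{
    // Needed on .NET Core / .NET 5+; on the classic .NET Framework these
    // code pages are built in and this constructor can be removed.
    static BytesVsChars()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Interprets the same raw bytes under whichever code page is requested.
    public static string Decode(byte[] bytes, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }

    static void Main()
    {
        byte[] bytes = { 0xD5, 0xCA };

        string latin = Decode(bytes, 1252);  // two characters: U+00D5, U+00CA
        string chinese = Decode(bytes, 936); // one character: U+5E10

        Console.WriteLine("1252 -> {0} char(s)", latin.Length);
        Console.WriteLine(" 936 -> {0} char(s), U+{1:X4}", chinese.Length, (int)chinese[0]);
    }
}
```

The bytes never change; only the decoding rule does. Nothing "combines" the two Latin characters. They are simply a different reading of the same two bytes.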
This is what I understand the process to be. Is this correct? Perhaps
you may be able to correct me if I have mis-understood this process of
displaying the chinese characters onto the browser.
You still need to separate the database element from the browser
element. You haven't worked out for sure (as far as I've seen) whether
the database is the problem, or the browser. When you've eliminated one
of them, it doesn't need to appear again.
This will allow datagrid, textboxes to correctly display chinese
symbol characters on the browser. On the html source file, the text
written to the file is as codepage 1252 characters


Written to which file?


Sorry, I meant the html source file, the current web page I am
viewing. The html file has codepage 1252 characters and the browser
then displays the correct chinese symbol characters.


The file has *bytes* in, and the browser is being told to interpret
those bytes as GB2312. The idea of displaying a character as if it
were in a different character set from the actual one is a really bad
one.
I am not un-sure as to what I can do. The mssql backend is codepage
1252, but the characters should be displayed using codepage 936. I
need to store both english and chinese characters for the site.


So for output, I'd suggest UTF-8, as that covers the whole of Unicode.
However, you need to work out what's actually in the database to start
with. If it's only meant to be storing CP1252 characters, but you want
it to store GB2312 characters, you need to work out exactly how you're
expecting that to happen.
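
If it turns out that the database really does hold GB2312 bytes that were decoded as CP1252 characters, one way to inspect them is to reverse that step in code. This is only a sketch under that assumption; `MojibakeRepair` and `Reinterpret` are hypothetical names, and the sample string stands in for a value read from the database.

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Needed on .NET Core / .NET 5+; on the classic .NET Framework these
    // code pages are built in and this constructor can be removed.
    static MojibakeRepair()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Turns the CP1252 characters back into their original bytes,
    // then decodes those bytes as code page 936.
    public static string Reinterpret(string stored)
    {
        byte[] raw = Encoding.GetEncoding(1252).GetBytes(stored);
        return Encoding.GetEncoding(936).GetString(raw);
    }

    static void Main()
    {
        // Stand-in for a database value: bytes D6 D0 B9 FA read as CP1252.
        string stored = "\u00D6\u00D0\u00B9\u00FA";
        string repaired = Reinterpret(stored);

        // Print code points rather than glyphs so the console font does not matter.
        foreach (char c in repaired)
        {
            Console.Write("U+{0:X4} ", (int)c);
        }
        Console.WriteLine();
    }
}
```

If the repaired values look right, they could be written back to a Unicode column once, after which no per-page charset tricks should be needed.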
If you can make sure that you're getting the correct data back from the
database as Unicode characters, you can then ignore the database
character set entirely, and concentrate just on the ASP.NET part,
without even using a database.


What should the data actually be stored as? Should it be storing as
code 936? Should the chinese symbol characters be stored in the db?


So long as you've got the database in a mode where it *can* store those
characters, it shouldn't matter *how* it stores them. Set it up so it
can store any Unicode character, and it should be fine. You just read
and write strings, and they don't have encodings associated with them.
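
As a rough illustration of that last point, the sketch below (hypothetical names `Utf8RoundTrip`, `SurvivesUtf8`, `Utf8ByteCount`) pushes a mixed English and Chinese string through UTF-8 and back. The encoding only exists while the text is in byte form; the string itself is unchanged.

```csharp
using System;
using System.Text;

static class Utf8RoundTrip
{
    // True when encoding to UTF-8 and decoding again returns the same string.
    public static bool SurvivesUtf8(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes) == text;
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    static void Main()
    {
        string mixed = "Jan 2004 / \u4E00\u6708 2004"; // English and Chinese in one string
        Console.WriteLine(SurvivesUtf8(mixed));
        Console.WriteLine(Utf8ByteCount(mixed));
    }
}
```

Because UTF-8 can represent every character in both languages, a single response encoding is enough for the whole page.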

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #9
