Server.HTMLEncode with UTF-8

While working on some multilingual code I found a rather strange thing
happening with Server.HTMLEncode.

While loading different languages I change the Codepage and Charset in
ASP to reflect the language. This all works fine. However, when I tried
to use Charset UTF-8 with Codepage 65001 everywhere, I found that
HTMLEncode always translates all non-ASCII characters to &#xxxx; sequences.

Example:

Response.Charset = "shift_jis"
Response.Codepage = 932
Response.Write "Some Japanese Text"
Response.Write Server.HTMLEncode("Some Japanese Text")

Both Write actions output a character string in Shift_JIS, no UTF-8,
no &#xxxx sequences. Just fine and as it should be.

But when I do this:

Response.Charset = "utf-8"
Response.Codepage = 65001
Response.Write "Some Japanese Text"
Response.Write Server.HTMLEncode("Some Japanese Text")

The first Write outputs a UTF-8 character string but the second Write
outputs a string encoded into &#xxxx; sequences.

Why is that ???

Grtz,
Marco
Sep 15 '06 #1


"Marco Miltenburg" <mi**@xs4all.nl> wrote in message
news:4h********************************@4ax.com...
> The first Write outputs a UTF-8 character string but the second Write
> outputs a string encoded into &#xxxx; sequences.
>
> Why is that ???

Whilst all string handling in script is done in Unicode, the script file
itself can't be encoded in Unicode. It is possible to run a script encoded
as UTF-8 simply because all keywords, operators, etc. are within the ASCII
character set and are therefore identical when encoded as UTF-8. However,
string literals in the code will be treated as single-byte ANSI characters
despite having been encoded as UTF-8.
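
The byte-level effect described above can be reproduced outside ASP. The following is only a rough Python sketch of that explanation, not a description of how ASP or Server.HTMLEncode is implemented. It assumes the "ANSI" code page is Windows-1252, uses a three-character Japanese string as a stand-in for a literal, and numeric_entities is a hypothetical helper written for this illustration.

```python
# Illustrative model only; this is not how ASP is implemented internally.
text = "日本語"                      # stand-in for a Japanese string literal

utf8_bytes = text.encode("utf-8")    # the bytes as stored in a UTF-8 script file

# Reading those bytes one at a time as Windows-1252 ("ANSI") characters
# (an assumption about the server's default code page) yields nine
# unrelated characters instead of the three intended ones.
misread = utf8_bytes.decode("cp1252")


def numeric_entities(s):
    """Hypothetical helper: replace each non-ASCII character with &#NNNN;."""
    return "".join(c if ord(c) < 128 else "&#{};".format(ord(c)) for c in s)


print(len(text), len(misread))       # character counts before and after misreading
print(numeric_entities(text))        # entities for the intended characters
print(numeric_entities(misread))     # entities for the misread characters
```

If the literal has been misread this way, whatever is encoded afterwards is the nine-character string, so the entities no longer correspond to the original Japanese text.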

In the real world, where the string being encoded by HTMLEncode has been
retrieved from, say, a database, this problem wouldn't occur. If you need
string literals in multi-language output you will need to store them
somewhere else.

Anthony.
Sep 15 '06 #2
