Hi Mark,
Thanks for your posting.
Yes, I can believe the screen you got. However, this is in fact not caused
by an underlying difference in charset processing between ASP and ASP.NET.
More precisely, it comes from the different globalization support and
configuration in ASP and ASP.NET.
In ASP, the configuration options for globalization are limited, so
generally there are two things we need to set:
1. The CodePage value for the server-side page, through
<%@ Language="VBScript" CodePage="65001" %> or
<%
Session.CodePage = 65001
%>
Both approaches set the server-side page's request-processing charset to
UTF-8 (code page 65001), so the incoming querystring will be decoded as
UTF-8. If you set neither of them, ASP will use the default charset (the
system locale on the server) to decode the strings in the incoming request.
In ASP.NET, we don't need to set these, since ASP.NET by default uses UTF-8
as the request/response encoding. We can find this setting in web.config's
<globalization> element.
2. When the server page writes content to the client, the browser will
automatically use the proper encoding to display the page. In ASP we can
also use the following code to set it explicitly (if we don't, the server's
default charset will be used):
<%
Response.Charset = "UTF-8"
%>
In ASP.NET, as I mentioned above, UTF-8 is also the default setting.
This information also tells the client browser to automatically choose the
correct encoding to display the page content. If we don't set it
explicitly, we need to manually adjust the client browser's
View --> Encoding setting to UTF-8 to display the content correctly.
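For reference, the ASP.NET defaults described above correspond to a
<globalization> entry along the lines of the sketch below. The attribute
values are just the UTF-8 defaults written out explicitly; this is an
illustration, not a copy of your actual configuration:

```xml
<configuration>
  <system.web>
    <!-- Illustrative values only: UTF-8 for decoding incoming requests
         and for encoding outgoing responses -->
    <globalization requestEncoding="utf-8" responseEncoding="utf-8" />
  </system.web>
</configuration>
```

Changing requestEncoding here changes how ASP.NET decodes the incoming
querystring, which is the ASP.NET counterpart of setting the CodePage in ASP.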
Now, as for the byte sequence you mentioned:
%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6
When these bytes are decoded as UTF-8, they are parsed as three
undisplayable chars, so we should see three empty squares on the page (this
is the correct behavior). We can also confirm this by running the code below
in a .NET WinForms app:
================
// Raw bytes from the URL-encoded querystring value
byte[] bytes = {0xC7, 0xD1, 0xB1, 0xDB, 0xBA, 0xA3, 0xB3, 0xCA, 0xB9, 0xE6};
// Decode them as UTF-8, the default ASP.NET request encoding
string str = System.Text.Encoding.UTF8.GetString(bytes);
MessageBox.Show(string.Format("string:{0} , length:{1}", str, str.Length));
================
The different behavior you got in ASP is probably because ASP uses your
server's system locale, rather than UTF-8, to parse the querystring.
So I suggest you try the following page, which explicitly sets the server
page's code page to UTF-8 and the response charset to UTF-8:
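==============================
<%@ Language="VBScript" %>
<%
' Decode the incoming request (including the querystring) as UTF-8
Session.CodePage = 65001
' Declare UTF-8 as the response charset; set before any output is written
Response.Charset = "utf-8"
%>
<%
Dim str
str = Request.QueryString("str")
Response.Write("<br>String : " & str)
Response.Write("<br>Length : " & Len(str))
%>
==============================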
Then, when we pass
%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6
as the querystring, we get three empty squares displayed on the page (make
sure the client browser is using UTF-8 encoding to display the page), which
is identical to the behavior of the ASP.NET page (using UTF-8
request/response encoding).
If anything is unclear or you have any other related questions, please feel
free to post here. Thanks,
Steven Cheng
Microsoft Online Support
(This posting is provided "AS IS", with no warranties, and confers no
rights.)