different encoding handling between old ASP and ASP.Net

Hi...

Just noticed something odd... In old ASP if you had query parameters that
were invalid for their encoding (broken utf-8, say), ASP would give you back
chars representing the 8-bit byte value of the broken encoding, so you still
got something for every input byte.

This appears to have changed radically in ASP.Net, going down to the base
System.Text.Encoding object. Now, it appears to simply vaporize bytes that
don't fit in the encoding. You don't even get a ? placeholder like you get
in so many other contexts in asp.

Could anyone explain why there was such a dramatic change in the handling
of error cases? Is there a way using the .net framework to know if you had
an encoding error?

An example of the input:
/test.aspx?query=%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6
In the above, C7, A3, B3, and E6 don't make a valid utf-8 stream, but
looking for Request.QueryString("query") gives me the decoded version, just
missing any representation of the offending characters, i.e. three characters
1137, 1786, and 697 (which don't render in IE either by the way).

Request.QueryString("query") in ASP would yield a 10-character string, with
each of the original bytes converted to the raw 8-bit value.

Seems like a pretty big difference in handling things and I don't see a way
of getting any kind of indication (Exception or something) that there was a
conversion error.
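
For anyone who wants to reproduce the three outcomes outside of ASP, here is a minimal Python sketch. It is not ASP or .NET code; it only models the behaviours described above. Using Latin-1 as the "one char per byte" decoding is an assumption standing in for whatever code page classic ASP actually applied.

```python
from urllib.parse import unquote_to_bytes

# The percent-encoded value from the example URL above.
raw = unquote_to_bytes("%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6")

# 1. Strict decoding rejects the input outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 2. Silently dropping undecodable bytes (the "vaporizing" behaviour).
dropped = raw.decode("utf-8", errors="ignore")

# 3. One char per input byte (Latin-1 used here as a stand-in code page).
per_byte = raw.decode("latin-1")

print(strict_ok)                  # False
print([ord(c) for c in dropped])  # [1137, 1786, 697]
print(len(per_byte))              # 10
```

The second result matches the three characters reported for ASP.Net, and the third has the same length as the 10-character string from old ASP.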

Thanks
-mark

Nov 19 '05 #1
Mark wrote:
> Hi...
>
> Just noticed something odd... In old ASP if you had query parameters
> that were invalid for their encoding (broken utf-8, say), ASP would
> give you back chars representing the 8-bit byte value of the broken
> encoding, so you still got something for every input byte.
>
> This appears to have changed radically in ASP.Net, going down to the
> base System.Text.Encoding object. Now, it appears to simply vaporize
> bytes that don't fit in the encoding. You don't even get a ?
> placeholder like you get in so many other contexts in asp.
>
> Could anyone explain why there was such a dramatic change in the
> handling of error cases? Is there a way using the .net framework to
> know if you had an encoding error?
>
> An example of the input:
> /test.aspx?query=%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6
> In the above, C7, A3, B3, and E6 don't make a valid utf-8 stream, but
> looking for Request.QueryString("query") gives me the decoded
> version, just missing any representation of the offending characters,
> i.e. three characters 1137, 1786, and 697 (which don't render in IE
> either by the way).
>
> Request.QueryString("query") in ASP would yield a 10-character
> string, with each of the original bytes converted to the raw 8-bit
> value.

What does that mean? 0xC7, 0xA3, 0xB3, and 0xE6 are all meaningless in
UTF-8. There's no way to replace these bytes with a replacement
character, because that character's meaning would be ambiguous -- is it
the real character or a replacement? Whatever ASP does in this
situation, it's wrong.

Cheers,

--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #2
Hi Joerg...

Actually, none of the vaporized characters in the original example are
prohibited from utf-8 per se; what was broken about the original example was
that %C7 was followed by %D1; to be legal utf-8, it would have to have been
followed by a continuation byte, i.e. something from %80 through %BF.
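
To make the lead-byte/continuation-byte point concrete, here is a small Python sketch that classifies each byte of the example purely by its leading bits. The helper name `byte_kind` is made up for illustration, and the classification looks at bit patterns only (it does not reject bytes such as C0, C1 or F5 and above, which never occur in valid utf-8).

```python
def byte_kind(b: int) -> str:
    """Classify one byte by its leading bits (bit pattern only)."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx
    if b < 0xC0:
        return "continuation"  # 10xxxxxx, i.e. 0x80-0xBF
    if b < 0xE0:
        return "lead-2"        # 110xxxxx, must be followed by 1 continuation
    if b < 0xF0:
        return "lead-3"        # 1110xxxx, must be followed by 2 continuations
    if b < 0xF8:
        return "lead-4"        # 11110xxx, must be followed by 3 continuations
    return "invalid"

raw = bytes.fromhex("C7D1B1DBBAA3B3CAB9E6")
kinds = [byte_kind(b) for b in raw]

for b, kind in zip(raw, kinds):
    print(f"{b:02X} {kind}")
```

Read against the rule above: C7 is a 2-byte lead followed by another lead (D1), A3 and B3 are continuation bytes with no lead in front of them, and E6 is a 3-byte lead with nothing after it.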

Taken together, the example string that was supposed to be utf-8 *as a
whole* is invalid, and the question was more about what's the appropriate way
to respond to that. ASP responded to an invalid utf-8 string by not trying
to find valid bits in it but by giving as close to a "raw" approximation as
it could.

ASP.Net treats it like panning for gold. It sifts through the stream until
it finds byte combos that are legal, keeps those, and drops the rest. It
doesn't even put in ? as a placeholder, like so many of the other apis do. I
don't see how that's any less "wrong" than what ASP does.

What perplexes me more is why the discontinuity? It's just another thing
that won't work the same way when migrating from ASP to ASP.Net. If there's
a rationalization why picking out bits and pieces from an invalid stream is
better than not trying to translate it at all, I'd be curious to know.

If I were God, I'd say that the "right" way to do it in .Net would be to
throw an invalid format exception when garbage is fed to an Encoding class.
But given how expensive Exception processing is, I could understand why they
might not want to do that. Next down on my most "right" list would be to
have HttpUtility.UrlDecode() return an instance of an object where one
member would be the successfully translated string (if any) and another
member would be an array of the raw bytes. Then you could test the result
and make use of the bits if you chose.
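
Roughly what I have in mind, sketched in Python rather than .Net since it is quick to try. Everything named here (`DecodeResult`, `decode_reporting`, the `record_bad_bytes` handler name) is hypothetical and not an existing API in either platform.

```python
import codecs
from dataclasses import dataclass, field


@dataclass
class DecodeResult:
    text: str    # whatever could be decoded
    raw: bytes   # the untouched input bytes
    bad_offsets: list = field(default_factory=list)  # offsets of skipped bytes

    @property
    def clean(self) -> bool:
        return not self.bad_offsets


def decode_reporting(raw: bytes) -> DecodeResult:
    bad = []

    def record(err):
        if not isinstance(err, UnicodeDecodeError):
            raise err
        # err.start/err.end delimit the bytes the decoder could not use.
        bad.extend(range(err.start, err.end))
        return ("", err.end)  # emit nothing and resume after the bad bytes

    # Re-registering under the same name replaces the previous handler;
    # this keeps the sketch short but is not thread-safe.
    codecs.register_error("record_bad_bytes", record)
    text = raw.decode("utf-8", errors="record_bad_bytes")
    return DecodeResult(text, raw, bad)


result = decode_reporting(bytes.fromhex("C7D1B1DBBAA3B3CAB9E6"))
print(result.clean)        # False
print(result.bad_offsets)  # [0, 5, 6, 9]
```

The caller gets the translated string, the raw bytes and a cheap flag to test, without an exception being thrown on the common path.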

Thanks
_mark

Nov 19 '05 #3
Hi Mark,

Thanks for your posting.
Yes, I can believe the output you got. However, it is in fact not caused by
a difference in the underlying charset processing between ASP and ASP.NET.
More exactly, it is caused by the different globalization support and
configuration in ASP and ASP.NET.

In ASP, we have limited configuration for globalization, so generally there
are two things we need to set:
1. The CodePage value for the server-side page, through
<%@ Language="VBScript" CodePage="65001" %> or
<%
Session.CodePage = 65001
%>

Both of the above approaches set the server-side page's request processing
charset to utf-8 (code page 65001), so the incoming querystring will be
decoded as utf-8. If you don't set either of them, ASP will use the default
charset (your system locale on the server) to decode the strings in the
incoming request.

In ASP.NET, we don't need to set these, since ASP.NET by default uses utf-8
as the request/response encoding charset; we can find the default setting in
web.config's <globalization> element.

2. When the server page writes content to the client, the browser will
automatically use the proper encoding to display the page. In ASP we can
also use the following code to set it explicitly (if not, the server's
default charset will be used):
<%
Response.Charset = "UTF-8"
%>

In ASP.NET, as I mentioned above, UTF-8 is also the default setting. This
info tells the client browser to automatically choose the correct encoding
to display the page content. If we don't set it explicitly, we need to
manually adjust the client browser's View --> Encoding to utf-8 to display
the correct content.

Now, as for the byte sequence you mentioned:

%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6

when using utf-8 to decode it, it is parsed as three undisplayable chars, so
we should see three empty squares on the page (this is the correct
behavior). We can also confirm this by running the code below in a .NET
WinForms app:
================
// The ten raw bytes from the querystring
byte[] bytes = {0xC7,0xD1,0xB1,0xDB,0xBA,0xA3,0xB3,0xCA,0xB9,0xE6};

// Decode them as utf-8
string str = System.Text.Encoding.UTF8.GetString(bytes);

// Show the decoded string and its length
MessageBox.Show(string.Format("string:{0}, length:{1}", str, str.Length));
================

The reason why you got different behavior in ASP may be that ASP used your
server's system locale to parse the querystring rather than utf-8. So I
suggest you try the following page, which explicitly sets the server page's
codepage to utf-8 and the response charset to utf-8:
==============================
<%@ Language="VBScript" %>

<%
Session.CodePage = 65001
%>

<%
dim str

str = Request.QueryString("str")

Response.Write("<br>String : " & str)
Response.Write("<br>Length : " & Len(str))

Response.CharSet = "utf-8"
%>

==============================

Then, when passing

%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6

as the querystring, we get three empty squares displayed on the page (make
sure the client browser is using utf-8 encoding to display the page), which
is identical to the behavior of the ASP.NET page (using utf-8
request/response encoding).

If anything is unclear or you have any other related questions, please feel
free to post here. Thanks,

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)


Nov 19 '05 #4
Mark wrote:
> Hi Joerg...
>
> Actually, none of the vaporized characters in the original example
> are prohibited from utf-8 per se; what was broken about the original
> example was that %C7 was followed by %D1; to be legal utf-8, it
> would have to have been followed by %BF or lower.

Yep -- I was talking about bytes, not characters.

> Taken together, the example string that was supposed to be utf-8 *as
> a whole* is invalid, and the question was more about what's the
> appropriate way to respond to that. ASP responded to an invalid
> utf-8 string by not trying to find valid bits in it but by giving as
> close to a "raw" approximation as it could.
>
> ASP.Net treats it like panning for gold. It sifts through the stream
> until it finds byte combos that are legal, keeps those, and drops the
> rest. It doesn't even put in ? as a placeholder, like so many of the
> other apis do. I don't see how that's any less "wrong" than what ASP
> does.

As I pointed out, replacement characters are misleading, because you
have no idea whether the '?' is genuine or a replacement.

What we really need here is a HttpRequest property that indicates
whether form data or the query string were decoded without skipping
input bytes.
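
Lacking such a property, one workaround is a round-trip check: re-encode the decoded string and compare it with the original bytes. The sketch below is in Python and only illustrates the idea; `decoded_without_loss` is a made-up name, and it assumes you can still get at the raw (already percent-decoded) bytes of the request.

```python
def decoded_without_loss(raw: bytes, text: str) -> bool:
    """True if re-encoding text as utf-8 reproduces raw exactly."""
    return text.encode("utf-8") == raw


# The broken example: four bytes are skipped during decoding.
broken = bytes.fromhex("C7D1B1DBBAA3B3CAB9E6")
lossy_text = broken.decode("utf-8", errors="ignore")
print(decoded_without_loss(broken, lossy_text))  # False

# Well-formed utf-8 survives the round trip.
good = "caf\u00e9".encode("utf-8")
print(decoded_without_loss(good, good.decode("utf-8")))  # True
```

The same comparison also catches decoders that substitute a replacement character, since the replacement re-encodes to different bytes than the ones it stood in for.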

Cheers,

--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #5
