473,474 Members | 1,669 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Determining which encoding the browser used for a url

Hi,

I am working on a dotnet url rewriting mechanism that has to be able to deal
with urls containing non-standard characters, eg
http://www.mysite.com/Télécharger.

The problem is that some browsers will encode this url using utf8 & some
using ISO 8859 (I *think* those are the only two possibilities). For ISO
8859 I can use the built-in UrlDecode function, for utf8 I am using a
function I found on Google groups:

public static string Utf8ToString(string inputString)
{
byte[] utf8Bytes = new byte[inputString.Length];
for (int i=0; i < utf8Bytes.Length; i++)
{
utf8Bytes[i] = (byte)inputString[i];
}
return Encoding.UTF8.GetString(utf8Bytes);
}

The problem is deciding *which* encoding the browser has used, and therefore
which decoding function I need to use. It seems that Mozilla-based browsers
use ISO 8859, whereas IE can use either, depending on a user-setting, and I
haven't looked at any other browsers yet.

As far as I know, the browser does NOT send anything in the headers that
tell you what url-encoding it is using, so I guess I need some way of
looking at the raw url and working out which encoding it's using.

Can anyone help me with writing a function to do this? The ideal would be a
GetEncoding(string testString) function, but I'd settle for a function
IsUtf8Encoded(string testString), on the grounds that if it *isn't* utf8, it
must be ISO 8859.

TIA,

JON

Nov 19 '05 #1
1 992
Jon Maz wrote:
Hi,

I am working on a dotnet url rewriting mechanism that has to be able
to deal with urls containing non-standard characters, eg
http://www.mysite.com/Télécharger.
Doing this without direct control over your clients' configuration is a
daunting task, as you've just found out ;-)

The problem is that some browsers will encode this url using utf8 &
some using ISO 8859 (I think those are the only two possibilities).
Depends on you audience. Don't expect Chinese users to send ISO-8859-x.
For ISO 8859 I can use the built-in UrlDecode function, for utf8 I am
using a function I found on Google groups:

public static string Utf8ToString(string inputString)
{
byte[] utf8Bytes = new byte[inputString.Length];
for (int i=0; i < utf8Bytes.Length; i++)
{
utf8Bytes[i] = (byte)inputString[i];
}
return Encoding.UTF8.GetString(utf8Bytes);
}
Um... why? System.Web.HttpUtility has tons of methods for this,
including
public static string UrlDecode(string, Encoding);
The problem is deciding which encoding the browser has used, and
therefore which decoding function I need to use. It seems that
Mozilla-based browsers use ISO 8859, whereas IE can use either,
depending on a user-setting, and I haven't looked at any other
browsers yet.
You can't solve this. It's like trying to open an arbitrary file and
guess a correct character encoding.
As far as I know, the browser does NOT send anything in the headers
that tell you what url-encoding it is using, so I guess I need some
way of looking at the raw url and working out which encoding it's
using.
You're right it's not defined what encoding to use. Sender and receiver
need to agree on this.

Can anyone help me with writing a function to do this? The ideal
would be a GetEncoding(string testString) function, but I'd settle
for a function IsUtf8Encoded(string testString), on the grounds that
if it *isn't* utf8, it must be ISO 8859.


I'd rather drop the requirement of transparently supporting non ASCII
URL paths.

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

35
by: Dr.Tube | last post by:
Hi there, I have this web site (www.DrTube.com) which has the following DTD: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> which switches...
1
by: Jon Maz | last post by:
Hi, I am working on a dotnet url rewriting mechanism that has to be able to deal with urls containing non-standard characters, eg http://www.mysite.com/Télécharger. The problem is that some...
9
by: PAN | last post by:
I need some guidance here I've written this HTML code using the Windows Notebook: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EL"> <html> <head> <title>This is a Greek language title ->...
8
by: pabv | last post by:
Hello all, I am having a few issues with encoding to chinese characters and perhaps someone might be able to assist. At the moment I am only able to see chinese characters when displayed as...
4
by: Rémi | last post by:
Question: How can you determine the character set used by a webpage you built? My understanding of the issue is that the character set used by an HTML file (or any other file, for that matter)...
2
by: Alex Maghen | last post by:
I have created the world's simplest Code-Behind ASPX page in VS 2005. I've used all the defaults. But when I go to build and run the page, it comes up in Chinese (or some Asian character set). I've...
4
by: Provost Zakharov | last post by:
Hello, I just needed some help on how the DOM is encoded by the IE parser. As per the MSDN page, http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset4.asp ,server encodings...
3
by: Tony Houghton | last post by:
In Linux it's possible for filesystems to have a different encoding from the system's setting. Given a filename, is there a (preferably) portable way to determine its encoding? -- TH *...
8
by: Erwin Moller | last post by:
Hi group, I could use a bit of guidance on the following matter. I am starting a new project now and must make some decisions regarding encoding. Environment: PHP4.3, Postgres7.4.3 I must...
4
by: =?ISO-8859-1?Q?Nordl=F6w?= | last post by:
How do I efficiently determine which possible encoding(s) a given text is in? Can I use the iconv.h api somehow? Thanks in advance, Nordlöw
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
muto222
php
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.