Determining which encoding the browser used for a url

Hi,

I am working on a dotnet url rewriting mechanism that has to be able to deal
with urls containing non-standard characters, eg
http://www.mysite.com/Télécharger.

The problem is that some browsers will encode this url using utf8 & some
using ISO 8859 (I *think* those are the only two possibilities). For ISO
8859 I can use the built-in UrlDecode function, for utf8 I am using a
function I found on Google groups:

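// requires: using System.Text;
public static string Utf8ToString(string inputString)
{
    // Treat each char as one raw byte (assumes every char is <= 0xFF),
    // then reinterpret the resulting byte sequence as UTF-8.
    byte[] utf8Bytes = new byte[inputString.Length];
    for (int i = 0; i < utf8Bytes.Length; i++)
    {
        utf8Bytes[i] = (byte)inputString[i];
    }
    return Encoding.UTF8.GetString(utf8Bytes);
}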

The problem is deciding *which* encoding the browser has used, and therefore
which decoding function I need to use. It seems that Mozilla-based browsers
use ISO 8859, whereas IE can use either, depending on a user-setting, and I
haven't looked at any other browsers yet.

As far as I know, the browser does NOT send anything in the headers that
tells you what url-encoding it is using, so I guess I need some way of
looking at the raw url and working out which encoding it's using.

Can anyone help me with writing a function to do this? The ideal would be a
GetEncoding(string testString) function, but I'd settle for a function
IsUtf8Encoded(string testString), on the grounds that if it *isn't* utf8, it
must be ISO 8859.

TIA,

JON

Nov 16 '05 #1
Jon Maz wrote:
> Hi,
>
> I am working on a dotnet url rewriting mechanism that has to be able
> to deal with urls containing non-standard characters, eg
> http://www.mysite.com/Télécharger.

Doing this without direct control over your clients' configuration is a
daunting task, as you've just found out ;-)

> The problem is that some browsers will encode this url using utf8 &
> some using ISO 8859 (I think those are the only two possibilities).

Depends on your audience. Don't expect Chinese users to send ISO-8859-x.

> For ISO 8859 I can use the built-in UrlDecode function, for utf8 I am
> using a function I found on Google groups:
>
> public static string Utf8ToString(string inputString)
> {
>     byte[] utf8Bytes = new byte[inputString.Length];
>     for (int i=0; i < utf8Bytes.Length; i++)
>     {
>         utf8Bytes[i] = (byte)inputString[i];
>     }
>     return Encoding.UTF8.GetString(utf8Bytes);
> }

Um... why? System.Web.HttpUtility has tons of methods for this,
including
    public static string UrlDecode(string, Encoding);
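
To make the suggestion above concrete, here is a minimal sketch of calling the two-argument HttpUtility.UrlDecode overload once for each candidate encoding. The class and method names (UrlDecodeSketch, DecodeAsUtf8, DecodeAsLatin1) are made up for illustration, and "ISO 8859" is assumed here to mean ISO-8859-1; only the HttpUtility and Encoding calls come from the framework.

```csharp
using System.Text;
using System.Web;

// Hypothetical helper class, for illustration only.
public static class UrlDecodeSketch
{
    // Percent-decode the path and interpret the resulting bytes as UTF-8.
    public static string DecodeAsUtf8(string rawPath)
    {
        return HttpUtility.UrlDecode(rawPath, Encoding.UTF8);
    }

    // Percent-decode the path and interpret the resulting bytes as ISO-8859-1
    // (assumption: this is the "ISO 8859" variant the browser used).
    public static string DecodeAsLatin1(string rawPath)
    {
        return HttpUtility.UrlDecode(rawPath, Encoding.GetEncoding("ISO-8859-1"));
    }
}
```

With these helpers, DecodeAsUtf8("/T%C3%A9l%C3%A9charger") and DecodeAsLatin1("/T%E9l%E9charger") should both give "/Télécharger". The hand-written Utf8ToString loop is then unnecessary, but the caller still has to know which encoding applies.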
> The problem is deciding which encoding the browser has used, and
> therefore which decoding function I need to use. It seems that
> Mozilla-based browsers use ISO 8859, whereas IE can use either,
> depending on a user-setting, and I haven't looked at any other
> browsers yet.

You can't solve this. It's like trying to open an arbitrary file and
guess the correct character encoding.

> As far as I know, the browser does NOT send anything in the headers
> that tells you what url-encoding it is using, so I guess I need some
> way of looking at the raw url and working out which encoding it's
> using.

You're right, it's not defined what encoding to use. Sender and receiver
need to agree on this.

> Can anyone help me with writing a function to do this? The ideal
> would be a GetEncoding(string testString) function, but I'd settle
> for a function IsUtf8Encoded(string testString), on the grounds that
> if it *isn't* utf8, it must be ISO 8859.

I'd rather drop the requirement of transparently supporting non-ASCII
URL paths.

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 16 '05 #2
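
For readers who still want something like the IsUtf8Encoded function requested in the first post, the following is a rough heuristic sketch, not a real solution. It percent-decodes the URL to raw bytes and checks whether those bytes form valid UTF-8 using a strict decoder. The class name UrlEncodingGuess and the Decode helper are hypothetical, and the fallback to ISO-8859-1 is an assumption carried over from the question. As the reply points out, the encoding cannot be determined reliably: some ISO-8859-1 byte sequences are also valid UTF-8, so this can guess wrong.

```csharp
using System;
using System.Text;
using System.Web;

// Hypothetical helper class: a heuristic, not a reliable detector.
public static class UrlEncodingGuess
{
    // Strict UTF-8: throws on invalid byte sequences instead of
    // silently substituting a replacement character.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Returns true when the percent-decoded bytes are valid UTF-8.
    // A pure ASCII URL also returns true, which is harmless because
    // ASCII decodes identically under both encodings.
    public static bool IsUtf8Encoded(string rawUrl)
    {
        if (rawUrl == null) throw new ArgumentNullException("rawUrl");

        // The raw URL is expected to be ASCII with %XX escapes.
        byte[] bytes = HttpUtility.UrlDecodeToBytes(rawUrl, Encoding.ASCII);
        try
        {
            StrictUtf8.GetString(bytes);
            return true;
        }
        catch (ArgumentException)
        {
            // Invalid UTF-8 (DecoderFallbackException derives from ArgumentException).
            return false;
        }
    }

    // Decode with UTF-8 when the bytes validate, otherwise fall back to
    // ISO-8859-1 (assumption: those are the only two candidates).
    public static string Decode(string rawUrl)
    {
        Encoding encoding = IsUtf8Encoded(rawUrl)
            ? Encoding.UTF8
            : Encoding.GetEncoding("ISO-8859-1");
        return HttpUtility.UrlDecode(rawUrl, encoding);
    }
}
```

The check works reasonably often because bytes above 0x7F in ISO-8859-1 text rarely happen to line up as well-formed UTF-8 multi-byte sequences. It remains a guess, and it does nothing for audiences whose browsers send other encodings.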
