Determining which encoding the browser used for a url

Hi,

I am working on a dotnet url rewriting mechanism that has to be able to deal
with urls containing non-standard characters, eg
http://www.mysite.com/Télécharger.

The problem is that some browsers will encode this url using utf8 & some
using ISO 8859 (I *think* those are the only two possibilities). For ISO
8859 I can use the built-in UrlDecode function, for utf8 I am using a
function I found on Google groups:

    public static string Utf8ToString(string inputString)
    {
        byte[] utf8Bytes = new byte[inputString.Length];
        for (int i = 0; i < utf8Bytes.Length; i++)
        {
            utf8Bytes[i] = (byte)inputString[i];
        }
        return Encoding.UTF8.GetString(utf8Bytes);
    }

The problem is deciding *which* encoding the browser has used, and therefore
which decoding function I need to use. It seems that Mozilla-based browsers
use ISO 8859, whereas IE can use either, depending on a user-setting, and I
haven't looked at any other browsers yet.

As far as I know, the browser does NOT send anything in the headers that
tell you what url-encoding it is using, so I guess I need some way of
looking at the raw url and working out which encoding it's using.

Can anyone help me with writing a function to do this? The ideal would be a
GetEncoding(string testString) function, but I'd settle for a function
IsUtf8Encoded(string testString), on the grounds that if it *isn't* utf8, it
must be ISO 8859.

TIA,

JON

Jul 21 '05 #1
Jon Maz wrote:
> Hi,
>
> I am working on a dotnet url rewriting mechanism that has to be able
> to deal with urls containing non-standard characters, eg
> http://www.mysite.com/Télécharger.

Doing this without direct control over your clients' configuration is a
daunting task, as you've just found out ;-)

> The problem is that some browsers will encode this url using utf8 &
> some using ISO 8859 (I think those are the only two possibilities).

Depends on your audience. Don't expect Chinese users to send ISO-8859-x.

> For ISO 8859 I can use the built-in UrlDecode function, for utf8 I am
> using a function I found on Google groups:
>
>     public static string Utf8ToString(string inputString)
>     {
>         byte[] utf8Bytes = new byte[inputString.Length];
>         for (int i = 0; i < utf8Bytes.Length; i++)
>         {
>             utf8Bytes[i] = (byte)inputString[i];
>         }
>         return Encoding.UTF8.GetString(utf8Bytes);
>     }

Um... why? System.Web.HttpUtility has tons of methods for this,
including

    public static string UrlDecode(string, Encoding);
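To make the suggestion concrete, here is a minimal sketch of that overload decoding the same path under both encodings. The class name and the two sample strings are made up for illustration; the percent-escapes are simply what "é" looks like in UTF-8 (C3 A9) and in ISO 8859-1 (E9).

```csharp
using System;
using System.Text;
using System.Web;

public static class UrlDecodeDemo
{
    public static void Main()
    {
        // ISO 8859-1 (Latin-1) is assumed here as the "ISO 8859" variant in question.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        string sentAsUtf8 = "T%C3%A9l%C3%A9charger";  // "é" escaped as two UTF-8 bytes
        string sentAsLatin1 = "T%E9l%E9charger";      // "é" escaped as one Latin-1 byte

        // Decoding each with the matching encoding gives the same text.
        Console.WriteLine(HttpUtility.UrlDecode(sentAsUtf8, Encoding.UTF8));  // Télécharger
        Console.WriteLine(HttpUtility.UrlDecode(sentAsLatin1, latin1));       // Télécharger
    }
}
```

This only replaces the hand-written Utf8ToString helper; it does not tell you which of the two encodings to pass in.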

> The problem is deciding which encoding the browser has used, and
> therefore which decoding function I need to use. It seems that
> Mozilla-based browsers use ISO 8859, whereas IE can use either,
> depending on a user-setting, and I haven't looked at any other
> browsers yet.

You can't solve this. It's like trying to open an arbitrary file and
guess a correct character encoding.

> As far as I know, the browser does NOT send anything in the headers
> that tell you what url-encoding it is using, so I guess I need some
> way of looking at the raw url and working out which encoding it's
> using.

You're right, it's not defined what encoding to use. Sender and receiver
need to agree on this.
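One way to get that agreement, sketched under the assumption that most requests come from links the site generates itself: emit the links already percent-encoded as UTF-8, so the browser has nothing left to choose, and decode with UTF-8 on the server. BuildPath and ReadSegment are hypothetical helper names, not part of any framework.

```csharp
using System;
using System.Text;
using System.Web;

public static class AgreedEncodingDemo
{
    // Hypothetical helper: builds a path whose non-ASCII characters are
    // already percent-encoded as UTF-8 (Uri.EscapeDataString uses UTF-8).
    public static string BuildPath(string segment)
    {
        return "/" + Uri.EscapeDataString(segment);
    }

    // Hypothetical helper: decodes with the same encoding the links were built with.
    public static string ReadSegment(string rawPath)
    {
        return HttpUtility.UrlDecode(rawPath.TrimStart('/'), Encoding.UTF8);
    }

    public static void Main()
    {
        string path = BuildPath("Télécharger");
        Console.WriteLine(path);               // /T%C3%A9l%C3%A9charger
        Console.WriteLine(ReadSegment(path));  // Télécharger
    }
}
```

This does nothing for URLs that users type into the address bar by hand; those are still encoded however the browser sees fit.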

> Can anyone help me with writing a function to do this? The ideal
> would be a GetEncoding(string testString) function, but I'd settle
> for a function IsUtf8Encoded(string testString), on the grounds that
> if it *isn't* utf8, it must be ISO 8859.

I'd rather drop the requirement of transparently supporting non-ASCII
URL paths.
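If dropping the requirement is not an option, a common heuristic (a guess, not a solution, for the reasons given above) is to check whether the unescaped bytes form valid UTF-8 and fall back to ISO 8859-1 otherwise. The sketch below assumes only those two encodings are in play; LooksLikeUtf8 and DecodeGuessing are hypothetical names.

```csharp
using System;
using System.Text;
using System.Web;

public static class UrlEncodingGuess
{
    // Strict decoder: throws on invalid byte sequences instead of
    // silently substituting replacement characters.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns true when the percent-decoded bytes are well-formed UTF-8.
    // Pure ASCII input also returns true, which is harmless because ASCII
    // decodes identically under both encodings.
    public static bool LooksLikeUtf8(string rawUrl)
    {
        byte[] bytes = HttpUtility.UrlDecodeToBytes(rawUrl);
        try
        {
            StrictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Decodes with UTF-8 when the bytes validate, otherwise with ISO 8859-1.
    public static string DecodeGuessing(string rawUrl)
    {
        Encoding chosen = LooksLikeUtf8(rawUrl)
            ? Encoding.UTF8
            : Encoding.GetEncoding("iso-8859-1");
        return HttpUtility.UrlDecode(rawUrl, chosen);
    }

    public static void Main()
    {
        Console.WriteLine(DecodeGuessing("T%C3%A9l%C3%A9charger"));  // Télécharger
        Console.WriteLine(DecodeGuessing("T%E9l%E9charger"));        // Télécharger
    }
}
```

The guess can be wrong: an ISO 8859-1 string that happens to contain a byte pair such as C3 A9 ("Ã©") is also valid UTF-8 and will be decoded as "é". It also breaks down as soon as a third encoding shows up.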

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Jul 21 '05 #2
