Determining which encoding the browser used for a url

Hi,

I am working on a dotnet url rewriting mechanism that has to be able to deal
with urls containing non-standard characters, eg
http://www.mysite.com/Télécharger.

The problem is that some browsers will encode this url using utf8 & some
using ISO 8859 (I *think* those are the only two possibilities). For ISO
8859 I can use the built-in UrlDecode function, for utf8 I am using a
function I found on Google groups:

public static string Utf8ToString(string inputString)
{
    byte[] utf8Bytes = new byte[inputString.Length];
    for (int i = 0; i < utf8Bytes.Length; i++)
    {
        utf8Bytes[i] = (byte)inputString[i];
    }
    return Encoding.UTF8.GetString(utf8Bytes);
}

The problem is deciding *which* encoding the browser has used, and therefore
which decoding function I need to use. It seems that Mozilla-based browsers
use ISO 8859, whereas IE can use either, depending on a user-setting, and I
haven't looked at any other browsers yet.

As far as I know, the browser does NOT send anything in the headers that
tells you what url-encoding it is using, so I guess I need some way of
looking at the raw url and working out which encoding it's using.

Can anyone help me with writing a function to do this? The ideal would be a
GetEncoding(string testString) function, but I'd settle for a function
IsUtf8Encoded(string testString), on the grounds that if it *isn't* utf8, it
must be ISO 8859.

TIA,

JON

Jul 21 '05 #1
Jon Maz wrote:
> Hi,
>
> I am working on a dotnet url rewriting mechanism that has to be able
> to deal with urls containing non-standard characters, eg
> http://www.mysite.com/Télécharger.

Doing this without direct control over your clients' configuration is a
daunting task, as you've just found out ;-)

> The problem is that some browsers will encode this url using utf8 &
> some using ISO 8859 (I think those are the only two possibilities).

Depends on your audience. Don't expect Chinese users to send ISO-8859-x.

> For ISO 8859 I can use the built-in UrlDecode function, for utf8 I am
> using a function I found on Google groups:
>
> public static string Utf8ToString(string inputString)
> {
>     byte[] utf8Bytes = new byte[inputString.Length];
>     for (int i = 0; i < utf8Bytes.Length; i++)
>     {
>         utf8Bytes[i] = (byte)inputString[i];
>     }
>     return Encoding.UTF8.GetString(utf8Bytes);
> }

Um... why? System.Web.HttpUtility has tons of methods for this,
including

public static string UrlDecode(string, Encoding);
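
As a minimal sketch of how that overload might be used (the class and method names `UrlDecodeDemo` and `Decode` are made up for illustration, not part of the framework), the same escaped path can be decoded under either encoding by passing the matching `Encoding` object:

```csharp
using System;
using System.Text;
using System.Web;

public static class UrlDecodeDemo
{
    // Hypothetical helper: percent-decodes 'escaped' and interprets the
    // resulting bytes using the encoding named by 'encodingName'.
    public static string Decode(string escaped, string encodingName)
    {
        return HttpUtility.UrlDecode(escaped, Encoding.GetEncoding(encodingName));
    }
}
```

With this, `Decode("T%C3%A9l%C3%A9charger", "utf-8")` and `Decode("T%E9l%E9charger", "iso-8859-1")` both give `Télécharger`, whereas decoding the UTF-8 form as ISO-8859-1 gives the familiar `TÃ©lÃ©charger`.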

> The problem is deciding which encoding the browser has used, and
> therefore which decoding function I need to use. It seems that
> Mozilla-based browsers use ISO 8859, whereas IE can use either,
> depending on a user-setting, and I haven't looked at any other
> browsers yet.

You can't solve this. It's like trying to open an arbitrary file and
guess a correct character encoding.
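
A common heuristic, which does not make the problem solvable in general, is to check whether the percent-decoded bytes are valid UTF-8 and fall back to ISO-8859-1 otherwise. The sketch below assumes only those two candidates, as in the question; `UrlEncodingGuess`, `LooksLikeUtf8` and `DecodeGuessing` are hypothetical names. Pure ASCII input is valid under both encodings, and some ISO-8859-1 byte sequences happen to be valid UTF-8 as well, so the guess can still be wrong.

```csharp
using System;
using System.Text;
using System.Web;

public static class UrlEncodingGuess
{
    // Strict UTF-8: throws on invalid byte sequences instead of
    // silently substituting U+FFFD.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Hypothetical helper: true if the percent-decoded bytes are valid UTF-8.
    public static bool LooksLikeUtf8(string escaped)
    {
        byte[] raw = HttpUtility.UrlDecodeToBytes(escaped);
        try
        {
            StrictUtf8.GetString(raw);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Hypothetical helper: decodes as UTF-8 when the bytes allow it,
    // otherwise assumes ISO-8859-1 (an assumption, not a detection).
    public static string DecodeGuessing(string escaped)
    {
        Encoding guess = LooksLikeUtf8(escaped)
            ? Encoding.UTF8
            : Encoding.GetEncoding("iso-8859-1");
        return HttpUtility.UrlDecode(escaped, guess);
    }
}
```

For example, `T%E9l%E9charger` fails the strict check because the byte 0xE9 announces a multi-byte UTF-8 sequence but is followed by a plain ASCII `l`, so the fallback applies.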

> As far as I know, the browser does NOT send anything in the headers
> that tells you what url-encoding it is using, so I guess I need some
> way of looking at the raw url and working out which encoding it's
> using.

You're right, it's not defined what encoding to use. Sender and receiver
need to agree on this.
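
One way to get that agreement, at least for links the site generates itself, is to emit them already percent-encoded so the browser has nothing left to encode. A minimal sketch, assuming UTF-8 is the agreed encoding (`LinkBuilder` and `BuildHref` are hypothetical names; `Uri.EscapeDataString` escapes non-ASCII characters as UTF-8 bytes):

```csharp
using System;

public static class LinkBuilder
{
    // Hypothetical helper: appends one path segment to baseUrl, with the
    // segment percent-encoded as UTF-8.
    public static string BuildHref(string baseUrl, string segment)
    {
        return baseUrl.TrimEnd('/') + "/" + Uri.EscapeDataString(segment);
    }
}
```

`BuildHref("http://www.mysite.com", "Télécharger")` yields `http://www.mysite.com/T%C3%A9l%C3%A9charger`. This does nothing for URLs that users type into the address bar by hand, which remain subject to the browser's own choice.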

> Can anyone help me with writing a function to do this? The ideal
> would be a GetEncoding(string testString) function, but I'd settle
> for a function IsUtf8Encoded(string testString), on the grounds that
> if it *isn't* utf8, it must be ISO 8859.

I'd rather drop the requirement of transparently supporting non-ASCII
URL paths.

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Jul 21 '05 #2
