WebClient and Encoding

Is it possible to tell the WebClient to use an "automatic" encoding when
calling DownloadString? The encoding is written in the response header,
so the WebClient should be able to detect it, but I wasn't able to
find such an option. I can only set a fixed Encoding (UTF8, for example) and hope
the site uses it.

--- bye
May 27 '07 #1
Hello MaxMax,

See HttpResponse.Charset and HttpResponse.ContentEncoding

---
WBR, Michael Nemtsev [.NET/C# MVP].
My blog: http://spaces.live.com/laflour
Team blog: http://devkids.blogspot.com/

"The greatest danger for most of us is not that our aim is too high and we
miss it, but that it is too low and we reach it" (c) Michelangelo

May 27 '07 #2
> See HttpResponse.Charset and HttpResponse.ContentEncoding

In the "classical" example of DownloadString from the MSDN:

{
WebClient client = new WebClient ();
string reply = client.DownloadString (address);

Console.WriteLine (reply);
}

I can't use the HttpResponse before I make the query... And if I use it
afterwards it's useless: DownloadString has already decoded the stream into a
string (using a possibly wrong code page).

--- bye
May 27 '07 #3
WebClient.DownloadString uses the encoding specified in the WebClient object when it converts the downloaded data to a string. If you know the encoding in advance, you can set WebClient.Encoding to the proper encoding; otherwise it will use Encoding.Default, which is the code page used by your operating system.
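
As a minimal sketch of the fixed-encoding case (the class and method names are made up for illustration, and UTF-8 is only an example choice), the encoding can be assigned before any download call:

```csharp
using System.Net;
using System.Text;

// Hypothetical helper for illustration; not part of the framework.
static class FixedEncodingClient
{
    // Creates a WebClient whose Encoding property is set up front.
    // Per the description above, this is the encoding used when the
    // downloaded data is converted to a string.
    public static WebClient Create(Encoding encoding)
    {
        WebClient client = new WebClient();
        client.Encoding = encoding;
        return client;
    }
}
```

Usage would look like `using (WebClient client = FixedEncodingClient.Create(Encoding.UTF8)) { string reply = client.DownloadString(address); }`. This only helps when the site really serves that encoding.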

If you don't know the encoding in advance, you should probably take a closer look at the HttpWebRequest/HttpWebResponse classes. The trick is to download the data as a byte[], then use the information provided by the headers to convert it to the proper string format.
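
A minimal sketch of that byte[] approach, under a few assumptions: the class and method names are hypothetical, the charset is read from the response's Content-Type header with System.Net.Mime.ContentType, and UTF-8 is an arbitrary fallback when no usable charset is found.

```csharp
using System;
using System.Net;
using System.Net.Mime;
using System.Text;

// Hypothetical helper for illustration; not part of the framework.
static class CharsetAwareDownload
{
    // Picks an encoding from a Content-Type header value such as
    // "text/html; charset=iso-8859-1". Falls back to UTF-8 (an assumption
    // of this sketch) when the header is missing, malformed, or names an
    // encoding the runtime does not know.
    public static Encoding EncodingFromContentType(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            try
            {
                string charset = new ContentType(contentType).CharSet;
                if (!string.IsNullOrEmpty(charset))
                    return Encoding.GetEncoding(charset.Trim('"', '\''));
            }
            catch (FormatException) { }   // header could not be parsed
            catch (ArgumentException) { } // unknown charset name
        }
        return Encoding.UTF8;
    }

    // Downloads the raw bytes first, then decodes them using the charset
    // announced in the response headers.
    public static string DownloadString(string address)
    {
        using (WebClient client = new WebClient())
        {
            byte[] body = client.DownloadData(address);
            string contentType = client.ResponseHeaders[HttpResponseHeader.ContentType];
            return EncodingFromContentType(contentType).GetString(body);
        }
    }
}
```

A call would be `string page = CharsetAwareDownload.DownloadString(address);`. A charset declared only inside the page itself (in a meta tag, for example) is not handled by this sketch.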

--
Happy coding!
Morten Wennevik [C# MVP]
May 27 '07 #4
WebClient internally uses a WebRequest to do the downloading, and it will
search WebRequest.ContentType for a "charset" value to use as the encoding.
If the ContentType/charset header doesn't exist or contains an invalid
charset, WebClient.Encoding is used (which is Encoding.Default by default,
or you can assign it beforehand). Be aware, however, that
WebClient.Encoding is only a fallback: if the response contains a valid
encoding, that encoding is always used to decode the returned data.

For an HttpWebRequest, the ContentType comes from the HttpWebResponse. You can
use Fiddler (http://www.fiddlertool.com/) to trace the HTTP headers and
see whether WebClient used the correct Encoding to return the string.

Regards,
Walter Wang (wa****@online.microsoft.com, remove 'online.')
Microsoft Online Community Support

==================================================
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
==================================================

This posting is provided "AS IS" with no warranties, and confers no rights.
May 27 '07 #5
I'm pretty sure it isn't so. If I set Encoding to (for example) UTF32, the
WebClient throws an exception. And if I have a page with a UTF-8 character
(a page that the WebRequest DOES correctly show as a UTF-8 page) and I don't
set the Encoding, I receive a wrong string.

--- bye
May 28 '07 #6
Try this code. It attempts to get the character set in various ways and falls back to UTF-8. ContentEncoding is not checked, because that header describes compression (gzip, for example) rather than a character set. The code is a bit of cut and paste and you may have to tweak it to get it running.

using System;
using System.IO;
using System.Net;
using System.Text;

public class PageDownloader
{
    // Downloads a page and decodes it with the charset named in the response
    // headers or in the page itself; falls back to UTF-8.
    public string DownloadPage(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            byte[] buffer;
            using (Stream s = resp.GetResponseStream())
            {
                buffer = ReadStream(s);
            }

            string pageEncoding = "";
            Encoding e = Encoding.UTF8;

            // resp.ContentEncoding is deliberately skipped: it holds the
            // Content-Encoding header (e.g. gzip), not a charset name.
            if (!string.IsNullOrEmpty(resp.CharacterSet))
                pageEncoding = resp.CharacterSet;
            else if (!string.IsNullOrEmpty(resp.ContentType))
                pageEncoding = GetCharacterSet(resp.ContentType);

            // Nothing in the headers: look for a charset declaration in the page
            if (pageEncoding == "")
                pageEncoding = GetCharacterSet(buffer);

            if (pageEncoding != "")
            {
                try
                {
                    e = Encoding.GetEncoding(pageEncoding);
                }
                catch (ArgumentException)
                {
                    // Unknown charset name: keep the UTF-8 fallback
                    Console.Error.WriteLine("Invalid encoding: " + pageEncoding);
                }
            }

            return e.GetString(buffer);
        }
    }

    // Returns the value following the last "charset=" in s, or "" if there is none
    private string GetCharacterSet(string s)
    {
        s = s.ToUpperInvariant();
        int start = s.LastIndexOf("CHARSET");
        if (start == -1)
            return "";

        start = s.IndexOf("=", start);
        if (start == -1)
            return "";

        start++;
        // Skip whitespace and an opening quote so quoted values work too
        s = s.Substring(start).Trim().TrimStart('"', '\'');
        int end = s.Length;

        // The value ends at the first ; " ' or /
        int i = s.IndexOfAny(new char[] { ';', '"', '\'', '/' });
        if (i != -1)
            end = i;

        return s.Substring(0, end).Trim();
    }

    // Decodes the raw bytes with the default code page, which is enough
    // to find an ASCII charset declaration inside the page
    private string GetCharacterSet(byte[] data)
    {
        string s = Encoding.Default.GetString(data);
        return GetCharacterSet(s);
    }

    // Reads the whole stream into a byte array
    private byte[] ReadStream(Stream s)
    {
        byte[] buffer = new byte[8096];
        using (MemoryStream ms = new MemoryStream())
        {
            int read;
            while ((read = s.Read(buffer, 0, buffer.Length)) > 0)
                ms.Write(buffer, 0, read);
            return ms.ToArray();
        }
    }
}

--
Happy coding!
Morten Wennevik [C# MVP]
May 28 '07 #7
Hi MaxMax,

I've done some tests and it seems my previous comment wasn't correct. Sorry
about that.

Please use the code Morten posted to detect the encoding and read the text
correctly.

I will raise this question on our internal discussion list to see if
this is a known issue.

Regards,
Walter Wang (wa****@online.microsoft.com, remove 'online.')
Microsoft Online Community Support

May 29 '07 #8
We have confirmed this is an issue in WebClient. I've filed an internal bug
for it.

Thanks for the feedback!

Regards,
Walter Wang (wa****@online.microsoft.com, remove 'online.')
Microsoft Online Community Support

May 30 '07 #9
