
Is this an encoding problem?

In .NET I am using an HttpWebRequest to read from a website. I am getting
everything back except for some characters above hex 7F, which appear to have
been stripped out of my response. I do see these characters when I view the
site in IE.

It has been suggested that this is an encoding problem, but I'm unsure
what I need to do about it. Can anybody help?
Nov 16 '05 #1
Hi David,

This does indeed look like an encoding problem. The website probably does
not use UTF-8, which is the encoding a StreamReader uses by default, so the
StreamReader you wrap around the stream returned by GetResponseStream decodes
those characters incorrectly. If there is no encoding information in the
headers, you should be able to find it somewhere in the data itself.
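To illustrate the point above, here is a small self-contained sketch (not from the original post). It assumes the page is ISO-8859-1, where "é" is the single byte 0xE9; the class and method names are made up for the example. Read with the default UTF-8 StreamReader, that byte is not valid UTF-8 and the character does not survive; passing the right Encoding as the second StreamReader argument keeps it.

```csharp
using System;
using System.IO;
using System.Text;

public static class ExplicitEncodingRead
{
    // Reads a whole stream as text using the given encoding instead of the
    // StreamReader default (UTF-8). A byte order mark in the data, if present,
    // still takes precedence because BOM detection is on by default.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In a ReadURL-style method this amounts to `new StreamReader(streamURL, someEncoding)` once the page's real charset is known.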

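Check the charset parameter of the Content-Type header (HttpWebResponse.ContentType;
note that HttpWebResponse.ContentEncoding is the Content-Encoding header, such as
gzip, not the character set), or, if no charset is set there, look for
'charset' in the page source.
Read the response as raw bytes rather than as a string, then decode those bytes
with the correct encoding. Decoding as UTF-8 first and converting the string back
to bytes does not work, because bytes that are not valid UTF-8 are already
replaced during the first decode.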
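A minimal sketch of picking the encoding from the Content-Type header and decoding the raw bytes exactly once. It uses System.Net.Mime.ContentType to parse the header, which was not available in .NET 1.x; `EncodingFromContentType` and `DecodeBody` are hypothetical names, and falling back to UTF-8 is an assumption (the page's META tag could be checked before falling back).

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class ResponseDecoding
{
    // Picks an Encoding from the charset parameter of a Content-Type value
    // such as "text/html; charset=iso-8859-1". Falls back to UTF-8 when the
    // value is missing, malformed, has no charset, or names an unknown one.
    public static Encoding EncodingFromContentType(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return Encoding.UTF8;
        try
        {
            string charset = new ContentType(contentType).CharSet;
            return string.IsNullOrEmpty(charset) ? Encoding.UTF8 : Encoding.GetEncoding(charset);
        }
        catch (FormatException) { return Encoding.UTF8; }   // malformed header value
        catch (ArgumentException) { return Encoding.UTF8; } // unknown charset name
    }

    // Decodes the raw response bytes once, with the chosen encoding.
    public static string DecodeBody(byte[] body, string contentType)
    {
        return EncodingFromContentType(contentType).GetString(body);
    }
}
```

With a response in hand this would be called as `DecodeBody(bytes, resp.ContentType)`, where `bytes` is the whole response body read into memory.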

--
Happy Coding!
Morten Wennevik [C# MVP]
Nov 16 '05 #2
It certainly sounds as though you know what you are talking about; however, I
need more help.

The website displays fine in IE, just not in my C# .NET program.

I checked the WebResponse in a QuickWatch and the stated encoding was "" (an empty string).

Can you tell me more precisely what changes I need to make to my code
to get it to work? I tried setting TransferEncoding, but that just caused
an error in GetResponse.

My current code for getting the URL response is as follows:

private String ReadURL()
{
    HttpWebRequest reqURL =
        (HttpWebRequest)WebRequest.Create(ToString());
    reqURL.Credentials = CredentialCache.DefaultCredentials;

    HttpWebResponse respURL = (HttpWebResponse)reqURL.GetResponse();
    Stream streamURL = respURL.GetResponseStream();

    return (new StreamReader(streamURL)).ReadToEnd();
}
Nov 16 '05 #3
This is a piece of code I used to handle pages whose encoding is not given in
the response headers.

Stream s = resp.GetResponseStream();
byte[] buffer = ReadStream(s); // ReadStream reads the Stream into a byte[]

// time to check the encoding
// (resp.ContentEncoding holds the Content-Encoding header, e.g. "gzip",
// not the character set, so look for charset= in Content-Type instead)
string charset = GetCharSet(resp.ContentType, true);

// decode provisionally as UTF-8; for ASCII-compatible encodings a META
// charset tag is plain ASCII, so it can still be found in this string
string temp = Encoding.UTF8.GetString(buffer, 0, buffer.Length);

// no charset in the header: look for it in the page itself
if(charset == null)
    charset = GetCharSet(temp, false);

// redecode the original bytes (not temp) with the charset that was found
if(charset != null)
{
    try
    {
        temp = Encoding.GetEncoding(charset).GetString(buffer, 0, buffer.Length);
    }
    catch(ArgumentException)
    {
        // unknown charset name: keep the UTF-8 result
    }
}

....

// GetCharSet looks for a charset declaration either in a Content-Type
// header value (header == true), e.g. "text/html; charset=iso-8859-1",
// or in the page source (header == false), e.g. in a META tag, and returns
// the charset name, or null if none is found. The delimiter checks are
// there so that quoted, unquoted and '>'-terminated forms are all detected.

private static string GetCharSet(string s, bool header)
{
    if(s == null)
        return null;

    // case-insensitive, so "charset", "CHARSET" and "Charset" all match
    int i = s.IndexOf("charset", StringComparison.OrdinalIgnoreCase);
    if(i == -1) // charset not found, return
        return null;

    int j = s.IndexOf('=', i + "charset".Length);
    if(j == -1)
        return null;

    // skip whitespace and an opening quote after the '='
    int start = j + 1;
    while(start < s.Length &&
          (char.IsWhiteSpace(s[start]) || s[start] == '"' || s[start] == '\''))
        start++;

    // the charset name ends at the first delimiter: ';', a quote or
    // whitespace in a header, plus '>' and '/' in HTML
    char[] delimiters = header
        ? new char[] { ';', '"', '\'', ' ', '\t' }
        : new char[] { ';', '"', '\'', '>', '/', ' ', '\t', '\r', '\n' };
    int end = s.IndexOfAny(delimiters, start);
    if(end == -1)
    {
        if(!header) // not able to detect the end of the encoding word
            return null;
        end = s.Length; // in a header the charset may simply end the value
    }

    string temp = s.Substring(start, end - start);
    if(temp.Length == 0)
        return null;
    else
        return temp;
}
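The code above calls ReadStream without showing it. A minimal sketch of such a helper, assuming it only needs to drain the response stream into memory (the class name StreamHelpers is hypothetical): network streams can return fewer bytes per Read than requested, so it loops until Read returns 0.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Drains a stream (e.g. the one returned by GetResponseStream) into a byte[].
    // Read may return fewer bytes than asked for; 0 signals the end of the stream.
    public static byte[] ReadStream(Stream stream)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
                buffer.Write(chunk, 0, read);
            return buffer.ToArray();
        }
    }
}
```

Keeping the bytes, rather than a string, is what allows the page to be redecoded once the charset has been found.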
--
Happy Coding!
Morten Wennevik [C# MVP]
Nov 16 '05 #4
