
HttpWebRequest: foreign characters are missing from the response stream

Hi,
I am using an HttpWebRequest object to download Google results.
Some foreign characters are missing from the response stream.
For example, if I search for "signo de pregunta", all the Spanish characters are
missing from the response stream.
The same search in Internet Explorer shows all the characters.

I am sending all the required headers with the HttpWebRequest.

The following code is used to get the Google results:
-------------------------------------------------------------------
' Setup our Web request
objrequest = CType(WebRequest.Create(URL), HttpWebRequest)
objrequest.Accept = "*/*"
objrequest.Headers.Add("Accept-Encoding", "gzip, deflate")
objrequest.Headers.Add("Accept-Language", "en-us")
objrequest.ContentType = "text/html; charset=UTF-8"
objrequest.Timeout = TimeoutSeconds * 1000

' Retrieve data from request
objResponse = CType(objrequest.GetResponse, HttpWebResponse)
'objStreamReceive = objResponse.GetResponseStream

objEncoding = System.Text.Encoding.GetEncoding("utf-8")
objStreamRead = New System.IO.StreamReader(objResponse.GetResponseStream, Text.Encoding.UTF7)

' Set function return value
PageHTML = objStreamRead.ReadToEnd()

-------------------------------------------------------------------
TIA
-Mangesh
Jul 21 '05 #1

To be quite frank, there's a lot that's wrong with your code. But my main
concern is that you're using UTF-7 for decoding the web response. That's
just plain wrong. Use UTF-8 (you're constructing an instance without using
it) or ISO-8859-1.
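
A minimal sketch of reading a stream with an explicit UTF-8 decoder, assuming the body really is UTF-8. The names ReaderDemo and ReadAll are hypothetical, and a MemoryStream stands in for the stream returned by GetResponseStream so that the example runs without a network connection:

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ReaderDemo
    ' Hypothetical helper: read an entire stream as text with the given encoding.
    Function ReadAll(ByVal source As Stream, ByVal enc As Encoding) As String
        Dim reader As New StreamReader(source, enc)
        Try
            Return reader.ReadToEnd()
        Finally
            ' Closing the reader also closes the underlying stream.
            reader.Close()
        End Try
    End Function

    Sub Main()
        ' Stand-in for a response body: "se" + U+00F1 + "al", encoded as UTF-8.
        Dim original As String = "se" & ChrW(&HF1) & "al"
        Dim body As Byte() = Encoding.UTF8.GetBytes(original)

        ' Decoding with the same encoding that produced the bytes
        ' gives back the original string.
        Dim decoded As String = ReadAll(New MemoryStream(body), Encoding.UTF8)
        Console.WriteLine(decoded = original)
    End Sub
End Module
```

The same helper works with Encoding.GetEncoding("iso-8859-1") when the server sends ISO-8859-1 instead.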

Cheers,

--
Joerg Jooss
www.joergjooss.de
ne**@joergjooss.de
Jul 21 '05 #2
Hi,
Yes, you may find some things wrong in the code, but it is the result
of trying different things to get the correct output.
Some of the headers may not be required, but I haven't removed them.
Please see the code below, which works fine now.
I am using the windows-1252 encoding instead of UTF-8, and that's working
fine.

-----------------------------------------------
' Setup our Web request
objrequest = CType(WebRequest.Create(URL), HttpWebRequest)

'headers
objrequest.Accept = "*/*"
objrequest.Headers.Add("Accept-Encoding", "gzip, deflate")
objrequest.Headers.Add("Accept-Language", "en-us")
objrequest.Headers.Add("HTTP_USER_AGENT", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)")
objrequest.ContentType = "text/html; charset=UTF-8"

'timeout
objrequest.Timeout = TimeoutSeconds * 1000

' Retrieve data from request
objResponse = CType(objrequest.GetResponse, HttpWebResponse)

'use windows encoding
objEncoding = System.Text.Encoding.GetEncoding(1252)

objStreamRead = New System.IO.StreamReader(objResponse.GetResponseStream, objEncoding)

' Set function return value
getPageHTML = objStreamRead.ReadToEnd()
-----------------------------------------------
Thanks for your suggestion.
-Mangesh

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Jul 21 '05 #3

Well, you're assuming the response is in UTF-7, which it almost
certainly isn't. You need to find out what the response character set
actually *is*, and use that.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #4
Mangesh Paranjape <ma********@hotmail.com> wrote:
> Yes, you may find some wrong things in the code but this is the result
> of trying different things to get correct result.
> some of the headers may not be required but I haven't removed them.
> please see the below code as it works fine now.
> I am using windows-1252 encoding instead of utf-8 and thats working
> fine.


I think it's unlikely that that's the correct way to do things though -
web servers really shouldn't be using code page 1252. It's more likely
it's sending back ISO-8859-1. You should use
HttpWebResponse.CharacterSet to find out what the server has told you
the response is in.
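
One way to act on the reported character set is sketched below. It assumes the charset name arrives as a plain string (for example the value of HttpWebResponse.CharacterSet) and falls back to a caller-supplied encoding when the name is empty or unknown. CharsetHelper and ResolveEncoding are hypothetical names, not part of the framework:

```vbnet
Imports System
Imports System.Text

Module CharsetHelper
    ' Hypothetical helper: map a charset name to an Encoding.
    ' Returns the fallback when the name is missing, blank, or not
    ' recognised by this runtime.
    Function ResolveEncoding(ByVal charsetName As String, ByVal fallback As Encoding) As Encoding
        If charsetName Is Nothing OrElse charsetName.Trim().Length = 0 Then
            Return fallback
        End If
        Try
            Return Encoding.GetEncoding(charsetName.Trim())
        Catch ex As ArgumentException
            ' Encoding.GetEncoding rejects names it does not know.
            Return fallback
        End Try
    End Function

    Sub Main()
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")

        ' A known name resolves to that encoding.
        Console.WriteLine(ResolveEncoding("utf-8", latin1).WebName)

        ' A missing name resolves to the fallback.
        Console.WriteLine(ResolveEncoding(Nothing, latin1).WebName)
    End Sub
End Module
```

The resolved encoding can then be passed to the StreamReader constructor in place of a hard-coded one.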

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #5
Hi,
I have already tried this.
The HttpWebResponse.CharacterSet property has a null value.
I also tried HttpWebResponse.ContentEncoding, which is also empty. Any
idea?

But the Google HTML response comes with a header which says
"charset=ISO-8859-1".
I think I should change my encoding from windows-1252 to ISO-8859-1.

Thanks for that,
-Mangesh


Jul 21 '05 #6
Mangesh Paranjape <ma********@hotmail.com> wrote:
> I have already tried this.
> HTTPWebresponse.CharacterSet property has null value.
> I also tried HTTPWebresponse.ContentEncoding which is also empty. any
> idea?

That sounds very odd.

> But Google html response comes with a header which says
> "charset=ISO-8859-1".

Hang on - how are you seeing that? Just from a browser, or what? It
should be present in the response from the web client too.

> I think I should change my encoding from windows to ISO-8859-1.

That would certainly be a start, yes.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #7
> Hang on - how are you seeing that? Just from a browser, or what? It
> should be present in the response from the web client too.

I am sorry, I meant that the response HTML stream contains a "<meta>" tag
which has a charset attribute, shown below:

<meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

If you do the same search in the browser, the "<meta>" tag is different,
as shown below:

<meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">

Overall, using ISO-8859-1 is the best bet.
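
A rough sketch of pulling the charset out of such a "<meta>" tag with a regular expression. MetaCharset and FindMetaCharset are hypothetical names, and the pattern is an assumption that covers tags shaped like the two above rather than every form HTML allows:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MetaCharset
    ' Hypothetical helper: return the charset named in the first
    ' <meta ... charset=...> declaration of an HTML fragment,
    ' or Nothing when no such declaration is found.
    Function FindMetaCharset(ByVal html As String) As String
        Dim pattern As String = "<meta[^>]*charset\s*=\s*[""']?([A-Za-z0-9_\-]+)"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim tag As String = "<meta HTTP-EQUIV=""content-type"" CONTENT=""text/html; charset=ISO-8859-1"">"
        Console.WriteLine(FindMetaCharset(tag))
    End Sub
End Module
```

Because the tag itself is plain ASCII, the first part of the page can be decoded with a single-byte encoding such as ISO-8859-1 just to find the declaration, and the whole body can then be decoded again with the charset it names.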

Thanks,
-Mangesh.

Jul 21 '05 #8
Mangesh Paranjape <ma********@hotmail.com> wrote:
>> Hang on - how are you seeing that? Just from a browser, or what? It
>> should be present in the response from the web client too.
>
> I am sorry, I mean the response HTML stream contains "<meta" tag which
> has charset attribute.
> shown below
>
> <meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

Ah, right. Shame it doesn't put it in the headers appropriately :(

> If you do the same search on the browser, the "<meta>" tag is different.
> shown below
>
> <meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
>
> overall, using ISO-8859-1 is the best bet.

Well, arguably making the same kind of request that the browser does,
and using UTF-8, would be better than using ISO-8859-1 as it wouldn't
be as restrictive.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #9
Mangesh Paranjape wrote:
> Hi,
> Yes, you may find some wrong things in the code but this is the result
> of trying different things to get correct result.
> some of the headers may not be required but I haven't removed them.
> please see the below code as it works fine now.
> I am using windows-1252 encoding instead of utf-8 and thats working
> fine.

As Jon pointed out, Windows-1252 isn't really a common encoding for HTML
content. HttpWebResponse.ContentEncoding gives you the Content-Encoding
header -- stuff like gzip, deflate etc. HttpWebResponse.CharacterSet parses
the "charset" from the Content-Type header, but it doesn't work for me
either...

> objrequest.Headers.Add("HTTP_USER_AGENT", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)")


This header is called "User-Agent".
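
A small sketch of setting it, assuming the .NET Framework behaviour in which User-Agent is a restricted header that is assigned through the HttpWebRequest.UserAgent property rather than Headers.Add. UserAgentDemo and BuildRequest are hypothetical names, and the shortened agent string is only an example value:

```vbnet
Imports System
Imports System.Net

Module UserAgentDemo
    ' Hypothetical helper: create a request whose User-Agent header is
    ' set through the dedicated property. No connection is opened here.
    Function BuildRequest(ByVal url As String) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
        Return request
    End Function

    Sub Main()
        Dim request As HttpWebRequest = BuildRequest("http://www.google.com/")
        Console.WriteLine(request.UserAgent)
    End Sub
End Module
```

"HTTP_USER_AGENT" is the name under which CGI-style server code sees the header, so adding it on the client just sends an extra custom header that the server is likely to ignore.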

Cheers,

--
Joerg Jooss
www.joergjooss.de
ne**@joergjooss.de
Jul 21 '05 #10
