Is this HttpWebRequest correct?

I am currently using the HttpWebRequest and HttpWebResponse to pull
webpages down from a few urls.

string url = "some url";
HttpWebRequest httpWebRequest =
    (HttpWebRequest)WebRequest.Create(url);

using (HttpWebResponse httpWebResponse =
    (HttpWebResponse)httpWebRequest.GetResponse())
{
    string html = string.Empty;

    StreamReader responseReader = new
        StreamReader(httpWebResponse.GetResponseStream(), Encoding.UTF7);
    html = responseReader.ReadToEnd();
}

My code works but my question is, am I doing it the right way
(especially the encoding part)? Some of the websites I pull content
from have characters in them that do not exist in the English
alphabet and currently the only way for these to be read correctly by
my StreamReader is if I am using UTF7 encoding. Is this really the
only way?

Before I move forward in the project I would like to understand if
this indeed is the way to do this or if I am missing anything?

Any help is appreciated.

Thanks
Oct 3 '08 #1
Nightcrawler wrote:
> I am currently using the HttpWebRequest and HttpWebResponse to pull
> webpages down from a few urls.
>
> string url = "some url";
> HttpWebRequest httpWebRequest =
>     (HttpWebRequest)WebRequest.Create(url);
>
> using (HttpWebResponse httpWebResponse =
>     (HttpWebResponse)httpWebRequest.GetResponse())
> {
>     string html = string.Empty;
>
>     StreamReader responseReader = new
>         StreamReader(httpWebResponse.GetResponseStream(), Encoding.UTF7);
>     html = responseReader.ReadToEnd();
> }
>
> My code works but my question is, am I doing it the right way
> (especially the encoding part)? Some of the websites I pull content
> from have characters in them that do not exist in the English
> alphabet and currently the only way for these to be read correctly by
> my StreamReader is if I am using UTF7 encoding. Is this really the
> only way?

You should check the HTTP response header Content-Type for a charset
parameter and use that to create the stream reader. So for instance if
the server sends a header
Content-Type: text/html; charset=Windows-1252
then you would use
new StreamReader(httpWebResponse.GetResponseStream(),
    Encoding.GetEncoding("Windows-1252"))
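
As a rough sketch of the header approach described above (this is not from the original reply): the helper below pulls the charset parameter out of a Content-Type value and turns it into an Encoding. The class and method names are made up for illustration, and the fallback encoding is whatever the caller decides to assume when the header has no usable charset.

```csharp
using System;
using System.Text;

// Hypothetical helper: maps a Content-Type header value to an Encoding.
public static class CharsetFromHeader
{
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType)) return fallback;

        // Parameters follow the media type, separated by ';'.
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (!p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase)) continue;

            // Drop optional quotes around the charset name.
            string name = p.Substring("charset=".Length).Trim().Trim('"', '\'');
            if (name.Length == 0) return fallback;

            try
            {
                return Encoding.GetEncoding(name);
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: use the caller's default.
                return fallback;
            }
        }
        return fallback; // no charset parameter present
    }
}
```

With the code from the question this could be used as new StreamReader(httpWebResponse.GetResponseStream(), CharsetFromHeader.FromContentType(httpWebResponse.ContentType, Encoding.UTF8)). Note that on newer .NET runtimes (unlike the .NET Framework of this thread) code-page encodings such as Windows-1252 are only available after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called; without it the sketch silently returns the fallback.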

On the other hand on the wild wild web the server often does not send a
charset parameter and the author of the HTML document only includes the
charset in a meta element e.g.
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
Therefore user agents like browsers put in a lot of effort to try to
read enough of the document to find and parse that meta element to then
be able to decode the rest of the document.
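
A very simplified sketch of that meta-sniffing idea, assuming the whole response body has already been read into a byte array: scan the first few kilobytes for a charset declaration inside a meta tag, then decode the full buffer with it. The names, the 4096-byte limit and the regular expression are illustrative assumptions; real browsers follow a far more elaborate algorithm.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: looks for charset=... inside a <meta> tag.
public static class MetaCharset
{
    // Matches both <meta http-equiv=... content="...; charset=X"> and <meta charset="X">.
    private static readonly Regex Pattern = new Regex(
        "<meta[^>]*?charset\\s*=\\s*[\"']?\\s*([A-Za-z0-9_.:-]+)",
        RegexOptions.IgnoreCase);

    public static Encoding Detect(byte[] body, Encoding fallback)
    {
        // ISO-8859-1 maps every byte to exactly one character, so this
        // pre-scan cannot fail and leaves the ASCII markup readable.
        int count = Math.Min(body.Length, 4096); // arbitrary scan limit
        string head = Encoding.GetEncoding("ISO-8859-1").GetString(body, 0, count);

        Match m = Pattern.Match(head);
        if (!m.Success) return fallback;

        try
        {
            return Encoding.GetEncoding(m.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            return fallback; // declared charset is not known to the runtime
        }
    }

    public static string Decode(byte[] body, Encoding fallback)
    {
        return Detect(body, fallback).GetString(body);
    }
}
```

This only works for ASCII-compatible encodings (a UTF-16 page would need a byte-order-mark check first), and it trusts the meta tag, which may itself be wrong.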
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Oct 3 '08 #2
So what you basically are saying is that my best bet is to look for
the meta tags in the page to determine the encoding to use and don't
rely on the HTTP response header.

Most of the sites I read using the StreamReader say: <meta
http-equiv="Content-type" content="text/html; charset=UTF-8" /> but there
are a few that do not have that meta tag included in their code. How
should I approach those? Is there a way for the StreamReader to detect
what encoding the page is using?

Thanks for your help!
Oct 3 '08 #3
What is even more annoying is that one of the websites I read
states it's using UTF-8 and my StreamReader still does not translate
the characters correctly. I get little square boxes instead of the
characters.
Oct 3 '08 #4
On Fri, 03 Oct 2008 10:28:21 -0700, Nightcrawler
<th************@gmail.com> wrote:
> What is even more annoying is that one of the websites I read
> states it's using UTF-8 and my StreamReader still does not translate
> the characters correctly. I get little square boxes instead of the
> characters.

"little square boxes" might, but does not necessarily, mean that the
characters are being decoded incorrectly. It may simply be that the
characters are not displaying with whatever font you're using to show them.

How are you determining that the StreamReader doesn't correctly decode the
characters? How are you specifying, if at all, that the encoding used by
the StreamReader is UTF-8?
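
One way to answer the first question without involving any font is to look at the code points in the decoded string rather than at how they are drawn. The sketch below is only an illustration (the class and method names are invented): it lists the UTF-16 code units of a string and checks for U+FFFD, the replacement character the default .NET decoders substitute for bytes they cannot decode.

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper: inspects a decoded string independently of any font.
public static class CodePointDump
{
    // Lists each UTF-16 code unit of the string as U+XXXX, separated by spaces.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    // True when the string contains U+FFFD, i.e. the decoder hit bytes
    // that were invalid for the chosen encoding and replaced them.
    public static bool HasReplacementChar(string text)
    {
        return text.IndexOf('\uFFFD') >= 0;
    }
}
```

If the dump shows the expected code points (for example U+00E9 for "é"), the decoding is fine and the boxes are a display problem; if it shows U+FFFD or unrelated values, the wrong encoding was used.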

Pete
Oct 3 '08 #5
If I view the very same page in my browser it shows up correctly.

The meta tag states it's using UTF-8 but when I use:

StreamReader responseReader = new
    StreamReader(httpWebResponse.GetResponseStream(), Encoding.UTF8);

The characters are still unreadable. However, if I use UTF7 instead
the characters show up correctly BUT, when I try to convert the page
to XML I get an error saying "hexadecimal value 0xD85E, is an invalid
character". I am very confused with all this. Seems a little like the
wild wild west.
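
A note on that XML error: 0xD85E lies in the UTF-16 high-surrogate range (0xD800 to 0xDBFF), and a surrogate that is not followed by its low half is not a valid XML character. One plausible reading, which the thread does not confirm, is that decoding non-UTF-7 bytes as UTF-7 produced such a stray code unit. The sketch below (names are made up) finds the first unpaired surrogate in a string so the offending spot can be inspected.

```csharp
using System;

// Hypothetical helper: locates unpaired UTF-16 surrogates in a decoded string.
public static class SurrogateCheck
{
    // Returns the index of the first unpaired surrogate, or -1 if there is none.
    public static int FirstLoneSurrogate(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c))
            {
                if (i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
                {
                    i++; // valid pair: skip the low half
                    continue;
                }
                return i; // high surrogate with no low half after it
            }
            if (char.IsLowSurrogate(c))
            {
                return i; // low surrogate with no high half before it
            }
        }
        return -1;
    }
}
```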

Any further help is highly appreciated.

Thanks
Oct 3 '08 #6
I guess another interesting point is that when I change the code to
use "ISO-8859-1" instead of UTF-8 like the website claims it uses, it
seems that it actually is reading the characters correctly AND the
string translates into XML without any issues. Why? I have no idea and
I wish I understood it better. Again, any insight into this problem is
appreciated.

Thanks
Oct 3 '08 #7
On Fri, 03 Oct 2008 10:43:19 -0700, Nightcrawler
<th************@gmail.com> wrote:
> If I view the very same page in my browser it shows up correctly.

Unless your own code is using the same fonts to display the text that the
browser uses, that's not a relevant test.

As for the other behaviors you've noticed, it does sound to me as though
it's possible that the page is not encoded in UTF-8, but rather
ISO-8859-1. But it's hard to know for sure, since we don't have the
actual data to look at.
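
With the raw bytes in hand, one way to test the "not really UTF-8" theory is to decode strictly: a UTF8Encoding constructed with throwOnInvalidBytes set to true raises an exception on byte sequences that are not valid UTF-8 instead of silently substituting U+FFFD. The sketch below is an illustration, not a general detector; the helper name is invented, and falling back to ISO-8859-1 is simply the guess suggested by this thread.

```csharp
using System;
using System.Text;

// Hypothetical helper: strict UTF-8 decode with an ISO-8859-1 fallback.
public static class StrictUtf8
{
    public static string DecodeUtf8OrLatin1(byte[] body, out bool wasValidUtf8)
    {
        // Second argument (throwOnInvalidBytes: true) makes invalid input throw.
        var strict = new UTF8Encoding(false, true);
        try
        {
            string text = strict.GetString(body);
            wasValidUtf8 = true;
            return text;
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8. ISO-8859-1 accepts any byte sequence,
            // so this branch always produces a string.
            wasValidUtf8 = false;
            return Encoding.GetEncoding("ISO-8859-1").GetString(body);
        }
    }
}
```

For example, the single byte 0xE9 is "é" in ISO-8859-1 but an incomplete sequence in UTF-8, where "é" is the two bytes 0xC3 0xA9; a page containing bare 0xE9 bytes would fail the strict decode.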

Pete
Oct 3 '08 #8
Pete,

You can see the page if you go to the link below. It's iTunes
linkmaker page:

http://ax.phobos.apple.com.edgesuite...ss&media=music

As you can see they claim they use UTF-8 but when you read it using a
StreamReader with that encoding, it does not read "foreign"
characters correctly. However, when I tried the ISO-8859-1 encoding
it seemed to work.

Thanks
Oct 3 '08 #9
On Fri, 03 Oct 2008 13:47:43 -0700, Nightcrawler
<th************@gmail.com> wrote:
> Pete,
>
> You can see the page if you go to the link below. It's iTunes
> linkmaker page:
>
> http://ax.phobos.apple.com.edgesuite...ss&media=music
>
> As you can see they claim they use UTF-8 but when you read it using a
> StreamReader with that encoding, it does not read "foreign"
> characters correctly. However, when I tried the ISO-8859-1 encoding
> it seemed to work.

What data in the page are you having trouble with? Can you be more
specific about what's not being shown correctly?

I haven't spent a huge amount of time with the file. But a cursory look
at it shows that it appears, at least to me, to have ISO-8859-1 data
embedded within the page itself, in certain URLs.

It seems possible to me that the page encoding is technically UTF-8, but
using only the subset of UTF-8 that is the same as ISO-8859-1 (that is,
the 7-bit ASCII range), and that the page also has data that's not
supposed to be interpreted as text within the HTML, but rather should be
decoded as ISO-8859-1.

That would explain why the page claims to be encoded as UTF-8 but there
are still characters that don't display correctly unless you read the HTML
as ISO-8859-1.

Or maybe the meta tag really is wrong. I'm not completely sure. :)

Pete
Oct 3 '08 #10
