On Tue, 29 Apr 2008 09:50:21 -0700, Nik0001 <so****@gmail.com> wrote:
> I need to download several HTML pages and get meta-tags out of the
> code. I decided it would be better to download only the meta-tags
> rather than downloading the whole page. But the standard method
> (HttpWebRequest) in C# only allows me to download the whole page. Is
> there some alternative method?
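The first thing that comes to mind is a vague recollection that there's a
way to ask the web server for only the HEAD section of the page. My guess
is that there's some way to configure the web request so that only the
HEAD is returned. One caveat: HTTP's HEAD method returns just the response
headers, not the HTML <head> element, so it wouldn't deliver the meta
tags themselves.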
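For reference, here is a minimal sketch of what configuring the request that way might look like. It assumes the standard Method property on HttpWebRequest; the class and method names (HeadRequestSketch, CreateHeadRequest, PrintHeaders) are made up for illustration. A HEAD request yields response headers such as Content-Type and Content-Length, with no markup at all.

```csharp
using System;
using System.Net;

static class HeadRequestSketch
{
    // Builds a request that asks the server for headers only (HTTP HEAD).
    // Nothing is sent until GetResponse() is called.
    public static HttpWebRequest CreateHeadRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        return request;
    }

    // Sends the HEAD request and prints two of the returned headers.
    public static void PrintHeaders(string url)
    {
        var request = CreateHeadRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine("Content-Type: " + response.ContentType);
            Console.WriteLine("Content-Length: " + response.ContentLength);
        }
    }
}
```

This can be handy for checking a page's type or size before deciding whether to fetch it, but the meta tags still require a GET of at least the start of the body.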
If that doesn't pan out...
You can always use a TcpClient or Socket instance to communicate with the
server directly. Then you have complete control over the communications.
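As a rough sketch of the TcpClient route, something like the following might work for plain HTTP on port 80. It makes several assumptions worth stating: no HTTPS, no redirects, UTF-8 text, and an HTTP/1.0 request so the server doesn't use chunked encoding. RawHttpSketch, BuildGetRequest and ReadStart are hypothetical names.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

static class RawHttpSketch
{
    // Composes a minimal HTTP/1.0 GET request by hand.
    public static string BuildGetRequest(string host, string path)
    {
        return "GET " + path + " HTTP/1.0\r\n" +
               "Host: " + host + "\r\n" +
               "Connection: close\r\n\r\n";
    }

    // Returns up to maxChars characters of the raw response
    // (status line, headers, and the start of the body), then
    // closes the connection without reading the rest.
    public static string ReadStart(string host, string path, int maxChars)
    {
        using (var client = new TcpClient(host, 80))
        using (var stream = client.GetStream())
        {
            byte[] request = Encoding.ASCII.GetBytes(BuildGetRequest(host, path));
            stream.Write(request, 0, request.Length);

            var reader = new StreamReader(stream, Encoding.UTF8); // assumed encoding
            var buffer = new char[maxChars];
            int total = 0;
            while (total < maxChars)
            {
                int n = reader.Read(buffer, total, maxChars - total);
                if (n <= 0) break; // server closed the connection
                total += n;
            }
            return new string(buffer, 0, total);
        }
    }
}
```

You take on all of the HTTP details yourself this way, which is why it's more of a fallback than a first choice.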
For that matter, when you process the HttpWebRequest, you can get a stream
and retrieve the data from the web server via the stream. Using that
mechanism, you have similar control over the process. I haven't looked at
the HttpWebRequest class, but I suspect that you can either close the
stream, or otherwise cancel the operation in the middle of it, without
receiving all of the data for the page.
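A minimal sketch of that stream approach, under some assumptions: the page is UTF-8, the closing </head> tag appears literally in the markup, and 64 KB is a sensible cap if it never does. HeadReader, ReadThroughHead and FetchHead are illustrative names, and whether Abort() saves meaningful bandwidth will depend on how much the server has already sent.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class HeadReader
{
    // Reads from 'reader' until "</head>" has been seen or about
    // 'maxChars' characters have been read, whichever comes first.
    public static string ReadThroughHead(TextReader reader, int maxChars)
    {
        var sb = new StringBuilder();
        var buffer = new char[1024];
        while (sb.Length < maxChars)
        {
            int n = reader.Read(buffer, 0, Math.Min(buffer.Length, maxChars - sb.Length));
            if (n <= 0) break; // end of stream

            // Back up 6 characters so a tag split across two chunks is still found.
            int searchFrom = Math.Max(0, sb.Length - 6);
            sb.Append(buffer, 0, n);

            int idx = sb.ToString().IndexOf("</head>", searchFrom,
                                            StringComparison.OrdinalIgnoreCase);
            if (idx >= 0) return sb.ToString(0, idx + 7); // keep through "</head>"
        }
        return sb.ToString();
    }

    // Starts a normal GET, reads only as far as </head>, then aborts.
    public static string FetchHead(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            string head = ReadThroughHead(reader, 64 * 1024);
            request.Abort(); // cancel instead of draining the rest of the body
            return head;
        }
    }
}
```

The returned text can then be scanned for the meta tags with whatever parser you prefer.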
But, that said, the web page itself should be reasonably small. It's not
like you're downloading the images and other resources the page refers
to. Stopping the transfer early would save you some data, but if I'm
mistaken about being able to ask for only the HEAD of the page, you may
not save much in actual transfer. By the time you've read the part you
need, the rest of the page could already be buffered by your local
network stack, so cancelling at that point gains you very little.
Pete