Downloading only a part of a HTML page

Hello everyone!

I have the following problem:

I need to download several HTML pages and get meta-tags out of the
code. I decided it would be better to download only the meta-tags
rather than downloading the whole page. But the standard method
(HttpWebRequest) in C# only allows me to download the whole page. Is
there some alternative method?

Thanks in advance.
Jun 27 '08 #1
On Tue, 29 Apr 2008 09:50:21 -0700, Nik0001 <so****@gmail.com> wrote:
> I need to download several HTML pages and get meta-tags out of the
> code. I decided it would be better to download only the meta-tags
> rather than downloading the whole page. But the standard method
> (HttpWebRequest) in C# only allows me to download the whole page. Is
> there some alternative method?

First thing that comes to mind is that I have some vague recollection that
there's a way to ask the web server for only the HEAD section of the
page. My guess is that there's some way to configure the web request so
that only the HEAD is returned.

If that doesn't pan out...

You can always use a TcpClient or Socket instance to communicate with the
server directly. Then you have complete control over the communications.
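
As a rough illustration of the TcpClient route, here is a minimal sketch rather than a definitive implementation. The class name RawHttp, the method names and the maxBytes cap are hypothetical, and it assumes plain HTTP on port 80 (no HTTPS, no redirects). It sends an HTTP/1.0 request so the server should not reply with chunked encoding, then stops reading after a fixed number of bytes.

```csharp
using System;
using System.Net.Sockets;
using System.Text;

static class RawHttp
{
    // Builds a minimal HTTP/1.0 GET request (hypothetical helper).
    public static string BuildGetRequest(string host, string path)
    {
        return "GET " + path + " HTTP/1.0\r\n" +
               "Host: " + host + "\r\n" +
               "Connection: close\r\n\r\n";
    }

    // Connects to port 80, sends the request, and returns at most maxBytes
    // of the raw response: status line, HTTP headers, and the start of the body.
    public static string FetchPrefix(string host, string path, int maxBytes)
    {
        using (var client = new TcpClient(host, 80))
        using (NetworkStream stream = client.GetStream())
        {
            byte[] request = Encoding.ASCII.GetBytes(BuildGetRequest(host, path));
            stream.Write(request, 0, request.Length);

            var buffer = new byte[maxBytes];
            int total = 0;
            while (total < maxBytes)
            {
                int n = stream.Read(buffer, total, maxBytes - total);
                if (n == 0) break; // server closed the connection
                total += n;
            }
            // Leaving the using blocks closes the socket; nothing more is read.
            // Assumes UTF-8 text; a character cut off at the cap may be garbled.
            return Encoding.UTF8.GetString(buffer, 0, total);
        }
    }
}
```

A call such as RawHttp.FetchPrefix("www.example.com", "/", 8192) would return roughly the first 8 KB of the response, which you would then have to split into headers and body yourself.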

For that matter, when you process the HttpWebRequest, you can get a stream
and retrieve the data from the web server via the stream. Using that
mechanism, you have similar control over the process. I haven't looked at
the HttpWebRequest class, but I suspect that you can either close the
stream, or otherwise cancel the operation in the middle of it, without
receiving all of the data for the page.
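
A minimal sketch of that stream-based idea, under some assumptions: the class name HeadReader, the method names and the 64K character cap are hypothetical, and the response is assumed to decode correctly with StreamReader's default UTF-8 handling. It reads the response in small chunks and stops as soon as the closing </head> tag has been seen, then aborts the request instead of reading the rest.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class HeadReader
{
    // Reads from 'reader' until "</head>" has been seen, the input ends,
    // or maxChars characters have been consumed, whichever comes first.
    public static string ReadThroughHead(TextReader reader, int maxChars = 65536)
    {
        const string marker = "</head>";
        var sb = new StringBuilder();
        var buffer = new char[1024];
        while (sb.Length < maxChars)
        {
            int n = reader.Read(buffer, 0, Math.Min(buffer.Length, maxChars - sb.Length));
            if (n <= 0) break; // end of the response
            // Re-check the tail of the old text in case the marker spans two chunks.
            int searchFrom = Math.Max(0, sb.Length - marker.Length);
            sb.Append(buffer, 0, n);
            int idx = sb.ToString().IndexOf(marker, searchFrom, StringComparison.OrdinalIgnoreCase);
            if (idx >= 0) return sb.ToString(0, idx + marker.Length);
        }
        return sb.ToString(); // no </head> found within the cap
    }

    // Hypothetical wrapper: fetches 'url' and returns the text up to </head>.
    public static string FetchHead(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        using (var resp = req.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
        {
            string head = ReadThroughHead(reader);
            req.Abort(); // cancel the rest of the transfer rather than draining it
            return head;
        }
    }
}
```

How much this saves on the wire depends on how much the server and the local network stack have already sent and buffered by the time the read stops.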

But, that said...the web page itself should be reasonably small. It's not
like you're downloading all of the images and other resources referenced
within the page. It's true that stopping the transfer early would save you
some data, but if I'm mistaken about being able to ask only for the HEAD of
the page, you may not save much in actual transfer: the rest of the page
could wind up buffered by your local network drivers before you stop
reading. By that point, stopping early wouldn't save very much.

Pete
Jun 27 '08 #2
Yes; to get just the HTTP headers, you can use the "HEAD" verb - if
the server supports it ;-p

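    // Requires: using System; using System.Net;
    // HEAD asks for the HTTP response headers only; the server sends no body,
    // so the HTML <head> element and its meta tags are not included.
    var req = HttpWebRequest.Create("http://www.google.com/");
    req.Method = "HEAD";
    using (var resp = req.GetResponse())
    {
        foreach (string key in resp.Headers.Keys)
        {
            Console.WriteLine("{0}={1}", key, resp.Headers[key]);
        }
    }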

If you want the meta tags from the body, then just get the body and
parse it. The good news is that if you want the body, you don't need
to mess with HttpWebRequest etc (which frankly I find confusing):
WebClient is simpler:

    // Requires: using System.Net;
    using (WebClient client = new WebClient())
    {
        string body = client.DownloadString("http://www.google.com/");
    }

Of course, now the problem becomes parsing the HTML (which may or may
not be XHTML)...
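
For simple pages, one rough way to pull out the meta tags without a full parser is a pair of regular expressions. This is only a sketch under stated assumptions: the class MetaTags and its Extract method are hypothetical names, and the patterns assume each meta tag contains no ">" inside an attribute value. Regex-based HTML handling is fragile, so treat it as a starting point rather than a robust parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MetaTags
{
    // Matches a whole <meta ...> tag (assumes no '>' inside attribute values).
    static readonly Regex MetaTag = new Regex(@"<meta\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches name="value", name='value' or name=value inside a tag.
    static readonly Regex Attribute = new Regex(
        @"([-\w:]+)\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))");

    // Returns one dictionary of attribute name -> value per <meta> tag found.
    public static List<Dictionary<string, string>> Extract(string html)
    {
        var result = new List<Dictionary<string, string>>();
        foreach (Match tag in MetaTag.Matches(html))
        {
            // Attribute names are compared case-insensitively (NAME == name).
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match a in Attribute.Matches(tag.Value))
            {
                string value = a.Groups[2].Success ? a.Groups[2].Value
                             : a.Groups[3].Success ? a.Groups[3].Value
                             : a.Groups[4].Value;
                attrs[a.Groups[1].Value] = value;
            }
            result.Add(attrs);
        }
        return result;
    }
}
```

Calling MetaTags.Extract(body) on the string returned by DownloadString gives a list you can loop over, looking up keys such as "name" and "content" in each entry.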

Marc
Jun 27 '08 #3
Meta tags normally appear inside the <head> element, but even if you were
successful in "chunk" downloading, you'd have to keep reading until you
reached the </head> closing tag, and since there could be a lot of script
and CSS inside the head element, it most likely would not save much. One of
the best ways to do this sort of thing is to use Simon Mourier's Html Agility
Pack library (on Codeplex.com I believe), as it produces an HtmlDocument
that works with XPath just like an XmlDocument.
-- Peter
To be a success, arm yourself with the tools you need and learn how to use
them.

Site: http://www.eggheadcafe.com
http://petesbloggerama.blogspot.com
http://ittyurl.net
Jun 27 '08 #4
Thanks everyone! I'll be trying your methods.
Jun 27 '08 #5
