Downloading only a part of an HTML page

Nik0001

Hello everyone!

I have the following problem:

I need to download several HTML pages and get meta-tags out of the
code. I decided it would be better to download only the meta-tags
rather than downloading the whole page. But the standard method
(HttpWebRequest) in C# only allows me to download the whole page. Is
there some alternative method?

Thanks in advance.

Jun 27 '08 #1

Peter Duniho

On Tue, 29 Apr 2008 09:50:21 -0700, Nik0001 <so****@gmail.com> wrote:

> I need to download several HTML pages and get meta-tags out of the
> code. I decided it would be better to download only the meta-tags
> rather than downloading the whole page. But the standard method
> (HttpWebRequest) in C# only allows me to download the whole page. Is
> there some alternative method?

First thing that comes to mind is that I have some vague recollection that
there's a way to ask the web server for only the HEAD section of the
page. My guess is that there's some way to configure the web request so
that only the HEAD is returned.

If that doesn't pan out...

You can always use a TcpClient or Socket instance to communicate with the
server directly. Then you have complete control over the communications.

For that matter, when you process the HttpWebRequest, you can get a stream
and retrieve the data from the web server via the stream. Using that
mechanism, you have similar control over the process. I haven't looked at
the HttpWebRequest class, but I suspect that you can either close the
stream, or otherwise cancel the operation in the middle of it, without
receiving all of the data for the page.
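
A minimal sketch of that stream-based idea, under some assumptions: the class and method names (HeadOnlyDownload, ReadThroughHead, DownloadHead) are made up for illustration, the page is assumed to decode as UTF-8, and the 64 KB cap is arbitrary. It reads the response in small chunks and stops as soon as the closing </head> tag has been seen. Calling Abort() on the request before disposing the response is intended to keep the framework from draining the rest of the body; whether it actually saves bandwidth depends on how much has already arrived.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class HeadOnlyDownload
{
    // Reads from the reader until "</head>" has been seen, the input ends,
    // or at least maxChars characters have been read (a soft cap).
    public static string ReadThroughHead(TextReader reader, int maxChars)
    {
        const string closingTag = "</head>";
        var sb = new StringBuilder();
        var buffer = new char[1024];
        int read;
        while (sb.Length < maxChars &&
               (read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            sb.Append(buffer, 0, read);
            // Re-scanning the whole buffer is simple rather than fast;
            // acceptable here because the cap keeps it small.
            int end = sb.ToString().IndexOf(
                closingTag, StringComparison.OrdinalIgnoreCase);
            if (end >= 0)
            {
                return sb.ToString(0, end + closingTag.Length);
            }
        }
        return sb.ToString();
    }

    // Hypothetical helper: fetches url and returns the text up to </head>.
    public static string DownloadHead(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        using (var resp = req.GetResponse())
        using (var stream = resp.GetResponseStream())
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string head = ReadThroughHead(reader, 64 * 1024);
            // Assumption: aborting avoids reading the remaining body
            // when the response is closed.
            req.Abort();
            return head;
        }
    }
}
```

Usage would be something like `string head = HeadOnlyDownload.DownloadHead("http://www.google.com/");`, after which only the returned fragment needs to be searched for meta tags.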

But, that said...the web page itself should be reasonably small. It's not
like you're downloading all of the images and other resources referenced
within the page. It's true that stopping the transfer early would save you
some data, but if I'm mistaken about being able to ask for only the HEAD of
the page, it's possible that you won't save much in actual transfer: by the
time you stop reading, the rest of the page could already be buffered by
your local network drivers.

Pete

Jun 27 '08 #2

Marc Gravell

Yes; to get just the http-headers, you can use the "HEAD" verb - if
the server supports it ;-p

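    // Requires: using System; using System.Net;
    var req = HttpWebRequest.Create("http://www.google.com/");
    req.Method = "HEAD"; // ask for the response headers only, no body
    using (var resp = req.GetResponse()) {
        // Print every HTTP response header as name=value
        foreach (string key in resp.Headers.Keys) {
            Console.WriteLine("{0}={1}", key, resp.Headers[key]);
        }
    }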

If you want the meta tags from the body, then just get the body and
parse it. The good news is that if you want the body, you don't need
to mess with HttpWebRequest etc (which frankly I find confusing):
WebClient is simpler:

    using (WebClient client = new WebClient())
    {
        string body = client.DownloadString("http://www.google.com/");
    }

Of course, now the problem becomes parsing the HTML (which may or may
not be XHTML)...
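
As a rough illustration of that parsing step, here is a sketch that pulls the attributes of each meta tag out of the downloaded string with regular expressions. The names (MetaTagSketch, Extract) are made up for this example, and regular expressions are a fragile way to read HTML: an attribute value containing ">" or otherwise unusual markup will trip it up, so treat this as a quick approximation rather than a real parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MetaTagSketch
{
    // Matches a whole <meta ...> tag, in any letter case.
    static readonly Regex MetaTag =
        new Regex(@"<meta\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches name="value", name='value' or name=value inside a tag.
    static readonly Regex Attr = new Regex(
        @"([\w:-]+)\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))");

    // Returns one dictionary of attribute name -> value per meta tag.
    public static List<Dictionary<string, string>> Extract(string html)
    {
        var result = new List<Dictionary<string, string>>();
        foreach (Match tag in MetaTag.Matches(html))
        {
            // Attribute names are looked up case-insensitively.
            var attrs = new Dictionary<string, string>(
                StringComparer.OrdinalIgnoreCase);
            foreach (Match a in Attr.Matches(tag.Value))
            {
                string value = a.Groups[2].Success ? a.Groups[2].Value
                             : a.Groups[3].Success ? a.Groups[3].Value
                             : a.Groups[4].Value;
                attrs[a.Groups[1].Value] = value;
            }
            result.Add(attrs);
        }
        return result;
    }
}
```

With the body string from the WebClient snippet, `MetaTagSketch.Extract(body)` gives one entry per meta tag, and something like `entry["content"]` reads a single attribute when it is present.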

Marc

Jun 27 '08 #3

Peter Bromberg [C# MVP]

Meta tags normally appear inside the <head> element, but even if you were
successful in "chunk" downloading, you'd have to stop after you read the
</head> closing tag, and since there could be a lot of script and CSS inside
the <head> element, it most likely would not be much use. One of the best ways
to do this sort of thing is to use Simon Mourier's Html Agility Pack library
(on Codeplex.com I believe),
as it produces an HtmlDocument that works with XPath just like an XmlDocument.
-- Peter
To be a success, arm yourself with the tools you need and learn how to use
them.

Site: http://www.eggheadcafe.com
http://petesbloggerama.blogspot.com
http://ittyurl.net
Jun 27 '08 #4

Nik0001

Thanks everyone! I'll be trying your methods.

Jun 27 '08 #5
