Bytes | Software Development & Data Engineering Community

Downloading only part of an HTML page

Hello everyone!

I have the following problem:

I need to download several HTML pages and extract the meta tags from
their code. I decided it would be better to download only the meta tags
rather than the whole page. But the standard method in C#
(HttpWebRequest) only allows me to download the whole page. Is
there some alternative method?

Thanks in advance.
Jun 27 '08 #1
On Tue, 29 Apr 2008 09:50:21 -0700, Nik0001 <so****@gmail.com> wrote:
> I need to download several HTML pages and get meta-tags out of the
> code. I decided it would be better to download only the meta-tags
> rather than downloading the whole page. But the standard method
> (HttpWebRequest) in C# only allows me to download the whole page. Is
> there some alternative method?
First thing that comes to mind is that I have some vague recollection that
there's a way to ask the web server for only the HEAD section of the
page. My guess is that there's some way to configure the web request so
that only the HEAD is returned.

If that doesn't pan out...

You can always use a TcpClient or Socket instance to communicate with the
server directly. Then you have complete control over the communications.
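A minimal sketch of the TcpClient route, assuming a plain HTTP (not HTTPS) server that accepts an HTTP/1.0 request and a page whose markup is ASCII-compatible. The names RawHttpHead and FetchUntilHeadEnd are hypothetical. It sends a bare GET and closes the socket as soon as the closing </head> tag has arrived. The returned text still starts with the HTTP status line and response headers, and redirects, TLS and retries are not handled.

```csharp
using System;
using System.Net.Sockets;
using System.Text;

// Hypothetical helper: speaks HTTP over a raw TcpClient so the caller decides
// exactly when to stop reading. Plain HTTP only; no TLS, redirects or retries.
public static class RawHttpHead
{
    public static string FetchUntilHeadEnd(string host, int port, string path)
    {
        using (var client = new TcpClient(host, port))
        using (NetworkStream stream = client.GetStream())
        {
            // HTTP/1.0 with Connection: close keeps the exchange simple
            // (no chunked transfer encoding, no persistent connection).
            string request = "GET " + path + " HTTP/1.0\r\n" +
                             "Host: " + host + "\r\n" +
                             "Connection: close\r\n\r\n";
            byte[] requestBytes = Encoding.ASCII.GetBytes(request);
            stream.Write(requestBytes, 0, requestBytes.Length);

            var received = new StringBuilder();
            var buffer = new byte[1024];
            int n;
            while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Encoding.ASCII turns non-ASCII bytes into '?': good enough to
                // locate tags, not to read non-English meta content.
                received.Append(Encoding.ASCII.GetString(buffer, 0, n));
                string text = received.ToString();
                int end = text.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);
                if (end >= 0)
                {
                    // Stop here; disposing the client closes the socket
                    // without reading the rest of the body.
                    return text.Substring(0, end + "</head>".Length);
                }
            }
            // The stream ended without a </head> tag: return whatever arrived.
            return received.ToString();
        }
    }
}
```

Usage is simply `RawHttpHead.FetchUntilHeadEnd("www.example.com", 80, "/")`; the meta tags can then be picked out of the returned text.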

For that matter, when you process the HttpWebRequest, you can get a stream
and retrieve the data from the web server via the stream. Using that
mechanism, you have similar control over the process. I haven't looked at
the HttpWebRequest class, but I suspect that you can either close the
stream, or otherwise cancel the operation in the middle of it, without
receiving all of the data for the page.
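A minimal sketch of the stream approach, assuming the page is UTF-8 (or plain ASCII) and that the meta tags of interest appear before </head>. The names HeadReader, ReadUntilHeadEnd and DownloadHead, and the 64 KB cap, are illustrative choices, not part of the framework. ReadUntilHeadEnd works on any Stream, so it can be tried offline with a MemoryStream. How much transfer this actually saves depends on how much data the network stack has already buffered, so it is worth measuring rather than assuming.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: reads an HTML response in small chunks and stops
// once the closing </head> tag has been seen.
public static class HeadReader
{
    const string HeadEnd = "</head>";

    public static string ReadUntilHeadEnd(Stream stream, int maxChars = 64 * 1024)
    {
        var sb = new StringBuilder();
        // Assumes UTF-8; a real client would honour the charset in Content-Type.
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            var buffer = new char[1024];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Re-scan a few characters before the new chunk in case the
                // tag was split across two reads.
                int searchFrom = Math.Max(0, sb.Length - HeadEnd.Length);
                sb.Append(buffer, 0, read);
                int end = sb.ToString().IndexOf(HeadEnd, searchFrom, StringComparison.OrdinalIgnoreCase);
                if (end >= 0)
                    return sb.ToString(0, end + HeadEnd.Length);
                if (sb.Length >= maxChars)
                    break; // no </head> within the cap: give up rather than read everything
            }
        }
        return sb.ToString();
    }

    // Live usage: the response and its stream are disposed right after the
    // head has been read, so this code never consumes the rest of the body.
    public static string DownloadHead(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (var stream = resp.GetResponseStream())
        {
            return ReadUntilHeadEnd(stream);
        }
    }
}
```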

But, that said... the web page itself should be reasonably small. It's not
like you're downloading all of the images and other resources referenced
within the page. It's true that stopping the transfer early would save you
some data, but if I was mistaken about being able to ask only for the HEAD
of the page, you may not save that much actual transfer anyway: by the time
you stop, the rest of the page could already be buffered by your local
network drivers, so stopping the transfer early wouldn't save very much.

Pete
Jun 27 '08 #2
Yes; to get just the HTTP headers (not the HTML <head> section), you can
use the "HEAD" verb, if the server supports it ;-p

    using System;
    using System.Net;

    // A HEAD request: the server returns the HTTP response headers only, no body.
    var req = HttpWebRequest.Create("http://www.google.com/");
    req.Method = "HEAD";
    using (var resp = req.GetResponse())
    {
        foreach (string key in resp.Headers.Keys)
        {
            Console.WriteLine("{0}={1}", key, resp.Headers[key]);
        }
    }

If you want the meta tags from the body (the HTML itself), then just get
the body and parse it. The good news is that if you want the body, you
don't need to mess with HttpWebRequest etc. (which frankly I find
confusing); WebClient is simpler:

    using (WebClient client = new WebClient())
    {
        string body = client.DownloadString("http://www.google.com/");
    }

Of course, now the problem becomes parsing the HTML (which may or may not
be XHTML)...
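If you try a quick regular-expression approach instead of a real HTML parser, a minimal sketch might look like the following. It is only an illustration under loose assumptions: attribute values must be quoted, a '>' inside an attribute value would cut the tag short, each useful tag is assumed to carry a name or http-equiv attribute plus content, and commented-out or script-embedded markup is not skipped. The name MetaTagExtractor is hypothetical. For arbitrary real-world HTML, a proper parser (such as the library Peter suggests in the next reply) is more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls name/http-equiv -> content pairs out of <meta> tags
// with regular expressions. Not a real HTML parser.
public static class MetaTagExtractor
{
    // One whole <meta ...> tag, case-insensitive.
    static readonly Regex MetaTag = new Regex(@"<meta\b[^>]*>", RegexOptions.IgnoreCase);

    // attribute="value" or attribute='value'; unquoted values are not matched.
    static readonly Regex QuotedAttribute = new Regex(
        @"([\w-]+)\s*=\s*(?:""([^""]*)""|'([^']*)')", RegexOptions.IgnoreCase);

    public static Dictionary<string, string> Extract(string html)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match tag in MetaTag.Matches(html))
        {
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match a in QuotedAttribute.Matches(tag.Value))
            {
                // Group 2 holds a double-quoted value, group 3 a single-quoted one.
                attrs[a.Groups[1].Value] = a.Groups[2].Success ? a.Groups[2].Value : a.Groups[3].Value;
            }

            string key;
            if (!attrs.TryGetValue("name", out key) && !attrs.TryGetValue("http-equiv", out key))
                continue; // e.g. <meta charset="utf-8"> has no name/content pair

            string content;
            if (attrs.TryGetValue("content", out content))
                result[key] = content; // a later duplicate name overwrites an earlier one
        }
        return result;
    }
}
```

Combined with the WebClient download above, `MetaTagExtractor.Extract(body)` gives a case-insensitive lookup such as `tags["description"]`.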

Marc
Jun 27 '08 #3
Meta tags normally appear inside the <head> element, but even if you were
successful in "chunk" downloading you'd have to stop after you read the
</head> closing tag, and since there could be a lot of script and CSS inside
the head element, it most likely would not save you much. One of the best
ways to do this sort of thing is to use Simon Mourier's HtmlAgilityPack
library (on CodePlex.com, I believe), as it produces an HtmlDocument that
works with XPath just like an XmlDocument.
-- Peter
Jun 27 '08 #4
Thanks everyone! I'll be trying your methods.
Jun 27 '08 #5
