Is it possible to download only the <head> of a web page?

Rex
I am writing a script that executes a bunch of queries through a form
on a website and reads the results. I am only interested in the
<title> section in the <head> of each web page. Currently, each page
the server returns is about 100 KB and contains a bunch of HTML and
JavaScript, none of which I need; I don't want to waste bandwidth
or consume too much of the server's resources. I just need the <title>
string.

Is there any way to download less than the entire web page?
Sep 4 '08 #1
Rex wrote:
> I am writing a script that executes a bunch of queries through a form
> on a website and reads the results. I am only interested in the
> <title> section in the <head> of each web page. Currently, each page
> the server returns is about 100kb and contains a bunch of HTML and
> Javascript, all of which I don't need; I don't want to waste bandwidth
> or consume too much of the server's resources. I just need the <title>
> string.

you need to issue a GET request to get the HTML head section, which
almost always means that the server will build the entire page before
sending it to you (so it can set content-length etc).

you can save on network traffic by parsing the data as it arrives, and
stopping when you've gotten the TITLE element:

http://effbot.org/librarybook/sgmllib.htm
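
A minimal sketch of the "parse as it arrives, stop at the title" idea, not the code from the linked page. sgmllib no longer exists in Python 3, so this swaps in html.parser from the standard library. The names TitleParser, title_from_chunks and fetch_title are made up for illustration, and the UTF-8 default and the 1024-byte chunk size are assumptions.

```python
import codecs
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_data(self, data):
        # Text may arrive in several pieces, so collect and join later.
        if self.in_title:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    @property
    def title(self):
        return "".join(self.parts).strip()


def title_from_chunks(chunks, encoding="utf-8"):
    """Feed byte chunks to the parser; stop consuming once </title> is seen."""
    parser = TitleParser()
    # Incremental decoding copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        parser.feed(decoder.decode(chunk))
        if parser.done:
            return parser.title
    return None  # no complete <title> element in the data


def fetch_title(url, chunk_size=1024):
    """Read the response in small pieces and close it as soon as possible."""
    with urlopen(url) as response:
        return title_from_chunks(iter(lambda: response.read(chunk_size), b""))
```

Leaving the with block closes the connection, so the rest of the body is not read. Whatever the server has already pushed into the socket buffers is still transferred, so the saving is approximate.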

</F>

Sep 4 '08 #2
On Thu, 04 Sep 2008 18:53:33 -0300, Fredrik Lundh <fr*****@pythonware.com>
wrote:
> Rex wrote:
>> I am writing a script that executes a bunch of queries through a form
>> on a website and reads the results. I am only interested in the
>> <title> section in the <head> of each web page. Currently, each page
>> the server returns is about 100kb and contains a bunch of HTML and
>> Javascript, all of which I don't need; I don't want to waste bandwidth
>> or consume too much of the server's resources. I just need the <title>
>> string.
>
> you need to issue a GET request to get the HTML head section, which
> almost always means that the server will build the entire page before
> sending it to you (so it can set content-length etc).
>
> you can save on network traffic by parsing the data as it arrives, and
> stopping when you've gotten the TITLE element:
>
> http://effbot.org/librarybook/sgmllib.htm
Another alternative would be to estimate how many bytes it takes to reach
the <title> tag, and issue a GET with a Range header. The server will very
likely still have to build the entire page, but it won't send more bytes
than requested, provided it honors Range at all. (If the requested size is
not enough, one can issue another GET asking for more data.)

http://www.w3.org/Protocols/rfc2616/....html#sec14.35
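
A rough sketch of the Range approach using urllib from the standard library. The function names, the 2048-byte step and the 64 KB cap are assumptions picked for illustration, and a regular expression stands in for a real HTML parser. A server is free to ignore Range and answer 200 with the whole page, so the code checks for 206 Partial Content.

```python
import re
from urllib.error import HTTPError
from urllib.request import Request, urlopen

TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def byte_range(start, length):
    """Build a Range header value; byte positions are inclusive."""
    return "bytes=%d-%d" % (start, start + length - 1)


def find_title(data):
    """Return the raw title bytes if a complete <title> element is present."""
    match = TITLE_RE.search(data)
    return match.group(1).strip() if match else None


def fetch_title_with_range(url, step=2048, limit=65536):
    """Request the page a slice at a time until the title shows up."""
    data = b""
    while len(data) < limit:
        request = Request(url, headers={"Range": byte_range(len(data), step)})
        try:
            with urlopen(request) as response:
                if response.status != 206:
                    # Range was ignored: read from the start, up to the cap.
                    return find_title(response.read(limit))
                chunk = response.read()
        except HTTPError as err:
            if err.code == 416:  # asked for bytes past the end of the page
                return None
            raise
        data += chunk
        title = find_title(data)
        if title is not None:
            return title
        if len(chunk) < step:  # short slice: that was the end of the page
            break
    return None
```

Each extra request makes the server build the page again, and a dynamic page may not be byte-for-byte identical between requests, so a first slice that usually contains the title is the better trade.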

--
Gabriel Genellina

Sep 5 '08 #3
