  #1  
September 4th, 2008, 10:35 PM
Rex
Is it possible to download only the <head> of a web page?

I am writing a script that executes a bunch of queries through a form
on a website and reads the results. I am only interested in the
<title> section in the <head> of each web page. Currently, each page
the server returns is about 100kb and contains a bunch of HTML and
Javascript, all of which I don't need; I don't want to waste bandwidth
or consume too much of the server's resources. I just need the <title>
string.

Is there any way to download less than the entire web page?
  #2  
September 4th, 2008, 10:55 PM
Fredrik Lundh
Re: Is it possible to download only the <head> of a web page?

Rex wrote:
> I am writing a script that executes a bunch of queries through a form
> on a website and reads the results. I am only interested in the
> <title> section in the <head> of each web page. Currently, each page
> the server returns is about 100kb and contains a bunch of HTML and
> Javascript, all of which I don't need; I don't want to waste bandwidth
> or consume too much of the server's resources. I just need the <title>
> string.

you need to issue a GET request to get the HTML head section, which
almost always means that the server will build the entire page before
sending it to you (so it can set content-length etc).

you can save on network traffic by parsing the data as it arrives, and
stopping when you've gotten the TITLE element:

http://effbot.org/librarybook/sgmllib.htm

</F>
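
A minimal sketch of the "parse as it arrives, then stop" idea from the post above. The linked page covers sgmllib, which no longer exists in Python 3, so this swaps in the standard library's html.parser instead. The names TitleParser and read_title, the chunk size, and the UTF-8 default are assumptions made for illustration, not anything from the thread.

```python
import codecs
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it sees."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None  # set once the closing </title> has been parsed

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()


def read_title(stream, chunk_size=1024, encoding="utf-8"):
    """Read a binary file-like object chunk by chunk until the title is known.

    Returns (title, bytes_read); title is None if no <title> was found.
    The encoding is an assumption; a real page may declare something else.
    """
    parser = TitleParser()
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    bytes_read = 0
    while parser.title is None:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of document without a complete <title>
        bytes_read += len(chunk)
        parser.feed(decoder.decode(chunk))
    return parser.title, bytes_read
```

The object returned by urllib.request.urlopen(url) has a read() method, so it can be passed in as stream; closing it once read_title returns stops further reading on the client side. As noted in the post, this only saves network traffic: the server has most likely built the whole page already, and some extra data may be in flight when the connection is closed.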

  #3  
September 5th, 2008, 05:25 AM
Gabriel Genellina
Re: Is it possible to download only the <head> of a web page?

On Thu, 04 Sep 2008 18:53:33 -0300, Fredrik Lundh <fredrik@pythonware.com>
wrote:
> Rex wrote:
>> I am writing a script that executes a bunch of queries through a form
>> on a website and reads the results. I am only interested in the
>> <title> section in the <head> of each web page. Currently, each page
>> the server returns is about 100kb and contains a bunch of HTML and
>> Javascript, all of which I don't need; I don't want to waste bandwidth
>> or consume too much of the server's resources. I just need the <title>
>> string.
>
> you need to issue a GET request to get the HTML head section, which
> almost always means that the server will build the entire page before
> sending it to you (so it can set content-length etc).
>
> you can save on network traffic by parsing the data as it arrives, and
> stopping when you've gotten the TITLE element:
>
> http://effbot.org/librarybook/sgmllib.htm

Another alternative would be to estimate how many bytes it takes to reach
the <title> tag, and issue a GET with a Range header. The server will (very
likely) still have to build the entire page, but won't attempt to send more
bytes than requested. (If the requested size is not enough, one can
issue another GET asking for more data.)

http://www.w3.org/Protocols/rfc2616/....html#sec14.35

--
Gabriel Genellina
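
A rough sketch of the Range-header approach described above, using only urllib from the standard library. The helper names (build_range_request, fetch_range, title_via_ranges), the 2048-byte step, the 64 KB cap, and the UTF-8 decoding are all assumptions for illustration. Whether a given server honours Range on dynamically generated pages has to be checked per site; a 200 status instead of 206 means it was ignored.

```python
import re
import urllib.error
import urllib.request

# Matches the first <title>...</title> in raw bytes, ignoring case and newlines.
TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def build_range_request(url, first, last):
    """Build a GET request asking only for bytes first..last (inclusive)."""
    return urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})


def fetch_range(url, first, last):
    """Fetch one byte range and return (status, body).

    206 means the server honoured the Range header; 200 means it ignored it
    and is sending the page from the start. At most the requested number of
    bytes is read either way.
    """
    try:
        with urllib.request.urlopen(build_range_request(url, first, last)) as resp:
            return resp.status, resp.read(last - first + 1)
    except urllib.error.HTTPError as err:
        # For example 416 when the range starts past the end of the page.
        return err.code, b""


def title_via_ranges(url, fetch=fetch_range, step=2048, limit=65536):
    """Request successive ranges until a complete <title> is seen.

    Returns the title text, or None if it was not found within `limit` bytes.
    """
    data = b""
    while len(data) < limit:
        status, body = fetch(url, len(data), len(data) + step - 1)
        if status == 206:
            data += body
        else:
            data = body  # Range ignored: body is the start of the full page
        match = TITLE_RE.search(data)
        if match:
            return match.group(1).decode("utf-8", "replace").strip()
        if status != 206 or len(body) < step:
            break  # nothing more can be requested this way
    return None
```

The fetch parameter exists only so the loop can be exercised without a network connection; in normal use title_via_ranges(url) is enough. Each extra range is a separate request, so a step that is too small can cost more than it saves.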

 
