urllib (and urllib2) read all data from page on open()?

When you make an HTTP request using urllib, the entire page is downloaded
immediately, whether you want it to be or not. This seems slightly broken to me.

Is there any way to turn this behaviour off and have the object's read method
actually read data from the socket when you ask it to?

Jul 18 '05 #1

Certainly under urllib2, handle.read(100) will read (up to) the next 100 bytes
from the handle, which is the same behaviour as the read method
for files.
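
A minimal sketch of that incremental reading, written for Python 3, where urllib2.urlopen became urllib.request.urlopen. The helper name iter_chunks and the 100-byte default are illustrative assumptions, not part of urllib, and an in-memory io.BytesIO stands in for the network handle so the example runs offline:

```python
import io


def iter_chunks(handle, chunk_size=100):
    """Yield successive blocks of at most chunk_size bytes from a file-like object.

    iter_chunks is a hypothetical helper for illustration; it relies only on
    handle.read(n), which both files and urlopen() responses provide.
    """
    while True:
        block = handle.read(chunk_size)
        if not block:  # an empty result signals end of stream
            break
        yield block


# Stand-in for the object returned by urlopen(); any object with read() works.
fake_response = io.BytesIO(b"x" * 250)
sizes = [len(block) for block in iter_chunks(fake_response)]
print(sizes)  # [100, 100, 50]
```

With a real connection, the same generator could be fed the object returned by urllib2.urlopen(url) (urllib.request.urlopen(url) in Python 3). A read may return fewer bytes than requested, so the loop tests for an empty result rather than for a short block.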

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml

Jul 18 '05 #2

Alex Stapleton wrote:
> Except wouldn't it have already read the entire file when it opened, or does that occur on the first read()?

Don't know, sorry. Try looking at the source code - it should be
reasonably obvious.

> Also, will the data returned from handle.read(100) be raw HTTP? In which case, what if the encoding is chunked or gzipped?


No - you get HTML, with the HTTP stuff already handled (at least to
the best of my knowledge).
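
On the chunked/gzipped part, as far as I understand it the two cases differ. Chunked transfer encoding is undone by the underlying HTTP library before read() returns. A gzip Content-Encoding is not decompressed automatically by urllib2 (or by urllib.request in Python 3). By default the request goes out with "Accept-Encoding: identity", so a gzipped body normally only turns up if you ask for one. A rough sketch of handling that case yourself, where decode_body is a hypothetical helper rather than a library function:

```python
import gzip


def decode_body(raw_body, content_encoding):
    """Return the body with a gzip Content-Encoding undone, if one was applied.

    decode_body is an illustrative helper, not part of urllib.
    raw_body is the bytes returned by read(); content_encoding is the value
    of the Content-Encoding response header, or None when it is absent.
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(raw_body)
    return raw_body


# Offline demonstration with locally compressed data in place of a response.
page = b"<html><body>hello</body></html>"
assert decode_body(gzip.compress(page), "gzip") == page
assert decode_body(page, None) == page
```

Against a live response, the header value would come from something like response.headers.get("Content-Encoding"). Note that gzip.decompress needs the complete body, so this sketch does not combine with reading in small blocks; streaming decompression would need zlib.decompressobj instead.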

Regards,
Fuzzy
http://www.voidspace.org.uk/python/index.shtml

Jul 18 '05 #3
