Progress Bar with urllib2

I'm trying to write a Python script to download data (well, files) from an HTTP server (well, a PHP script spitting them out, at least).
The file data is just the returned data from the request (the server script echoes the file and then dies).

I call the page using urllib2, like so:

satelliteRequest = urllib2.Request(satelliteServer + "?command=download&filepath="+filepath)
satelliteRequestData = {"username":satelliteUsername, "password":satellitePassword}
satelliteRequest.add_data(urllib.urlencode(satelliteRequestData))
satelliteOpener = urllib2.build_opener()
satelliteOpener.addheaders = [('User-agent', userAgent)]

Now, if I want to download the file all at once, I just do

satelliteData = satelliteOpener.open(satelliteRequest).read()

But some of these files are going to be really, really big, and I want to get a progress bar going.
I've tried doing a while loop like this:

chunkSize = 10240
while 1:
    dataBuffer = satelliteOpener.open(satelliteRequest).read(chunkSize)
    data += dataBuffer
    if not dataBuffer:
        break

But that just gives me the first 10240 bytes again and again. Is there something I'm missing here?
It might even be I'm calling urllib2 the wrong way (does it download when you read() or when you create the Request?)

All help is appreciated, I'm sort of stuck here.

Andrew Godwin
Jul 19 '05 #1
> But some of these files are going to be really, really big, and I want
to get a progress bar going. I've tried doing a while loop like this:


Here is a little snippet that I use occasionally:

------------------ geturl.py ---------------------------
import os
import sys
import urllib

def _reporthook(numblocks, blocksize, filesize, url=None):
    #print "reporthook(%s, %s, %s)" % (numblocks, blocksize, filesize)
    base = os.path.basename(url)
    #XXX Should handle possible filesize=-1.
    try:
        percent = min((numblocks*blocksize*100)/filesize, 100)
    except:
        percent = 100
    if numblocks != 0:
        sys.stdout.write("\b"*70)
    sys.stdout.write("%-66s%3d%%" % (base, percent))

def geturl(url, dst):
    print "get url '%s' to '%s'" % (url, dst)
    if sys.stdout.isatty():
        urllib.urlretrieve(url, dst,
                           lambda nb, bs, fs, url=url: _reporthook(nb,bs,fs,url))
        sys.stdout.write('\n')
    else:
        urllib.urlretrieve(url, dst)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        url = sys.argv[1]
        base = url[url.rindex('/')+1:]
        geturl(url, base)
    elif len(sys.argv) == 3:
        url, base = sys.argv[1:]
        geturl(url, base)
    else:
        print "Usage: geturl.py URL [DEST]"
        sys.exit(1)
--------------- end of geturl.py ---------------------------
Save that as geturl.py and try running:

python geturl.py http://example.com/downloads/bigfile.zip
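
The XXX comment in the snippet notes that a filesize of -1 (size unknown) is not handled. One possible way to cover that case is sketched below. The helper names percent_done and format_progress are hypothetical and not part of the original geturl.py; the 70-column layout is copied from the snippet above.

```python
def percent_done(numblocks, blocksize, filesize):
    """Return a percentage from 0 to 100, or None when the size is unknown."""
    if filesize <= 0:
        # urlretrieve passes -1 when the server reports no file size
        return None
    return min(numblocks * blocksize * 100 // filesize, 100)


def format_progress(name, numblocks, blocksize, filesize):
    """Build one 70-column status line, like the one _reporthook writes."""
    percent = percent_done(numblocks, blocksize, filesize)
    if percent is None:
        return "%-66s ???" % name
    return "%-66s%3d%%" % (name, percent)


print(format_progress("bigfile.zip", 5, 1024, 10240))
print(format_progress("bigfile.zip", 3, 8192, -1))
```

A report hook could write the string returned by format_progress in place of the inline percent calculation.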
Cheers,
Trent

--
Trent Mick
Tr****@ActiveState.com
Jul 19 '05 #2
On Tue, 26 Apr 2005 20:28:43 GMT, Andrew Godwin wrote:
But some of these files are going to be really, really big, and I want to get a progress bar going.
I've tried doing a while loop like this:

chunkSize = 10240
while 1:
    dataBuffer = satelliteOpener.open(satelliteRequest).read(chunkSize)
    data += dataBuffer
    if not dataBuffer:
        break



Each time through the loop you re-open the url and thus start from the
beginning. You need to separate the opening from the reading.
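
A minimal sketch of that separation, added for illustration and not taken from the original post: open once, then call read() repeatedly on the same response object. The function name read_with_progress and its parameters are hypothetical, and an in-memory io.BytesIO stands in for the network response so the example runs offline.

```python
import io
import sys


def read_with_progress(response, chunk_size=10240, total_size=None, out=sys.stdout):
    """Read an already-opened response in chunks, reporting progress.

    `response` is any object with a read(n) method, for example the object
    returned by a single opener.open(request) call.
    """
    chunks = []
    bytes_read = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            # An empty read means the stream is exhausted.
            break
        chunks.append(chunk)
        bytes_read += len(chunk)
        if total_size:
            out.write("\r%3d%%" % min(bytes_read * 100 // total_size, 100))
        else:
            # Without a known total, only the byte count can be shown.
            out.write("\r%d bytes" % bytes_read)
        out.flush()
    out.write("\n")
    return b"".join(chunks)


# Stand-in for the network response (assumption: 25000 bytes of dummy data).
fake_response = io.BytesIO(b"x" * 25000)
data = read_with_progress(fake_response, total_size=25000)
```

With the names from the question, the response would come from one call to satelliteOpener.open(satelliteRequest) made before the loop. A total size is only available if the server sends a Content-Length header, which a PHP script that just echoes the file may not do.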

HTH,
John
Jul 19 '05 #3
