
Downloading Large Files -- Feedback?

mwt
This code works fine to download files from the web and write them to
the local drive:

import urllib
f = urllib.urlopen("http://www.python.org/blah/blah.zip")
g = f.read()
file = open("blah.zip", "wb")
file.write(g)
file.close()

The process is pretty opaque, however. This downloads and writes the
file with no feedback whatsoever. You don't see how many bytes you've
downloaded already, etc. Especially the "g = f.read()" step just sits
there while downloading a large file, presenting a pregnant, blinking
cursor.

So my question is, what is a good way to go about coding this kind of
basic feedback? Also, since my testing has only *worked* with this
code, I'm curious if it will throw a visible error if something goes
wrong with the download.

Thanks for any pointers. I'm busily Googling away.

Feb 12 '06 #1
"mwt" <mi*********@gmail.com> writes:
f = urllib.urlopen("http://www.python.org/blah/blah.zip")
g = f.read()
# ...
So my question is, what is a good way to go about coding this kind of
basic feedback? Also, since my testing has only *worked* with this
code, I'm curious if it will throw a visible error if something goes
wrong with the download.


One obvious type of failure is running out of memory if the file is
too large. Python can be fairly hosed (VM thrashing etc.) by the time
that happens. Normally you shouldn't read a potentially big file of
unknown size all in one gulp like that. You'd instead say something
like

while True:
    block = f.read(4096)   # read a 4k block from the file
    if len(block) == 0:
        break              # end of file
    # do something with the block

Your "do something with..." could involve updating a status display
or something, saying how much has been read so far.
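Putting that together, a minimal sketch (the URL and filename are just
placeholders):

import urllib

f = urllib.urlopen("http://www.python.org/blah/blah.zip")
out = open("blah.zip", "wb")
bytes_read = 0
while True:
    block = f.read(4096)
    if not block:
        break                        # end of file
    out.write(block)                 # write each block as it arrives
    bytes_read += len(block)
    print "%d bytes read so far" % bytes_read
out.close()
f.close()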
Feb 12 '06 #2
mwt
Pardon my ignorance here, but could you give me an example of what
would constitute a file that is unreasonably or dangerously large? I'm
running Python on an Ubuntu box with about a gig of RAM.

Also, do you know of any online examples of the kind of robust,
real-world code you're describing?

Thanks.

Feb 12 '06 #3
mwt <mi*********@gmail.com> wrote:
...
The process is pretty opaque, however. This downloads and writes the
file with no feedback whatsoever. You don't see how many bytes you've
downloaded already, etc. Especially the "g = f.read()" step just sits
there while downloading a large file, presenting a pregnant, blinking
cursor.

So my question is, what is a good way to go about coding this kind of
basic feedback? Also, since my testing has only *worked* with this


You may use urlretrieve instead of urlopen: urlretrieve accepts an
optional argument named reporthook, and calls it once in a while ("zero
or more times"...;-) with three arguments block_count (number of blocks
downloaded so far), block_size (size of each block in bytes), file_size
(total size of the file in bytes if known, otherwise -1). The
reporthook function (or other callable) may display a progress bar or
whatever you like best.
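
For instance, a bare-bones reporthook might look like this (the names
and URL are just illustrative):

import urllib

def hook(block_count, block_size, total_size):
    # total_size is -1 when the server doesn't report a length
    print "%d bytes downloaded so far" % (block_count * block_size)

urllib.urlretrieve("http://www.python.org/blah/blah.zip",
                   "blah.zip", reporthook=hook)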

urlretrieve saves what's downloading to a disk file (you may specify a
filename, or let it pick an appropriate temporary filename) and returns
two things, the filename where it's downloaded the data and a
mimetools.Message instance whose headers have metadata (such as content
type information).

If that doesn't fit your needs well, you may study the sources of
urllib.py in your Python's library source directory, to see exactly what
it's doing and code your own modified version.
Alex
Feb 12 '06 #4
mwt wrote:
Pardon my ignorance here, but could you give me an example of what
would constitute file that is unreasonably or dangerously large? I'm
running python on a ubuntu box with about a gig of ram.

1GB of RAM plus (say) 2GB of virtual memory = 3GB in total.

Your OS and other running processes might be using
(say) 1GB. So 2GB might be the absolute limit.

Of course your mileage will vary, and in practice your
machine will probably start slowing down long before
that limit.

Also, do you know of any online examples of the kind of robust,
real-world code you're describing?


It isn't written in Python, but get your hands on wget. It
is probably already on your Linux distro, but if not,
check it out here:

http://www.gnu.org/software/wget/wget.html

--
Steven.

Feb 13 '06 #5
mwt
Thanks for the explanation. That is exactly what I'm looking for. In a
way, it's kind of neat that urlopen just *does* it, no questions asked,
but I'd like to just know the basics, which is what it sounds like
urlretrieve covers. Excellent. Now, let's see what I can whip up with
that.

-- just bought "cookbook" and "nutshell" moments ago btw....

Feb 13 '06 #6
mwt
It isn't written in Python, but get your hands on wget. It
is probably already on your Linux distro, but if not,
check it out here: http://www.gnu.org/software/wget/wget.html


Thanks. I'm checking it out.

Feb 13 '06 #7
mwt <mi*********@gmail.com> wrote:
Thanks for the explanation. That is exactly what I'm looking for. In a
way, it's kind of neat that urlopen just *does* it, no questions asked,
but I'd like to just know the basics, which is what it sounds like
urlretrieve covers. Excellent. Now, let's see what I can whip up with
that.

Yes, I entirely understand your mindset, because mine is so similar: I
prefer using higher-level "just works" abstractions, BUT also want to
understand what's going on "below"... "just in case"!-)

-- just bought "cookbook" and "nutshell" moments ago btw....


Nice coincidence, and thanks!-)
Alex
Feb 13 '06 #8
mwt
So, I just put this little chunk to the test, which does give you
feedback about what's going on with a file download. Interesting that
with urlretrieve, you don't do all the file opening and closing stuff.

Works fine:

------------------
import urllib

def download_file(filename, URL):
    f = urllib.urlretrieve(URL, filename, reporthook=my_report_hook)

def my_report_hook(block_count, block_size, total_size):
    total_kb = total_size / 1024
    print "%d kb of %d kb downloaded" % (block_count * (block_size / 1024), total_kb)

if __name__ == "__main__":
    download_file("test_zip.zip", "http://blah.com/blah.zip")

Feb 13 '06 #9
mwt <mi*********@gmail.com> wrote:
...
import urllib

def download_file(filename, URL):
    f = urllib.urlretrieve(URL, filename, reporthook=my_report_hook)


If you wanted to DO anything with the results, you'd probably want to
assign to

    f, m = ...

not just f. This way, f is the filename, m a message object useful for
metadata (e.g., content type).
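
For instance (a small sketch, reusing the my_report_hook from the
previous post; the URL is a placeholder):

f, m = urllib.urlretrieve("http://blah.com/blah.zip", "test_zip.zip",
                          reporthook=my_report_hook)
print f            # local filename the data was saved to
print m.gettype()  # content type from the headers, e.g. 'application/zip'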

Otherwise looks fine.
Alex
Feb 13 '06 #10

mwt wrote:
This code works fine to download files from the web and write them to
the local drive:

import urllib
f = urllib.urlopen("http://www.python.org/blah/blah.zip")
g = f.read()
file = open("blah.zip", "wb")
file.write(g)
file.close()

The process is pretty opaque, however. This downloads and writes the
file with no feedback whatsoever. You don't see how many bytes you've
downloaded already, etc. Especially the "g = f.read()" step just sits
there while downloading a large file, presenting a pregnant, blinking
cursor.

So my question is, what is a good way to go about coding this kind of
basic feedback? Also, since my testing has only *worked* with this
code, I'm curious if it will throw a visible error if something goes
wrong with the download.

By the way, you can achieve what you want with urllib2. You may also
want to check out the pycurl library, which is a Python interface to a
very good C library called curl.

With urllib2 you don't *have* to read the whole thing in one go -

import urllib2

f = urllib2.urlopen("http://www.python.org/blah/blah.zip")
g = ''
while True:
    a = f.read(1024*10)
    if not a:
        break
    print 'Read another 10k'
    g += a

file = open("blah.zip", "wb")
file.write(g)
file.close()
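
And if memory is a concern, a variant sketch of the same loop can write
each block straight to disk instead of accumulating it all in g:

import urllib2

src = urllib2.urlopen("http://www.python.org/blah/blah.zip")
out = open("blah.zip", "wb")
while True:
    a = src.read(1024*10)
    if not a:
        break
    out.write(a)   # each block goes straight to disk
    print 'Read another 10k'
out.close()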

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml


Feb 13 '06 #11
