
Downloading Large Files -- Feedback?

mwt
This code works fine to download files from the web and write them to
the local drive:

import urllib
f = urllib.urlopen("http://www.python.org/blah/blah.zip")
g = f.read()
file = open("blah.zip", "wb")
file.write(g)
file.close()

The process is pretty opaque, however. This downloads and writes the
file with no feedback whatsoever. You don't see how many bytes you've
downloaded already, etc. Especially the "g = f.read()" step just sits
there while downloading a large file, presenting a pregnant, blinking
cursor.

So my question is, what is a good way to go about coding this kind of
basic feedback? Also, since my testing has only *worked* with this
code, I'm curious if it will throw a visible error if something goes
wrong with the download.

Thanks for any pointers. I'm busily Googling away.

Feb 12 '06 #1
"mwt" <mi*********@gmail.com> writes:
f = urllib.urlopen("http://www.python.org/blah/blah.zip")
g = f.read()
# ...
So my question is, what is a good way to go about coding this kind of
basic feedback? Also, since my testing has only *worked* with this
code, I'm curious if it will throw a visible error if something goes
wrong with the download.


One obvious type of failure is running out of memory if the file is
too large. Python can be fairly hosed (VM thrashing etc.) by the time
that happens. Normally you shouldn't read a potentially big file of
unknown size all in one gulp like that. You'd instead say something
like

while True:
    block = f.read(4096)   # read a 4k block from the file
    if len(block) == 0:
        break              # end of file
    # do something with the block

Your "do something with..." could involve updating a status display
or something, saying how much has been read so far.
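Putting that together, a minimal sketch (the URL and filename are just
placeholders):

import urllib

f = urllib.urlopen("http://www.python.org/blah/blah.zip")
out = open("blah.zip", "wb")
bytes_read = 0
while True:
    block = f.read(4096)
    if not block:
        break                        # end of file
    out.write(block)                 # write each block as it arrives
    bytes_read += len(block)
    print "%d bytes read so far" % bytes_read
out.close()
f.close()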
Feb 12 '06 #2
mwt
Pardon my ignorance here, but could you give me an example of what
would constitute a file that is unreasonably or dangerously large? I'm
running Python on an Ubuntu box with about a gig of RAM.

Also, do you know of any online examples of the kind of robust,
real-world code you're describing?

Thanks.

Feb 12 '06 #3
mwt <mi*********@gmail.com> wrote:
...
The process is pretty opaque, however. This downloads and writes the
file with no feedback whatsoever. You don't see how many bytes you've
downloaded already, etc. Especially the "g = f.read()" step just sits
there while downloading a large file, presenting a pregnant, blinking
cursor.

So my question is, what is a good way to go about coding this kind of
basic feedback? Also, since my testing has only *worked* with this


You may use urlretrieve instead of urlopen: urlretrieve accepts an
optional argument named reporthook, and calls it once in a while ("zero
or more times"...;-) with three arguments block_count (number of blocks
downloaded so far), block_size (size of each block in bytes), file_size
(total size of the file in bytes if known, otherwise -1). The
reporthook function (or other callable) may display a progress bar or
whatever you like best.
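
For instance, a bare-bones reporthook might look like this (the names
and URL are just illustrative):

import urllib

def hook(block_count, block_size, total_size):
    # total_size is -1 when the server doesn't report a length
    print "%d bytes downloaded so far" % (block_count * block_size)

urllib.urlretrieve("http://www.python.org/blah/blah.zip",
                   "blah.zip", reporthook=hook)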

urlretrieve saves what's downloading to a disk file (you may specify a
filename, or let it pick an appropriate temporary filename) and returns
two things, the filename where it's downloaded the data and a
mimetools.Message instance whose headers have metadata (such as content
type information).

If that doesn't fit your needs well, you may study the sources of
urllib.py in your Python's library source directory, to see exactly what
it's doing and code your own modified version.
Alex
Feb 12 '06 #4
mwt wrote:
Pardon my ignorance here, but could you give me an example of what
would constitute file that is unreasonably or dangerously large? I'm
running python on a ubuntu box with about a gig of ram.

1GB of RAM plus (say) 2GB of virtual memory = 3GB in total.

Your OS and other running processes might be using
(say) 1GB. So 2GB might be the absolute limit.

Of course your mileage will vary, and in practice your
machine will probably start slowing down long before
that limit.

Also, do you know of any online examples of the kind of robust,
real-world code you're describing?


It isn't written in Python, but get your hands on wget. It
is probably already on your Linux distro, but if not,
check it out here:

http://www.gnu.org/software/wget/wget.html

--
Steven.

Feb 13 '06 #5
mwt
Thanks for the explanation. That is exactly what I'm looking for. In a
way, it's kind of neat that urlopen just *does* it, no questions asked,
but I'd like to just know the basics, which is what it sounds like
urlretrieve covers. Excellent. Now, let's see what I can whip up with
that.

-- just bought "cookbook" and "nutshell" moments ago btw....

Feb 13 '06 #6
mwt
It isn't written in Python, but get your hands on wget. It
is probably already on your Linux distro, but if not,
check it out here: http://www.gnu.org/software/wget/wget.html


Thanks. I'm checking it out.

Feb 13 '06 #7
mwt <mi*********@gmail.com> wrote:
Thanks for the explanation. That is exactly what I'm looking for. In a
way, it's kind of neat that urlopen just *does* it, no questions asked,
but I'd like to just know the basics, which is what it sounds like
urlretrieve covers. Excellent. Now, let's see what I can whip up with
that.

Yes, I entirely understand your mindset, because mine is so similar: I
prefer using higher-level "just works" abstractions, BUT also want to
understand what's going on "below"... "just in case"!-)

-- just bought "cookbook" and "nutshell" moments ago btw....


Nice coincidence, and thanks!-)
Alex
Feb 13 '06 #8
mwt
So, I just put this little chunk to the test, which does give you
feedback about what's going on with a file download. Interesting that
with urlretrieve, you don't do all the file opening and closing stuff.

Works fine:

------------------
import urllib

def download_file(filename, URL):
    f = urllib.urlretrieve(URL, filename, reporthook=my_report_hook)

def my_report_hook(block_count, block_size, total_size):
    total_kb = total_size / 1024
    print "%d kb of %d kb downloaded" % (block_count * (block_size / 1024), total_kb)

if __name__ == "__main__":
    download_file("test_zip.zip", "http://blah.com/blah.zip")

Feb 13 '06 #9
mwt <mi*********@gmail.com> wrote:
...
import urllib

def download_file(filename, URL):
    f = urllib.urlretrieve(URL, filename, reporthook=my_report_hook)


If you wanted to DO anything with the results, you'd probably want to
assign to

    f, m = ...

not just f. This way, f is the filename, m a message object useful for
metadata (e.g., content type).
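
For instance (a small sketch, reusing the my_report_hook from the
previous post; the URL is a placeholder):

f, m = urllib.urlretrieve("http://blah.com/blah.zip", "test_zip.zip",
                          reporthook=my_report_hook)
print f            # local filename the data was saved to
print m.gettype()  # content type from the headers, e.g. 'application/zip'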

Otherwise looks fine.
Alex
Feb 13 '06 #10

mwt wrote:
This code works fine to download files from the web and write them to
the local drive:

import urllib
f = urllib.urlopen("http://www.python.org/blah/blah.zip")
g = f.read()
file = open("blah.zip", "wb")
file.write(g)
file.close()

The process is pretty opaque, however. This downloads and writes the
file with no feedback whatsoever. You don't see how many bytes you've
downloaded already, etc. Especially the "g = f.read()" step just sits
there while downloading a large file, presenting a pregnant, blinking
cursor.

So my question is, what is a good way to go about coding this kind of
basic feedback? Also, since my testing has only *worked* with this
code, I'm curious if it will throw a visible error if something goes
wrong with the download.

By the way, you can achieve what you want with urllib2. You may also
want to check out the pycurl library, which is a Python interface to a
very good C library called curl.

With urllib2 you don't *have* to read the whole thing in one go -

import urllib2

f = urllib2.urlopen("http://www.python.org/blah/blah.zip")
g = ''
while True:
    a = f.read(1024*10)
    if not a:
        break
    print 'Read another 10k'
    g += a

file = open("blah.zip", "wb")
file.write(g)
file.close()
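
And if memory is a concern, a variant sketch of the same loop can write
each block straight to disk instead of accumulating it all in g:

import urllib2

src = urllib2.urlopen("http://www.python.org/blah/blah.zip")
out = open("blah.zip", "wb")
while True:
    a = src.read(1024*10)
    if not a:
        break
    out.write(a)   # each block goes straight to disk
    print 'Read another 10k'
out.close()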

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml


Feb 13 '06 #11
