This code works fine to download files from the web and write them to
the local drive:
import urllib
f = urllib.urlopen("http://www.python.org/blah/blah.zip")
g = f.read()
file = open("blah.zip", "wb")
file.write(g)
file.close()
The process is pretty opaque, however. This downloads and writes the
file with no feedback whatsoever. You don't see how many bytes you've
downloaded already, etc. Especially the "g = f.read()" step just sits
there while downloading a large file, presenting a pregnant, blinking
cursor.
So my question is, what is a good way to go about coding this kind of
basic feedback? Also, since my testing has only *worked* with this
code, I'm curious if it will throw a visible error if something goes
wrong with the download.
Thanks for any pointers. I'm busily Googling away.
"mwt" <mi*********@gmail.com> writes: f = urllib.urlopen("http://www.python.org/blah/blah.zip") g = f.read() # ...
So my question is, what is a good way to go about coding this kind of basic feedback? Also, since my testing has only *worked* with this code, I'm curious if it will throw a visible error if something goes wrong with the download.
One obvious type of failure is running out of memory if the file is
too large. Python can be fairly hosed (VM thrashing etc.) by the time
that happens. Normally you shouldn't read a potentially big file of
unknown size all in one gulp like that. You'd instead say something
like
while True:
    block = f.read(4096)  # read a 4k block from the file
    if len(block) == 0:
        break  # end of file
    # do something with the block
Your "do something with..." could involve updating a status display
or something, saying how much has been read so far.
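Fleshed out, that loop might look like the sketch below. The copy logic is the same whether the source is a urllib response or any other file-like object, so an in-memory buffer stands in for the network here and nothing depends on a real URL:

```python
import io

def copy_with_progress(src, dst, block_size=4096):
    """Copy src to dst in fixed-size blocks, returning total bytes copied."""
    copied = 0
    while True:
        block = src.read(block_size)
        if len(block) == 0:
            break  # end of file
        dst.write(block)
        copied += len(block)
        # a status display update would go here, using `copied`
    return copied

# An in-memory buffer stands in for f = urllib.urlopen(...):
src = io.BytesIO(b"x" * 10000)
dst = io.BytesIO()
print(copy_with_progress(src, dst))  # 10000
```

With a real download you would pass the urlopen response as src and an open("blah.zip", "wb") file as dst; memory use stays flat at one block regardless of file size.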
Pardon my ignorance here, but could you give me an example of what
would constitute a file that is unreasonably or dangerously large? I'm
running Python on an Ubuntu box with about a gig of RAM.
Also, do you know of any online examples of the kind of robust,
real-world code you're describing?
Thanks.
mwt <mi*********@gmail.com> wrote:
... The process is pretty opaque, however. This downloads and writes the file with no feedback whatsoever. You don't see how many bytes you've downloaded already, etc. Especially the "g = f.read()" step just sits there while downloading a large file, presenting a pregnant, blinking cursor.
So my question is, what is a good way to go about coding this kind of basic feedback? Also, since my testing has only *worked* with this
You may use urlretrieve instead of urlopen: urlretrieve accepts an
optional argument named reporthook, and calls it once in a while ("zero
or more times"...;-) with three arguments block_count (number of blocks
downloaded so far), block_size (size of each block in bytes), file_size
(total size of the file in bytes if known, otherwise -1). The
reporthook function (or other callable) may display a progress bar or
whatever you like best.
urlretrieve saves what it's downloading to a disk file (you may specify a
filename, or let it pick an appropriate temporary filename) and returns
two things, the filename where it's downloaded the data and a
mimetools.Message instance whose headers have metadata (such as content
type information).
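In sketch form, a reporthook can be as simple as a function that turns those three arguments into a status line. Only the (block_count, block_size, total_size) signature comes from urlretrieve; the helper names here are made up for illustration:

```python
def format_progress(block_count, block_size, total_size):
    """Turn urlretrieve's reporthook arguments into a status line."""
    downloaded = block_count * block_size
    if total_size > 0:
        downloaded = min(downloaded, total_size)  # final block may overshoot
        return "%d of %d bytes (%d%%)" % (
            downloaded, total_size, 100 * downloaded // total_size)
    return "%d bytes so far (total size unknown)" % downloaded

def report_hook(block_count, block_size, total_size):
    print(format_progress(block_count, block_size, total_size))

# then: urllib.urlretrieve(url, filename, reporthook=report_hook)
```

The min() guard matters because block_count * block_size rounds up to a whole block on the last call, which would otherwise report more than 100%.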
If that doesn't fit your needs well, you may study the sources of
urllib.py in your Python's library source directory, to see exactly what
it's doing and code your own modified version.
Alex
mwt wrote: Pardon my ignorance here, but could you give me an example of what would constitute a file that is unreasonably or dangerously large? I'm running Python on an Ubuntu box with about a gig of RAM.
1GB of RAM plus (say) 2GB of virtual memory = 3GB in total.
Your OS and other running processes might be using
(say) 1GB. So 2GB might be the absolute limit.
Of course your mileage will vary, and in practice your
machine will probably start slowing down long before
that limit.
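One practical guard that follows from this (a sketch only; the function name and the 10 MB threshold are arbitrary choices for illustration) is to look at the server-advertised Content-Length before deciding whether a one-gulp read is safe:

```python
MAX_ONE_GULP = 10 * 1024 * 1024  # 10 MB: an arbitrary comfort threshold

def should_stream(content_length):
    """Return True if a download should be read in chunks rather than
    all at once. content_length is the advertised size in bytes,
    or None / -1 when the server didn't say."""
    if content_length is None or content_length < 0:
        return True  # unknown size: stream to be safe
    return content_length > MAX_ONE_GULP

print(should_stream(5000))         # False: small enough to slurp
print(should_stream(2 * 1024**3))  # True: a 2 GB read would thrash
print(should_stream(None))         # True: size unknown, play it safe
```

Servers aren't obliged to send Content-Length, which is why the unknown case defaults to streaming.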
Also, do you know of any online examples of the kind of robust, real-world code you're describing?
It isn't written in Python, but get your hands on wget. It
is probably already on your Linux distro, but if not,
check it out here: http://www.gnu.org/software/wget/wget.html
--
Steven.
Thanks for the explanation. That is exactly what I'm looking for. In a
way, it's kind of neat that urlopen just *does* it, no questions asked,
but I'd like to just know the basics, which is what it sounds like
urlretrieve covers. Excellent. Now, let's see what I can whip up with
that.
-- just bought "cookbook" and "nutshell" moments ago btw....
mwt <mi*********@gmail.com> wrote: Thanks for the explanation. That is exactly what I'm looking for. In a way, it's kind of neat that urlopen just *does* it, no questions asked, but I'd like to just know the basics, which is what it sounds like urlretrieve covers. Excellent. Now, let's see what I can whip up with that.
Yes, I entirely understand your mindset, because mine is so similar: I
prefer using higher-level "just works" abstractions, BUT also want to
understand what's going on "below"... "just in case"!-)
-- just bought "cookbook" and "nutshell" moments ago btw....
Nice coincidence, and thanks!-)
Alex
So, I just put this little chunk to the test, which does give you
feedback about what's going on with a file download. Interesting that
with urlretrieve, you don't do all the file opening and closing stuff.
Works fine:
------------------
import urllib

def download_file(filename, URL):
    f = urllib.urlretrieve(URL, filename, reporthook=my_report_hook)

def my_report_hook(block_count, block_size, total_size):
    total_kb = total_size / 1024
    print "%d kb of %d kb downloaded" % (block_count * (block_size / 1024), total_kb)

if __name__ == "__main__":
    download_file("test_zip.zip", "http://blah.com/blah.zip")
mwt <mi*********@gmail.com> wrote:
... import urllib
def download_file(filename, URL): f = urllib.urlretrieve(URL, filename, reporthook=my_report_hook)
If you wanted to DO anything with the results, you'd probably want to
assign to
f, m = ...
not just f. This way, f is the filename, m a message object useful for
metadata (e.g., content type).
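In sketch form, the difference is just tuple unpacking. urlretrieve's two-element return is stubbed out below so the example doesn't need the network; the stub's header contents are made up:

```python
def fake_urlretrieve(url, filename):
    """Stand-in with the same return shape as urllib.urlretrieve:
    (local filename, a headers/message object)."""
    headers = {"Content-Type": "application/zip"}
    return filename, headers

# f alone would bind the whole tuple; unpack instead:
f, m = fake_urlretrieve("http://blah.com/blah.zip", "test_zip.zip")
print(f)                  # test_zip.zip
print(m["Content-Type"])  # application/zip
```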
Otherwise looks fine.
Alex
mwt wrote: This code works fine to download files from the web and write them to the local drive:
import urllib f = urllib.urlopen("http://www.python.org/blah/blah.zip") g = f.read() file = open("blah.zip", "wb") file.write(g) file.close()
The process is pretty opaque, however. This downloads and writes the file with no feedback whatsoever. You don't see how many bytes you've downloaded already, etc. Especially the "g = f.read()" step just sits there while downloading a large file, presenting a pregnant, blinking cursor.
So my question is, what is a good way to go about coding this kind of basic feedback? Also, since my testing has only *worked* with this code, I'm curious if it will throw a visible error if something goes wrong with the download.
By the way, you can achieve what you want with urllib2, you may also
want to check out the pycurl library - which is a Python interface to a
very good C library called curl.
With urllib2 you don't *have* to read the whole thing in one go -
import urllib2

f = urllib2.urlopen("http://www.python.org/blah/blah.zip")
g = ''
while True:
    a = f.read(1024*10)
    if not a:
        break
    print 'Read another 10k'
    g += a

file = open("blah.zip", "wb")
file.write(g)
file.close()
All the best,
Fuzzyman
http://www.voidspace.org.uk/python/index.shtml