Any python scripts to do parallel downloading?

I want to find a multithreaded downloading lib in python,
can someone recommend one for me, please?
Thanks~

Jan 31 '07 #1
On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
> I want to find a multithreaded downloading lib in python,
> can someone recommend one for me, please?
> Thanks~

Why do you want to use threads for that? Twisted is the
obvious solution for your problem, but you may use any
asynchronous framework, for instance the good ol'
Tkinter:

"""
Example of asynchronous programming with Tkinter. Download 10 times
the same URL.
"""

# Python 2 code: Tkinter, urllib.urlopen and generator.next()
import sys, urllib, itertools, Tkinter

URL = 'http://docs.python.org/dev/lib/module-urllib.html'

class Downloader(object):
    chunk = 1024  # bytes read per step

    def __init__(self, urls, frame):
        self.urls = urls
        # one generator per URL; each step reads a single chunk
        self.downloads = [self.download(i) for i in range(len(urls))]
        self.tkvars = []
        self.tklabels = []
        for url in urls:
            var = Tkinter.StringVar(frame)
            lbl = Tkinter.Label(frame, textvar=var)
            lbl.pack()
            self.tkvars.append(var)
            self.tklabels.append(lbl)
        frame.pack()

    def download(self, i):
        src = urllib.urlopen(self.urls[i])
        size = int(src.info()['Content-Length'])
        for block in itertools.count():
            chunk = src.read(self.chunk)
            if not chunk: break
            percent = block * self.chunk * 100/size
            msg = '%s: downloaded %2d%% of %s K' % (
                self.urls[i], percent, size/1024)
            self.tkvars[i].set(msg)
            yield None  # give control back to the Tkinter main loop
        self.tkvars[i].set('Downloaded %s' % self.urls[i])

if __name__ == '__main__':
    root = Tkinter.Tk()
    frame = Tkinter.Frame(root)
    downloader = Downloader([URL] * 10, frame)
    def next(cycle):
        # advance the next download by one chunk, then reschedule
        try:
            cycle.next().next()
        except StopIteration:
            pass
        root.after(50, next, cycle)
    root.after(0, next, itertools.cycle(downloader.downloads))
    root.mainloop()
Michele Simionato

Jan 31 '07 #2
Michele Simionato wrote:
> On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
>> I want to find a multithreaded downloading lib in python,
>> can someone recommend one for me, please?
>> Thanks~
>
> Why do you want to use threads for that? Twisted is the
> obvious solution for your problem, but you may use any
> asynchronous framework, as for instance the good ol

Well, since it will be I/O based, why not use threads? They are easy to
use and it would do the job just fine. Then leverage some other
technology on top of that.

You could go as far as using wget via os.system() in a thread, if the
app is simple enough.

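import os
import threading

def getSite(site):
    # shell out to wget; blocks this thread until the download finishes
    os.system('wget %s' % site)

threadList = []
for site in websiteList:  # websiteList: the list of URLs to fetch
    threadList.append(threading.Thread(target=getSite, args=(site,)))

for thread in threadList:
    thread.start()

for thread in threadList:
    thread.join()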
> Tkinter:



--

Carl J. Van Arsdall
cv*********@mvista.com
Build and Release
MontaVista Software

Jan 31 '07 #3
Michele Simionato wrote:
> On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
>> I want to find a multithreaded downloading lib in python,
>> can someone recommend one for me, please?
>> Thanks~
>
> Why do you want to use threads for that? Twisted is the
> obvious solution for your problem,

Overkill? Just to download a few web pages? You've got to be
kidding.

> but you may use any
> asynchronous framework, as for instance the good ol
> Tkinter:

Well, of all the things you can use threads for, this is probably the
simplest, so I don't see any reason to prefer an asynchronous method
unless you're used to it. One Queue for dispatching should be enough
to synchronize everything; maybe a Queue or simple lock at the end as
well, depending on the need.

The OP might not even care whether it's threaded or asynchronous.

Carl Banks

Jan 31 '07 #4
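
The Queue-based design described above can be sketched roughly as follows. This is only an illustration in present-day Python 3, not code from the thread: the names `fetch` and `download_all` and the `num_workers` parameter are made up for the example. One Queue dispatches URLs to the workers and a second Queue collects the results.

```python
import queue
import threading
import urllib.request


def fetch(url):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(urls, fetch=fetch, num_workers=4):
    """Fetch every URL with a small pool of threads; return {url: result}.

    A failed download is stored as the exception object instead of the body.
    """
    tasks = queue.Queue()    # dispatch queue: URLs waiting to be fetched
    results = queue.Queue()  # finished (url, body-or-exception) pairs

    def worker():
        while True:
            url = tasks.get()
            if url is None:  # sentinel: no more work for this thread
                return
            try:
                results.put((url, fetch(url)))
            except Exception as exc:
                results.put((url, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()

    # All workers have exited, so the results queue is no longer changing.
    collected = {}
    while not results.empty():
        url, outcome = results.get()
        collected[url] = outcome
    return collected
```

Because the result is a dict keyed by URL, repeated URLs collapse into a single entry. Passing a different `fetch` callable makes it possible to try the function without touching the network.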
On Jan 31, 9:24 pm, "Carl Banks" <pavlovevide...@gmail.com> wrote:
> Well, of all the things you can use threads for, this is probably the
> simplest, so I don't see any reason to prefer an asynchronous method
> unless you're used to it.

Well, actually there is a reason why I prefer the asynchronous
approach even for the simplest things: I can stop my program at any
time with CTRL-C. When developing a threaded program, either I
implement a mechanism for stopping the threads (which should be safe
enough to survive the bugs introduced while I develop, BTW), or I have
to resort to kill -9, and I *hate* that, especially since kill -9 does
not honor try .. finally statements.

In short, I prefer to avoid threads, *especially* for the simplest
things. I use threads only when I am forced to, typically when I am
using a multithreaded framework interacting with a database.

Michele Simionato

Feb 1 '07 #5
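
One possible shape for the "mechanism for stopping the threads" mentioned above is a shared `threading.Event` that each worker checks between small units of work. The sketch below is Python 3 and purely illustrative: `run_until_stopped`, `run_threads` and the `step` callables are hypothetical names, and it only helps when each step is short, because a thread blocked inside a long call never gets to look at the flag.

```python
import threading


def run_until_stopped(step, stop_event):
    """Call step() repeatedly until it returns False or stop_event is set."""
    while not stop_event.is_set():
        if not step():
            break


def run_threads(steps, poll_seconds=0.1):
    """Run each callable in `steps` in its own thread; CTRL-C stops them all."""
    stop_event = threading.Event()
    threads = [
        threading.Thread(target=run_until_stopped, args=(step, stop_event))
        for step in steps
    ]
    for thread in threads:
        thread.start()
    try:
        # Join with a timeout so the main thread wakes up regularly
        # and can react to CTRL-C.
        while any(thread.is_alive() for thread in threads):
            for thread in threads:
                thread.join(poll_seconds)
    except KeyboardInterrupt:
        stop_event.set()  # ask the workers to finish their current step
        for thread in threads:
            thread.join()
        raise
```

For a downloader, a `step` could read one chunk from an open connection and return False at end of file, so try .. finally blocks inside the worker still run on shutdown.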
On Jan 31, 8:31 pm, "Carl J. Van Arsdall" <cvanarsd...@mvista.com> wrote:
> Well, since it will be I/O based, why not use threads? They are easy to
> use and it would do the job just fine. Then leverage some other
> technology on top of that.
>
> You could go as far as using wget via os.system() in a thread, if the
> app is simple enough.

Calling os.system in a thread looks really perverse to me: you would
lose CTRL-C without any benefit. Why not use subprocess.Popen instead?

I am unhappy with the current situation in Python. Whereas for most
things Python is such that the simplest things look simple, this is
not the case for threads. Unfortunately we have a threading module in
the standard library, but not a "Twisted for pedestrians" module, so
people overlook the simplest solution in favor of the complex one.

Another thing I miss is a facility to run an iterator in the Tkinter
mainloop: since Tkinter is not thread-safe, writing a
multiple-download progress bar in Tkinter using threads is definitely
less obvious than running an iterator in the main loop, as I
discovered the hard way. Writing a facility to run iterators in Twisted
is a three-liner, but it is not already there, nor standard :-(

Michele Simionato

Feb 1 '07 #6
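
The subprocess.Popen suggestion above might look something like this in Python 3. It is a sketch under assumptions rather than anyone's posted code: `run_parallel` is a hypothetical helper, and the usage line assumes a `wget` executable is on the PATH. Popen returns immediately, so the external processes run side by side without any threads in the Python program.

```python
import subprocess


def run_parallel(commands):
    """Start every command at once, wait for all, return their exit codes."""
    processes = [subprocess.Popen(command) for command in commands]
    try:
        return [process.wait() for process in processes]
    except KeyboardInterrupt:
        # CTRL-C in the parent: stop any children that are still running.
        for process in processes:
            if process.poll() is None:
                process.terminate()
        raise
```

A call such as `run_parallel([["wget", url] for url in urls])` would then download all the URLs concurrently. Passing each command as a list avoids going through the shell, which sidesteps quoting problems with unusual URLs.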
On Feb 1, 1:43 pm, Jean-Paul Calderone <exar...@divmod.com> wrote:
> On 31 Jan 2007 22:02:36 -0800, Michele Simionato <michele.simion...@gmail.com> wrote:
>> Another thing I miss is a facility to run an iterator in the Tkinter
>> mainloop: since Tkinter is not thread-safe,
>> writing a multiple-download progress bar in Tkinter using threads is
>> definitely less obvious than running
>> an iterator in the main loop, as I discovered the hard way. Writing a
>> facility to run iterators in Twisted
>> is a three-liner, but it is not already there, nor standard :-(
>
> Have you seen the recently introduced twisted.internet.task.coiterate()?
> It sounds like it might be what you're after.

Oops! There is a misprint here, I meant "writing a facility to run
iterators in TKINTER", not in Twisted. Twisted already has everything,
even too much. I would like to have better support for asynchronous
programming in the standard library, for people not needing the full
power of Twisted. I also like to keep my dependencies at a minimum.

Michele Simionato

Feb 1 '07 #7
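
The kind of facility for running an iterator in the Tkinter main loop discussed above could be sketched as below, in Python 3. `run_in_mainloop` and its `delay_ms` parameter are hypothetical names chosen for the example; the only thing assumed about `root` is Tk's `after(ms, callback)` method, which `tkinter.Tk()` provides.

```python
def run_in_mainloop(root, iterator, delay_ms=50):
    """Advance `iterator` by one step per timer tick until it is exhausted.

    `root` is assumed to offer Tk's `after(ms, callback)` method, as
    `tkinter.Tk()` does.
    """
    def step():
        try:
            next(iterator)
        except StopIteration:
            return  # finished: do not reschedule
        root.after(delay_ms, step)

    root.after(0, step)


# Hypothetical usage (needs a display, so it is left commented out):
#
#     import tkinter
#     root = tkinter.Tk()
#     run_in_mainloop(root, some_generator())
#     root.mainloop()
```

Each generator does a small amount of work per `next()` call and then yields, so the GUI stays responsive and everything runs in the main thread, where touching Tkinter widgets is safe.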
On Jan 31, 3:37 pm, Jean-Paul Calderone <exar...@divmod.com> wrote:
> On 31 Jan 2007 12:24:21 -0800, Carl Banks <pavlovevide...@gmail.com> wrote:
>> Michele Simionato wrote:
>>> On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
>>>> I want to find a multithreaded downloading lib in python,
>>>> can someone recommend one for me, please?
>>>> Thanks~
>>> Why do you want to use threads for that? Twisted is the
>>> obvious solution for your problem,
>> Overkill? Just to download a few web pages? You've got to be
>> kidding.
>
> Better "overkill" (whatever that is) than wasting time re-implementing
> the same boring thing over and over for no reason.

"I need to download some web pages in parallel."

"Here's a tremendously large and complex framework. Download, install,
and learn this large and complex framework. Then you can write your
very simple throwaway script with ease."

Is the Twisted solution even shorter? Doing this with threads I'm
thinking would be on the order of 20 lines of code.

Carl Banks

Feb 1 '07 #8
On Feb 1, 9:20 am, Jean-Paul Calderone <exar...@divmod.com> wrote:
> On 1 Feb 2007 06:14:40 -0800, Carl Banks <pavlovevide...@gmail.com> wrote:
>> On Jan 31, 3:37 pm, Jean-Paul Calderone <exar...@divmod.com> wrote:
>>> On 31 Jan 2007 12:24:21 -0800, Carl Banks <pavlovevide...@gmail.com> wrote:
>>>> Michele Simionato wrote:
>>>>> On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
>>>>>> I want to find a multithreaded downloading lib in python,
>>>>>> can someone recommend one for me, please?
>>>>>> Thanks~
>>>>> Why do you want to use threads for that? Twisted is the
>>>>> obvious solution for your problem,
>>>> Overkill? Just to download a few web pages? You've got to be
>>>> kidding.
>>> Better "overkill" (whatever that is) than wasting time re-implementing
>>> the same boring thing over and over for no reason.
>> "I need to download some web pages in parallel."
>> "Here's a tremendously large and complex framework. Download, install,
>> and learn this large and complex framework. Then you can write your
>> very simple throwaway script with ease."
>> Is the Twisted solution even shorter? Doing this with threads I'm
>> thinking would be on the order of 20 lines of code.
>
> The /already written/ solution I linked to in my original response was five
> lines shorter than that.

And I suppose "re-implementing the same boring thing over and over" is
ok if it's 15 lines but is too much to bear if it's 20 (irrespective
of the additional large framework the former requires).

Carl Banks

Feb 1 '07 #9
On Feb 1, 12:40 am, "Michele Simionato" <michele.simion...@gmail.com> wrote:
> On Jan 31, 9:24 pm, "Carl Banks" <pavlovevide...@gmail.com> wrote:
>> Well, of all the things you can use threads for, this is probably the
>> simplest, so I don't see any reason to prefer an asynchronous method
>> unless you're used to it.
>
> Well, actually there is a reason why I prefer the asynchronous
> approach even for the simplest things: I can stop my program at any
> time with CTRL-C. When developing a threaded program, either I
> implement a mechanism for stopping the threads (which should be safe
> enough to survive the bugs introduced while I develop, BTW), or I have
> to resort to kill -9, and I *hate* that, especially since kill -9 does
> not honor try .. finally statements.
>
> In short, I prefer to avoid threads, *especially* for the simplest
> things. I use threads only when I am forced to, typically when I am
> using a multithreaded framework interacting with a database.

Fair enough.

I'm just saying that just because something is good for funded,
important, enterprise tasks, it doesn't mean very simple stuff
automatically has to use it as well. For Pete's sake, even Perl works
for simple scripts.

Carl Banks

Feb 1 '07 #10
