
Parallelization with Python: which, where, how?

Dear NG,

I have a (pretty much) "embarrassingly parallel" problem and am looking for the
right toolbox to parallelize it over a cluster of homogeneous Linux
workstations. I don't need automatic loop-parallelization or the like
since I prefer to prepare the work packets "by hand".
I simply need
- to specify a list of clients
- a means of sending a work packet to a free client and receiving the
result (hopefully automatically, without needing to log in to each one)
- optionally a timeout mechanism if a client doesn't respond
- optionally help for debugging of remote clients
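
To make this concrete: roughly the following kind of dispatch loop is what I
mean. It is only a sketch using the standard library (xmlrpclib /
SimpleXMLRPCServer); the host names and the process_packet() function are
placeholders for my real setup.

# Sketch only -- host names and process_packet() are placeholders.
# Each workstation would run a small XML-RPC server exposing the worker
# function, e.g.:
#
#   import SimpleXMLRPCServer
#   def process_packet(packet):
#       return packet * packet            # stand-in for the real computation
#   server = SimpleXMLRPCServer.SimpleXMLRPCServer(("", 8000))
#   server.register_function(process_packet)
#   server.serve_forever()

import socket
import xmlrpclib

socket.setdefaulttimeout(60)              # crude timeout for unresponsive clients

clients = ["node01", "node02", "node03"]  # the list of workstations
proxies = [xmlrpclib.ServerProxy("http://%s:8000" % host) for host in clients]

packets = range(100)                      # stand-in for the real work packets
results = []
while packets:                            # naive: loops forever if all clients are down
    for proxy in proxies:
        if not packets:
            break
        packet = packets.pop()
        try:
            results.append(proxy.process_packet(packet))
        except (socket.error, socket.timeout):
            packets.append(packet)        # client down or timed out: requeue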

So far I've seen scipy's COW (cluster of workstations) package, but
couldn't find documentation or even examples for it (and the small
example in the code crashes...).
I've noticed PYRO as well, but didn't look too far yet.

Can someone recommend a parallelization approach? Are there examples or
documentation? Has someone got experience with stability and efficiency?

Thanks a lot,
Mathias

Jul 18 '05 #1
5 Replies


"Mathias" <no_sp@m_please.cc> wrote:
> I have a (pretty much) "embarrassingly parallel" problem and am looking for the right toolbox to
> parallelize it over a cluster of homogeneous Linux workstations. I don't need automatic
> loop-parallelization or the like since I prefer to prepare the work packets "by hand".
> I simply need
> - to specify a list of clients
> - a means of sending a work packet to a free client and receiving the
> result (hopefully automatically, without needing to log in to each one)
> - optionally a timeout mechanism if a client doesn't respond
> - optionally help for debugging of remote clients
>
> So far I've seen scipy's COW (cluster of workstations) package, but couldn't find documentation or
> even examples for it (and the small example in the code crashes...).
> I've noticed PYRO as well, but didn't look too far yet.
>
> Can someone recommend a parallelization approach? Are there examples or documentation? Has someone
> got experience with stability and efficiency?


googling for "parallel python" brings up lots of references; tools like

http://pympi.sourceforge.net/
http://datamining.anu.edu.au/~ole/pypar/

(see https://geodoc.uchicago.edu/climatew...scussPythonMPI for
a comparison)

seem to be commonly used.
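
for the embarrassingly parallel case, the usual pattern with these tools is
a simple master/worker split. roughly like this in the pypar style (the call
names are taken from the pypar demo code as I remember it, so check the exact
signatures against its documentation; myfunc() and the packet list are
placeholders; run it with something like "mpirun -np 4 python script.py"):

# rough master/worker split, pypar-style; signatures are approximate
import pypar

def myfunc(packet):
    return packet * packet                    # stand-in for the real work

myid = pypar.rank()                           # rank of this process
numproc = pypar.size()                        # total number of processes

if myid == 0:
    # master: hand one slice of the packets to each worker, collect results
    packets = range(100)
    for worker in range(1, numproc):
        pypar.send(packets[worker::numproc], worker)
    results = map(myfunc, packets[0::numproc])
    for worker in range(1, numproc):
        results.extend(pypar.receive(worker))
else:
    # worker: receive a slice, process it, send the results back
    work = pypar.receive(0)
    pypar.send(map(myfunc, work), 0)

pypar.finalize()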

</F>

Jul 18 '05 #2

Mathias <no_sp@m_please.cc> writes:
> Can someone recommend a parallelization approach? Are there examples
> or documentation? Has someone got experience with stability and
> efficiency?


In the "persistent objects" thread someone mentioned a very cool package
called POSH:

http://poshmodule.sourceforge.net/posh/html/posh.html
Jul 18 '05 #3

>>>>> "Mathias" == Mathias <no_sp@m_please.cc> writes:
> Dear NG,
> I have a (pretty much) "embarrassingly parallel" problem and am looking for
> the right toolbox to parallelize it over a cluster of homogeneous Linux
> workstations. I don't need automatic loop-parallelization or the like
> since I prefer to prepare the work packets "by hand".
> I simply need
> - to specify a list of clients
> - a means of sending a work packet to a free client and receiving the
> result (hopefully automatically, without needing to log in to each one)
> - optionally a timeout mechanism if a client doesn't respond
> - optionally help for debugging of remote clients


pypvm or pympi? See http://pypvm.sourceforge.net/ and
http://pympi.sourceforge.net/.

Ganesan
Jul 18 '05 #4

Mathias wrote:
> I have a (pretty much) "embarrassingly parallel" problem and am looking for the
> right toolbox to parallelize it over a cluster of homogeneous Linux
> workstations.


We have a >1000-node cluster here and use the commercial Platform LSF to
manage it. My Poly package
<http://www.ebi.ac.uk/~hoffman/software/poly/> makes that trivial to use
from Python and also avoids many of the pitfalls of programming farms
that large, such as accidental distributed denial of service attacks on
your own fileserver ;)

Due to the cost and difficulty of setup, LSF is probably not what you
want, or you would already have it. But MPI is probably not what you
want if you are doing embarrassingly parallelizable problems. I would
look into OpenPBS <http://www.openpbs.org/>. If you want to write a Poly
plugin for OpenPBS, I would be happy to accept it. ;)
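
The batch-queue pattern is also simple enough to script directly if you don't
want a wrapper: generate one job per work packet and hand it to qsub. A rough
sketch (the worker script name and output paths are made up):

# rough sketch: submit one PBS job per work packet via qsub
import os

packets = range(100)                       # stand-in for the real work packets
for i in packets:
    script = "job_%04d.sh" % i
    f = open(script, "w")
    f.write("#!/bin/sh\n")
    f.write("#PBS -N packet%d\n" % i)      # job name shown by the queue
    f.write("python worker.py %d > result_%04d.txt\n" % (i, i))
    f.close()
    os.system("qsub %s" % script)          # hand the job to the queue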
--
Michael Hoffman
Jul 18 '05 #5

On Mon, 20 Dec 2004 14:03:09 +0100, Mathias <no_sp@m_please.cc> wrote:
> Can someone recommend a parallelization approach? Are there examples or
> documentation? Has someone got experience with stability and efficiency?


If you think a light-weight approach of distributing work and collecting
the output afterwards (using ssh/rsh) fits your problem, send me an
email.
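
In outline it is nothing more than this (the host names and worker.py are
placeholders):

# outline only -- host names and worker.py are placeholders
import os

hosts = ["node01", "node02", "node03"]
packets = range(len(hosts))                # one packet per host, for simplicity

# distribute: start one remote job per host over ssh, in the background
for host, packet in zip(hosts, packets):
    os.system("ssh %s 'python worker.py %d > /tmp/result_%d.txt' &"
              % (host, packet, packet))

# ...later, collect the output files back with scp
for host, packet in zip(hosts, packets):
    os.system("scp %s:/tmp/result_%d.txt ." % (host, packet))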

Albert
--
Contrary to popular belief, the .doc format is not an open, publicly available format.
Jul 18 '05 #6
