Bytes IT Community

Parallel Python

Has anybody tried to run parallel python applications?
It appears that if your application is computation-bound, using the 'thread'
or 'threading' modules will not get you any speedup. That is because the
Python interpreter uses the GIL (Global Interpreter Lock) for internal
bookkeeping. The latter allows only one Python byte-code instruction to
be executed at a time, even if you have a multiprocessor computer.
To overcome this limitation, I've created the ppsmp module:
http://www.parallelpython.com
It provides an easy way to run parallel Python applications on SMP
computers.
I would appreciate any comments/suggestions regarding it.
Thank you!
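A quick way to see the effect described above (an editor's sketch, not part of the original post; the function and variable names are invented for illustration): time a CPU-bound loop run twice serially versus in two threads. On a standard, GIL-constrained CPython build, the threaded version is typically no faster.

```python
import threading
import time

def count_down(n):
    """A purely CPU-bound loop; the GIL serializes its byte-code."""
    while n > 0:
        n -= 1

N = 2_000_000

# Serial: two calls, one after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
serial = time.perf_counter() - start

# Threaded: two threads doing the same total work.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On a GIL-constrained CPython, `threaded` is usually about the same as
# (or worse than) `serial`, even with multiple CPUs available.
print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")
```

The exact numbers vary by machine and interpreter build, so treat the timing relationship as a tendency rather than a guarantee.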

Jan 6 '07 #1
43 Replies


pa************@gmail.com wrote:
Has anybody tried to run parallel python applications?
It appears that if your application is computation-bound, using the 'thread'
or 'threading' modules will not get you any speedup. That is because the
Python interpreter uses the GIL (Global Interpreter Lock) for internal
bookkeeping. The latter allows only one Python byte-code instruction to
be executed at a time, even if you have a multiprocessor computer.
To overcome this limitation, I've created the ppsmp module:
http://www.parallelpython.com
It provides an easy way to run parallel Python applications on SMP
computers.
I would appreciate any comments/suggestions regarding it.
I always thought that if you use multiple processes (e.g. os.fork) then
Python can take advantage of multiple processors. I think the GIL locks
one processor only. The problem is that one interpreter can run on
one processor only. Am I right? Does your ppsmp module run the same
interpreter on multiple processors? That would be very interesting, and
something new.
Or does it start multiple interpreters? Another way to do this is to
start multiple processes and let them communicate through IPC or a local
network.
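Laszlo's multiple-process route can be sketched with os.fork plus one pipe per child (an editor's illustration, not from the original post; `parallel_map_fork` is an invented helper name, and this is Unix-only since it relies on fork):

```python
import os
import pickle

def parallel_map_fork(func, items):
    """Run func on each item in its own forked child process;
    each child sends its pickled result back through a pipe."""
    children = []
    for item in items:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:               # child: compute, write result, exit
            os.close(r)
            with os.fdopen(w, 'wb') as f:
                pickle.dump(func(item), f)
            os._exit(0)
        os.close(w)                # parent keeps only the read end
        children.append((pid, r))

    results = []
    for pid, r in children:
        with os.fdopen(r, 'rb') as f:
            results.append(pickle.load(f))
        os.waitpid(pid, 0)         # reap the child
    return results

results = parallel_map_fork(lambda x: x * x, [1, 2, 3])
print(results)
```

Each child is a full interpreter copy, so the GIL of one process never blocks another; the cost is that results must cross a process boundary via pickling.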
Laszlo

Jan 8 '07 #2

Laszlo Nagy <ga*****@designaproduct.biz> wrote:
pa************@gmail.com wrote:
>Has anybody tried to run parallel python applications?
It appears that if your application is computation-bound, using the 'thread'
or 'threading' modules will not get you any speedup. That is because the
Python interpreter uses the GIL (Global Interpreter Lock) for internal
bookkeeping. The latter allows only one Python byte-code instruction to
be executed at a time, even if you have a multiprocessor computer.
To overcome this limitation, I've created the ppsmp module:
http://www.parallelpython.com
It provides an easy way to run parallel Python applications on SMP
computers.
I would appreciate any comments/suggestions regarding it.
I always thought that if you use multiple processes (e.g. os.fork) then
Python can take advantage of multiple processors. I think the GIL locks
one processor only. The problem is that one interpreter can run on
one processor only. Am I right? Does your ppsmp module run the same
interpreter on multiple processors? That would be very interesting, and
something new.
The GIL locks all processors, but just for one process. So, yes, if you
spawn off multiple processes then Python will take advantage of this. For
example we run Zope on a couple of dual processor dual core systems, so we
use squid and pound to ensure that the requests are spread across 4
instances of Zope on each machine. That way we do get a fairly even cpu
usage.

For some applications it is much harder to split the tasks across separate
processes rather than just separate threads, but there is a benefit once
you've done it since you can then distribute the processing across cpus on
separate machines.

The 'parallel python' site seems very sparse on the details of how it is
implemented but it looks like all it is doing is spawning some subprocesses
and using some simple ipc to pass details of the calls and results. I can't
tell from reading it what it is supposed to add over any of the other
systems which do the same.

Combined with the closed source 'no redistribution' license I can't really
see anyone using it.
Jan 8 '07 #3

Duncan Booth wrote:
Laszlo Nagy <ga*****@designaproduct.biz> wrote:
The 'parallel python' site seems very sparse on the details of how it is
implemented but it looks like all it is doing is spawning some subprocesses
and using some simple ipc to pass details of the calls and results. I can't
tell from reading it what it is supposed to add over any of the other
systems which do the same.

Combined with the closed source 'no redistribution' license I can't really
see anyone using it.

That's true. IPC through sockets or (somewhat faster) shared memory - with cPickle at least - is usually the best such approaches can do.
See http://groups.google.de/group/comp.l...22ec289f30b26a

For tasks really requiring threading one can consider IronPython.
The most advanced technique I've seen for CPython is posh: http://poshmodule.sourceforge.net/

I'd say Py3K should just do the locking job for dicts / collections, obmalloc and refcount (or drop the refcount mechanism) and do the other minor things in order to enable free threading. Or at least enable careful sharing of Py-Objects between multiple separate interpreter instances of one process.
.NET and Java have shown that the speed costs of this technique are not so extreme - I guess less than 10%.
And Python is a VHLL with less focus on speed anyway.
Also see discussions in http://groups.google.de/group/comp.l...22ec289f30b26a .
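The pickle-over-socket IPC described above might look roughly like this (an editor's sketch, not part of the original post; the 4-byte length-prefix framing is one common choice, and the modern `pickle` module stands in for `cPickle`):

```python
import pickle
import socket
import struct

def send_obj(sock, obj):
    """Pickle an object and send it with a 4-byte length prefix."""
    data = pickle.dumps(obj)
    sock.sendall(struct.pack('!I', len(data)) + data)

def recv_exact(sock, n):
    """Read exactly n bytes (sock.recv may return short reads)."""
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError('peer closed connection')
        buf += chunk
    return buf

def recv_obj(sock):
    """Receive one length-prefixed pickled object."""
    (size,) = struct.unpack('!I', recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, size))

# Demo over a local socket pair; between real processes the sockets
# would come from connect()/accept() or be inherited across fork().
a, b = socket.socketpair()
send_obj(a, {'task': 'square', 'arg': 7})
msg = recv_obj(b)
print(msg)
```

The serialization step is exactly the overhead being discussed: every object crossing the channel is copied, not shared.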
Robert
Jan 8 '07 #4

I always thought that if you use multiple processes (e.g. os.fork) then
Python can take advantage of multiple processors. I think the GIL locks
one processor only. The problem is that one interpreter can run on
one processor only. Am I right? Does your ppsmp module run the same
interpreter on multiple processors? That would be very interesting, and
something new.
Or does it start multiple interpreters? Another way to do this is to
start multiple processes and let them communicate through IPC or a local
network.
That's right. ppsmp starts multiple interpreters in separate
processes and organizes communication between them through IPC.

Originally ppsmp was designed to speed up an existing application
which is written in pure Python but is quite computationally expensive
(other ways to optimize it were used too). It was also required
that the application run out of the box on the most common Linux
distributions (they all contain CPython).

Jan 10 '07 #5


robert wrote:
That's true. IPC through sockets or (somewhat faster) shared memory - with cPickle at least - is usually the best such approaches can do.
See http://groups.google.de/group/comp.l...22ec289f30b26a

For tasks really requiring threading one can consider IronPython.
The most advanced technique I've seen for CPython is posh: http://poshmodule.sourceforge.net/

In SciPy there is an MPI-binding project, mpi4py.

MPI is becoming the de facto standard for high-performance parallel
computing, both on shared memory systems (SMPs) and clusters. Spawning
threads or processes is not the recommended way to do numerical parallel
computing. Threading makes programming certain tasks more convenient
(particularly GUI and I/O, for which the GIL does not matter anyway),
but is not a good paradigm for dividing CPU-bound computations between
multiple processors. MPI is a high-level API based on the concept of
"message passing", which allows the programmer to focus on solving the
problem instead of irrelevant distractions such as thread management
and synchronization.

Although MPI has standard APIs for C and Fortran, it may be used with
any programming language. For Python, an additional advantage of using
MPI is that the GIL has no practical consequence for performance. The
GIL can lock a process but not prevent MPI from using multiple
processors as MPI is always using multiple processes. For IPC, MPI will
e.g. use shared-memory segments on SMPs and tcp/ip on clusters, but all
these details are hidden.

It seems like 'ppsmp' of parallelpython.com is just a reinvention of a
small portion of MPI.
http://mpi4py.scipy.org/
http://en.wikipedia.org/wiki/Message_Passing_Interface

Jan 10 '07 #6


parallelpyt...@gmail.com wrote:
That's right. ppsmp starts multiple interpreters in separate
processes and organizes communication between them through IPC.
Thus you are basically reinventing MPI.
http://mpi4py.scipy.org/
http://en.wikipedia.org/wiki/Message_Passing_Interface

Jan 10 '07 #7


In article <11**********************@p59g2000hsd.googlegroups.com>,
"sturlamolden" <st**********@yahoo.no> writes:
|>
|MPI is becoming the de facto standard for high-performance parallel
|computing, both on shared memory systems (SMPs) and clusters.

It has been for some time, and is still gaining ground.

|Spawning
|threads or processes is not recommended way to do numerical parallel
|computing.

Er, MPI works by getting SOMETHING to spawn processes, which then
communicate with each other.

|Threading makes programming certain tasks more convinient
|(particularly GUI and I/O, for which the GIL does not matter anyway),
|but is not a good paradigm for dividing CPU bound computations between
|multiple processors. MPI is a high level API based on a concept of
|"message passing", which allows the programmer to focus on solving the
|problem, instead on irrelevant distractions such as thread managament
|and synchronization.

Grrk. That's not quite it.

The problem is that the current threading models (POSIX threads and
Microsoft's equivalent) were intended for running large numbers of
semi-independent, mostly idle, threads: Web servers and similar.
Everything about them, including their design (such as it is), their
interfaces and their implementations, is unsuitable for parallel HPC
applications. One can argue whether that is insoluble, but let's not,
at least not here.

Now, Unix and Microsoft processes are little better but, because they
are more separate (and, especially, because they don't share memory)
are MUCH easier to run effectively on shared memory multi-CPU systems.
You still have to play administrator tricks, but they aren't as foul
as the ones that you have to play for threaded programs. Yes, I know
that it is a bit Irish for the best way to use a shared memory system
to be to not share memory, but that's how it is.
Regards,
Nick Maclaren.
Jan 10 '07 #8


Nick Maclaren wrote:
as the ones that you have to play for threaded programs. Yes, I know
that it is a bit Irish for the best way to use a shared memory system
to be to not share memory, but that's how it is.
Thank you for clearing that up.

In any case, this means that Python can happily keep its GIL, as the
CPU bound 'HPC' tasks for which the GIL does matter should be done
using multiple processes (not threads) anyway. That leaves threads as a
tool for programming certain i/o tasks and maintaining 'responsive'
user interfaces, for which the GIL incidentally does not matter.
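The point that the GIL does not matter for I/O-bound threads is easy to demonstrate (an editor's sketch, not part of the original post): blocking calls such as time.sleep or socket reads release the GIL, so the waits overlap.

```python
import threading
import time

def fake_io(results, i):
    time.sleep(0.2)   # stands in for a blocking read; the GIL is released
    results[i] = i

results = {}
start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(results, i)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Five 0.2s waits overlap: total time is close to 0.2s, not 1.0s.
print(f"{elapsed:.2f}s", sorted(results))
```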

I wonder if too much emphasis is put on thread programming these days.
Threads may be nice for programming web servers and the like, but not
for numerical computing. Reading books about thread programming, one
can easily get the impression that it is 'the' way to parallelize
numerical tasks on computers with multiple CPUs (or multiple CPU
cores). But if threads are inherently designed and implemented to stay
idle most of the time, that is obviously not the case.

I like MPI. Although it is a huge API with lots of esoteric functions,
I only need to know a handful to cover my needs. Not to mention the
fact that I can use MPI with Fortran, which is frowned upon by computer
scientists but loved by scientists and engineers specialized in any
other field.

Jan 10 '07 #9

nm**@cus.cam.ac.uk (Nick Maclaren) writes:
Yes, I know that it is a bit Irish for the best way to use a shared
memory system to be to not share memory, but that's how it is.
But I thought serious MPI implementations use shared memory if they
can. That's the beauty of it, you can run your application on SMP
processors getting the benefit of shared memory, or split it across
multiple machines using ethernet or infiniband or whatever, without
having to change the app code.
Jan 10 '07 #10


In article <11*********************@i39g2000hsf.googlegroups.com>,
"sturlamolden" <st**********@yahoo.no> writes:
|>
|In any case, this means that Python can happily keep its GIL, as the
|CPU bound 'HPC' tasks for which the GIL does matter should be done
|using multiple processes (not threads) anyway. That leaves threads as a
|tool for programming certain i/o tasks and maintaining 'responsive'
|user interfaces, for which the GIL incidentally does not matter.

Yes. That is the approach being taken at present by almost everyone.

|I wonder if too much emphasis is put on thread programming these days.
|Threads may be nice for programming web servers and the like, but not
|for numerical computing. Reading books about thread programming, one
|can easily get the impression that it is 'the' way to parallelize
|numerical tasks on computers with multiple CPUs (or multiple CPU
|cores). But if threads are inherently designed and implemented to stay
|idle most of the time, that is obviously not the case.

You have to distinguish "lightweight processes" from "POSIX threads"
from the generic concept. It is POSIX and Microsoft threads that are
inherently like that, and another kind of thread model might be very
different. Don't expect to see one provided any time soon, even by
Linux.

OpenMP is the current leader for SMP parallelism, and it would be
murder to produce a Python binding that had any hope of delivering
useful performance. I think that it could be done, but implementing
the result would be a massive task. The Spruce Goose and Project
Habbakuk (sic) spring to my mind, by comparison[*] :-)

|I like MPI. Although it is a huge API with lots of esoteric functions,
|I only need to know a handfull to cover my needs. Not to mention the
|fact that I can use MPI with Fortran, which is frowned upon by computer
|scientists but loved by scientists and engineers specialized in any
|other field.

Yup. MPI is also debuggable and tunable (with difficulty). Debugging
and tuning OpenMP and POSIX threads are beyond anyone except the most
extreme experts; I am only on the borderline of being able to.

The ASCI bunch favour Co-array Fortran, and its model matches Python
like a steam turbine is a match for a heart transplant.

[*] They are worth looking up, if you don't know about them.
Regards,
Nick Maclaren.
Jan 10 '07 #11


In article <7x************@ruckus.brouhaha.com>,
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
|>
| Yes, I know that it is a bit Irish for the best way to use a shared
| memory system to be to not share memory, but that's how it is.
|>
|But I thought serious MPI implementations use shared memory if they
|can. That's the beauty of it, you can run your application on SMP
|processors getting the benefit of shared memory, or split it across
|multiple machines using ethernet or infiniband or whatever, without
|having to change the app code.

They use it for the communication, but don't expose it to the
programmer. It is therefore easy to put the processes on different
CPUs, and get the memory consistency right.
Regards,
Nick Maclaren.
Jan 10 '07 #12

nm**@cus.cam.ac.uk (Nick Maclaren) writes:
In article <11*********************@i39g2000hsf.googlegroups.com>,
"sturlamolden" <st**********@yahoo.no> writes:
[...]
|I wonder if too much emphasis is put on thread programming these days.
|Threads may be nice for programming web servers and the like, but not
|for numerical computing. Reading books about thread programming, one
|can easily get the impression that it is 'the' way to parallelize
|numerical tasks on computers with multiple CPUs (or multiple CPU
|cores). But if threads are inherently designed and implemented to stay
|idle most of the time, that is obviously not the case.

You have to distinguish "lightweight processes" from "POSIX threads"
from the generic concept. It is POSIX and Microsoft threads that are
inherently like that,
Do you mean that POSIX threads are inherently designed and implemented
to stay idle most of the time?! If so, I'm afraid those guys that
designed POSIX threads won't agree with you. In particular, as far as I
remember, David R. Butenhof said a few times in comp.programming.threads
that POSIX threads were primarily designed to meet parallel programming
needs on SMP, or at least that was how I understood him.

-- Sergei.

Jan 10 '07 #13


In article <ma***************************************@python.org>,
Sergei Organov <os*@javad.com> writes:
|>
|Do you mean that POSIX threads are inherently designed and implemented
|to stay idle most of the time?! If so, I'm afraid those guys that
|designed POSIX threads won't agree with you. In particular, as far as I
|remember, David R. Butenhof said a few times in comp.programming.threads
|that POSIX threads were primarily designed to meet parallel programming
|needs on SMP, or at least that was how I understood him.

I do mean that, and I know that they don't agree. However, the word
"designed" doesn't really make a lot of sense for POSIX threads - the
one I tend to use is "perpetrated".

The people who put the specification together were either unaware of
most of the experience of the previous 30 years, or chose to ignore it.
In particular, in this context, the importance of being able to control
the scheduling was well-known, as was the fact that it is NOT possible
to mix processes with different scheduling models on the same set of
CPUs. POSIX's facilities are completely hopeless for that purpose, and
most of the systems I have used effectively ignore them.

I could go on at great length, and the performance aspects are not even
the worst aspect of POSIX threads. The fact that there is no usable
memory model, and the synchronisation depends on C to handle the
low-level consistency, but there are no CONCEPTS in common between
POSIX and C's memory consistency 'specifications' is perhaps the worst.
That is why many POSIX threads programs work until the genuinely
shared memory accesses become frequent enough that you get some to the
same location in a single machine cycle.
Regards,
Nick Maclaren.
Jan 10 '07 #14

Just as something to note, but many HPC applications will use a
combination of both MPI and threading (OpenMP usually, as for the
underlying thread implementation I don't have much to say). It's
interesting to see on this message board this huge "anti-threading"
mindset, but the HPC community seems to be happy using a little of both
depending on their application and the topology of their parallel
machine. Although if I was doing HPC applications, I probably would not
choose to use Python but I would write things in C or FORTRAN.

What I liked about python threads was that they were easy whereas using
processes and IPC is a real pain in the butt sometimes. I don't
necessarily think this module is the end-all solution to all of our
problems but I do think that its a good thing and I will toy with it
some in my spare time. I think that any effort to making python
threading better is a good thing and I'm happy to see the community
attempt to make improvements. It would also be cool if this were
open-sourced, and I'm not quite sure why it's not.

-carl
--

Carl J. Van Arsdall
cv*********@mvista.com
Build and Release
MontaVista Software

Jan 10 '07 #15


In article <ma***************************************@python.org>,
"Carl J. Van Arsdall" <cv*********@mvista.com> writes:
|>
|Just as something to note, but many HPC applications will use a
|combination of both MPI and threading (OpenMP usually, as for the
|underlying thread implementation i don't have much to say). Its
|interesting to see on this message board this huge "anti-threading"
|mindset, but the HPC community seems to be happy using a little of both
|depending on their application and the topology of their parallel
|machine. Although if I was doing HPC applications, I probably would not
|choose to use Python but I would write things in C or FORTRAN.

That is a commonly quoted myth.

Some of the ASCI community did that, but even they have backed off
to a great extent. Such code is damn near impossible to debug, let
alone tune. To the best of my knowledge, no non-ASCI application
has ever done that, except for virtuosity. I have several times
asked claimants to name some examples of code that does that and is
used in the general research community, and have so far never had a
response.

I managed the second-largest HPC system in UK academia for a decade,
ending less than a year ago, incidentally, and was and am fairly well
in touch with what is going on in HPC world-wide.
Regards,
Nick Maclaren.
Jan 10 '07 #16

sturlamolden wrote:
Nick Maclaren wrote:

I wonder if too much emphasis is put on thread programming these days.
Threads may be nice for programming web servers and the like, but not
for numerical computing. Reading books about thread programming, one
can easily get the impression that it is 'the' way to parallelize
numerical tasks on computers with multiple CPUs (or multiple CPU

Most threads on this planet are not used for number crunching jobs, but for "organization of execution".

Also, if one wants to exploit the speed of upcoming multi-core CPUs for all kinds of fine-grained programs, things need fast fine-grained communication - and most important: huge data trees in memory have to be shared effectively.
CPU frequencies will not grow anymore in the future, but we will see multi-cores/SMP. How to exploit them as if we really had faster CPUs: threads and thread-like techniques.

Things like MPI and IPC are just for the area of "small message, big job" - typically scientific number crunching, where you collect the results "at the end of the day". It's more of a slow-network technique.

The most challenging example of this is probably games - not to discuss gaming here, but as a tech example to the point: would you do MPI, RPC etc. while 30fps 3D and real-time physics simulation are going on?
Robert
Jan 11 '07 #17

Nick Maclaren wrote:
In article <7x************@ruckus.brouhaha.com>,
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
|>
| Yes, I know that it is a bit Irish for the best way to use a shared
| memory system to be to not share memory, but that's how it is.
|>
|But I thought serious MPI implementations use shared memory if they
|can. That's the beauty of it, you can run your application on SMP
|processors getting the benefit of shared memory, or split it across
|multiple machines using ethernet or infiniband or whatever, without
|having to change the app code.

They use it for the communication, but don't expose it to the
programmer. It is therefore easy to put the processes on different
CPUs, and get the memory consistency right.
Thus communicated data is "serialized" - not directly used as with threads or with custom shared memory techniques like POSH object sharing.
Robert
Jan 11 '07 #18

nm**@cus.cam.ac.uk (Nick Maclaren) writes:
In article <ma***************************************@python.org>,
Sergei Organov <os*@javad.com> writes:
|>
|Do you mean that POSIX threads are inherently designed and implemented
|to stay idle most of the time?! If so, I'm afraid those guys that
|designed POSIX threads won't agree with you. In particular, as far as I
|remember, David R. Butenhof said a few times in comp.programming.threads
|that POSIX threads were primarily designed to meet parallel programming
|needs on SMP, or at least that was how I understood him.

I do mean that, and I know that they don't agree. However, the word
"designed" doesn't really make a lot of sense for POSIX threads - the
one I tend to use is "perpetrated".
OK, then I don't think POSIX threads were "perpetrated" to be idle
most of the time.
The people who put the specification together were either unaware of
most of the experience of the previous 30 years, or chose to ignore it.
In particular, in this context, the importance of being able to control
the scheduling was well-known, as was the fact that it is NOT possible
to mix processes with different scheduling models on the same set of
CPUs. POSIX's facilities are completely hopeless for that purpose, and
most of the systems I have used effectively ignore them.
I won't argue that. On the other hand, POSIX threads' capabilities in the
field of I/O-bound and real-time threads are also limited, and that's
where the "threads that are idle most of the time" idiom comes from, I
think. What I argue is that POSIX threads were no more "perpetrated" to
support I/O-bound or real-time apps than to support parallel
computation apps. Besides, the pthreads real-time extensions came later
than pthreads themselves.

What I do see, is that Microsoft designed their system so that it's
almost impossible to implement an interactive application without using
threads, and that fact leads to the current situation where threads are
considered to be beasts that sleep most of the time.
I could go on at great length, and the performance aspects are not even
the worst aspect of POSIX threads. The fact that there is no usable
memory model, and the synchronisation depends on C to handle the
low-level consistency, but there are no CONCEPTS in common between
POSIX and C's memory consistency 'specifications' is perhaps the worst.
I won't argue that either. However, I don't see how that makes POSIX
threads "perpetrated" to be idle most of the time.
That is why many POSIX threads programs work until the genuinely
shared memory accesses become frequent enough that you get some to the
same location in a single machine cycle.
Sorry, I don't understand. Are you saying that it's inherently
impossible to write an application that uses POSIX threads and that
doesn't have bugs accessing shared state? I thought that pthreads
mutexes guarantee sequential access to shared data. Or do you mean
something entirely different? Lock-free algorithms maybe?

-- Sergei.

Jan 11 '07 #19


In article <ma***************************************@python.org>,
Sergei Organov <os*@javad.com> writes:
|>
|OK, then I don't think the POSIX threads were "perpetrated" to be idle
|most of the time.

Perhaps I was being unclear. I should have added "In the case where
there are more threads per system than CPUs per system". The reasons
are extremely obscure and are to do with the scheduling, memory access
and communication.

I am in full agreement that the above effect was not INTENDED.

| That is why many POSIX threads programs work until the genuinely
| shared memory accesses become frequent enough that you get some to the
| same location in a single machine cycle.
|>
|Sorry, I don't understand. Are you saying that it's inherently
|impossible to write an application that uses POSIX threads and that
|doesn't have bugs accessing shared state? I thought that pthreads
|mutexes guarantee sequential access to shared data. Or do you mean
|something entirely different? Lock-free algorithms maybe?

I mean precisely the first.

The C99 standard uses a bizarre consistency model, which requires serial
execution, and its consistency is defined in terms of only volatile
objects and external I/O. Any form of memory access, signalling or
whatever is outside that, and is undefined behaviour.

POSIX uses a different but equally bizarre one, based on some function
calls being "thread-safe" and others forcing "consistency" (which is
not actually defined, and there are many possible, incompatible,
interpretations). It leaves all language aspects (including allowed
code movement) to C.

There are no concepts in common between C's and POSIX's consistency
specifications (even when they are precise enough to use), and so no
way of mapping the two standards together.
Regards,
Nick Maclaren.
Jan 11 '07 #20


In article <eo**********@news.albasani.net>,
robert <no*****@no-spam-no-spam.invalid> writes:
|>
|Most threads on this planet are not used for number crunching jobs,
|but for "organization of execution".

That is true, and it is effectively what POSIX and Microsoft threads
are suitable for. With reservations, even there.

|Things like MPI, IPC are just for the area of "small message, big job"
|- typically sci number crunching, where you collect the results "at
|the end of day". Its more a slow network technique.

That is completely false. Most dedicated HPC systems use MPI for high
levels of message passing over high-speed networks.

| They use it for the communication, but don't expose it to the
| programmer. It is therefore easy to put the processes on different
| CPUs, and get the memory consistency right.
|>
|Thus communicated data is "serialized" - not directly used as with
|threads or with custom shared memory techniques like POSH object
|sharing.

It is not used as directly with threads as you might think. Even
POSIX and Microsoft threads require synchronisation primitives, and
threading models like OpenMP and BSP have explicit control.

Also, MPI has asynchronous (non-blocking) communication.
Regards,
Nick Maclaren.
Jan 11 '07 #21


robert wrote:
Thus communicated data is "serialized" - not directly used as with threads or with custom shared memory techniques like POSH object sharing.
Correct, and that is precisely why MPI code is a lot easier to write
and debug than thread code. The OP used a similar technique in his
'parallel python' project.

This does not mean that MPI is inherently slower than threads, however,
as there is overhead associated with thread synchronization as well.
With 'shared memory' between threads, a lot more fine-grained
synchronization and scheduling is needed, which impairs performance and
often introduces obscure bugs.
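The fine-grained synchronization mentioned here is the usual lock-per-update pattern; a minimal sketch (an editor's illustration, not part of the original post):

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    """Increment the shared counter n times; every update takes the lock,
    because `counter += 1` is a read-modify-write on shared state."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # deterministic only because every update is locked
```

Every shared update pays for a lock acquisition, which is exactly the kind of cost a message-passing design amortizes into fewer, larger transfers.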

Jan 11 '07 #22

nm**@cus.cam.ac.uk (Nick Maclaren) writes:
[...]
I mean precisely the first.

The C99 standard uses a bizarre consistency model, which requires serial
execution, and its consistency is defined in terms of only volatile
objects and external I/O. Any form of memory access, signalling or
whatever is outside that, and is undefined behaviour.

POSIX uses a different but equally bizarre one, based on some function
calls being "thread-safe" and others forcing "consistency" (which is
not actually defined, and there are many possible, incompatible,
interpretations). It leaves all language aspects (including allowed
code movement) to C.

There are no concepts in common between C's and POSIX's consistency
specifications (even when they are precise enough to use), and so no
way of mapping the two standards together.
Ah, now I see what you mean. Even though I only partly agree with what
you've said above, I'll stop arguing as it gets too off-topic for this
group.

Thank you for explanations.

-- Sergei.

Jan 11 '07 #23

sturlamolden wrote:
robert wrote:
>Thus communicated data is "serialized" - not directly used as with threads or with custom shared memory techniques like POSH object sharing.

Correct, and that is precisely why MPI code is a lot easier to write
and debug than thread code. The OP used a similar technique in his
'parallel python' project.
Thus there are different levels of parallelization:

1 file/database based; multiple batch jobs
2 Message Passing, IPC, RPC, ...
3 Object Sharing
4 Sharing of global data space (Threads)
5 Local parallelism / Vector computing, MMX, 3DNow,...

There are good reasons for all of these levels.
Yet "parallel python" to me fakes to be on level 3 or 4 (or even 5 :-) ), while its just a level 2 system, where "passing", "remote", "inter-process" ... are the right vocables.

With all this fakes popping up - a GIL free CPython is a major feature request for Py3K - a name at least promising to run 3rd millenium CPU's ...

This does not mean that MPI is inherently slower than threads however,
as there is overhead associated with thread synchronization as well.
Level 2 communication is slower. Just for selected apps it won't matter a lot.
With 'shared memory' between threads, a lot more fine-grained
synchronization and scheduling is needed, which impairs performance and
often introduces obscure bugs.
It's a question of chances, costs, and the nature of the application.
Yet one can easily restrict inter-thread communication to be as simple and modular as IPC, or even simpler. Search e.g. "Python CallQueue" and "BackgroundCall" on Google.
Thread programming is less complicated than it seems. (Just Python's stdlib offers cumbersome 'non-functional' classes.)
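The "BackgroundCall" pattern mentioned above can be sketched with nothing but the stdlib; `background_call` here is a made-up stand-in for illustration, not the actual recipe:

```python
import queue
import threading

def background_call(func, *args):
    # Run func(*args) in a worker thread and return a handle;
    # calling the handle blocks until the result is available.
    box = queue.Queue(maxsize=1)  # one-slot mailbox for the result
    threading.Thread(target=lambda: box.put(func(*args))).start()
    return box.get

handle = background_call(sum, range(10))
result = handle()  # blocks until the worker is done
print(result)  # 45
```

The calling thread only ever touches the queue handle, so inter-thread communication stays as narrow and modular as an IPC channel would be.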
Robert
Jan 11 '07 #24


In article <eo**********@news.albasani.net>,
robert <no*****@no-spam-no-spam.invalid> writes:
|>
|Thus there are different levels of parallelization:
|>
|1 file/database based; multiple batch jobs
|2 Message Passing, IPC, RPC, ...
|3 Object Sharing
|4 Sharing of global data space (Threads)
|5 Local parallelism / Vector computing, MMX, 3DNow,...
|>
|There are good reasons for all of these levels.

Well, yes, but to call them "levels" is misleading, as they are closer
to communication methods of a comparable level.

| This does not mean that MPI is inherently slower than threads however,
| as there are overhead associated with thread synchronization as well.
|>
|level 2 communication is slower. Just for selected apps it won't matter a lot.

That is false. It used to be true, but that was a long time ago. The
reasons why what seems to be a more heavyweight mechanism (message
passing) can be faster than an apparently lightweight one (data sharing)
are both subtle and complicated.
Regards,
Nick Maclaren.
Jan 11 '07 #25

On Jan 8, 2007, at 11:33, Duncan Booth wrote:
The 'parallel python' site seems very sparse on the details of how it is
implemented, but it looks like all it is doing is spawning some subprocesses
and using some simple ipc to pass details of the calls and results. I can't
tell from reading it what it is supposed to add over any of the other
systems which do the same.

Combined with the closed source 'no redistribution' license I can't really
see anyone using it.
I'd also like to see more details - even though I'd probably never
use any Python module distributed in .pyc form only.

From the bit of information there is on the Web site, the
distribution strategy looks quite similar to my own master-slave
distribution model (based on Pyro) which is part of ScientificPython.
There is an example at

http://dirac.cnrs-orleans.fr/hg/ScientificPython/main/?f=08361040f00a;file=Examples/master_slave_demo.py

and the code itself can be consulted at

http://dirac.cnrs-orleans.fr/hg/ScientificPython/main/?f=bce321680116;file=Scientific/DistributedComputing/MasterSlave.py
The main difference seems to be that my implementation doesn't start
compute jobs itself; it leaves it to the user to start any number he
wants by any means that works for his setup, but it allows a lot of
flexibility. In particular, it can work with a variable number of
slave jobs and even handles disappearing slave jobs gracefully.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: hi****@cnrs-orleans.fr
---------------------------------------------------------------------
Jan 11 '07 #26

sturlamolden wrote:
parallelpyt...@gmail.com wrote:
That's right. ppsmp starts multiple interpreters in separate
processes and organizes communication between them through IPC.

Thus you are basically reinventing MPI.

http://mpi4py.scipy.org/
http://en.wikipedia.org/wiki/Message_Passing_Interface
Thanks for bringing that into consideration.

I am well aware of MPI and have written several programs in C/C++ and
Fortran which use it.
I would agree that MPI is the most common solution to run software on a
cluster (computers connected by network). Although there is another
parallelization approach: PVM (Parallel Virtual Machine)
http://www.csm.ornl.gov/pvm/pvm_home.html. I would say ppsmp is more
similar to the latter.

By the way there are links to different python parallelization
techniques (including MPI) from PP site:
http://www.parallelpython.com/compon...,14/Itemid,23/

The main difference between MPI python solutions and ppsmp is that with
MPI you have to organize both computations
{MPI_Comm_rank(MPI_COMM_WORLD, &id); if id==1 then ... else ....} and
data distribution (MPI_Send / MPI_Recv) by yourself, while with ppsmp
you just submit a function with arguments to the execution server and
retrieve the results later.
That makes the transition from serial python software to parallel much
simpler with ppsmp than with MPI.

To make this point clearer here is a short example:
--------------------serial code 2 lines------------------
for input in inputs:
    print "Sum of primes below", input, "is", sum_primes(input)
--------------------parallel code 3 lines----------------
jobs = [(input, job_server.submit(sum_primes, (input,), (isprime,),
    ("math",))) for input in inputs]
for input, job in jobs:
    print "Sum of primes below", input, "is", job()
---------------------------------------------------------------
In this example parallel execution was added at the cost of 1 line of
code!
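For readers without pp installed, the same submit-then-collect shape can be sketched with the stdlib alone; `sum_primes` and `isprime` below are toy reimplementations for the demo, not pp's versions:

```python
from concurrent.futures import ProcessPoolExecutor

def isprime(k):
    # Plain trial division, enough for a demo.
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))

def sum_primes(n):
    # Sum of all primes below n.
    return sum(k for k in range(2, n) if isprime(k))

if __name__ == "__main__":
    inputs = [100, 1000]
    with ProcessPoolExecutor() as ex:
        # Submit everything first, then collect: each job runs in
        # a separate process, sidestepping the GIL.
        jobs = [(n, ex.submit(sum_primes, n)) for n in inputs]
        for n, job in jobs:
            print("Sum of primes below", n, "is", job.result())
```

As with pp, the serial loop becomes a submit loop plus a collect loop, and no explicit rank/send/recv bookkeeping is needed.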

The other difference from MPI is that ppsmp dynamically decides where
to run each given job. For example, if there are other active processes
running in the system, ppsmp will make greater use of the processors
which are free. Since with MPI the whole task is usually divided
equally between processors at the beginning, the overall runtime will
be determined by the slowest-running process (the one which shares a
processor with another running program). In this particular case ppsmp
will outperform MPI.

The third, probably less important, difference is that with MPI based
parallel python code you must have MPI installed in the system.

Overall, ppsmp is still a work in progress and there are other interesting
features which I would like to implement. This is the main reason why I
do not open the source of ppsmp - to have better control of its future
development, as advised here: http://en.wikipedia.org/wiki/Freeware :-)

Best regards,
Vitalii

Jan 11 '07 #27

>
Thus there are different levels of parallelization:

1 file/database based; multiple batch jobs
2 Message Passing, IPC, RPC, ...
3 Object Sharing
4 Sharing of global data space (Threads)
5 Local parallelism / Vector computing, MMX, 3DNow,...

There are good reasons for all of these levels.
Yet "parallel python" to me pretends to be on level 3 or 4 (or even 5 :-) ), while it's just a level 2
system, where "passing", "remote", "inter-process" ... are the right vocabulary.
In one of the previous posts I've mentioned that ppsmp is based on
processes + IPC, which makes it a system with level 2 parallelization,
the same level as MPI.
Also it's obvious from the fact that it's written completely in python,
as python objects cannot be shared due to the GIL (POSH can do sharing
because it's an extension written in C).

Jan 12 '07 #28

Hi,

You guys forgot pyMPI, http://pympi.sourceforge.net/ - it works fine!
The installation and configuration are a little hard, but finally it works!

Cordially,

Jairo Serrano
Bucaramanga, Colombia

parallelpyt...@gmail.com wrote:

Thus there are different levels of parallelization:

1 file/database based; multiple batch jobs
2 Message Passing, IPC, RPC, ...
3 Object Sharing
4 Sharing of global data space (Threads)
5 Local parallelism / Vector computing, MMX, 3DNow,...

There are good reasons for all of these levels.
Yet "parallel python" to me pretends to be on level 3 or 4 (or even 5 :-) ), while it's just a level 2
system, where "passing", "remote", "inter-process" ... are the right vocabulary.
In one of the previous posts I've mentioned that ppsmp is based on
processes + IPC, which makes it a system with level 2 parallelization,
the same level as MPI.
Also it's obvious from the fact that it's written completely in python,
as python objects cannot be shared due to the GIL (POSH can do sharing
because it's an extension written in C).
Jan 12 '07 #29

parallelpyt...@gmail.com wrote:
>
The main difference between MPI python solutions and ppsmp is that with
MPI you have to organize both computations
{MPI_Comm_rank(MPI_COMM_WORLD, &id); if id==1 then ... else ....} and
data distribution (MPI_Send / MPI_Recv) by yourself. While with ppsmp
you just submit a function with arguments to the execution server and
retrieve the results later.
Couldn't you just provide similar conveniences on top of MPI? Searching
for "Python MPI" yields a lot of existing work (as does "Python PVM"),
so perhaps someone has already done so. Also, what about various grid
toolkits?

[...]
Overall ppsmp is still work in progress and there are other interesting
features which I would like to implement. This is the main reason why I
do not open the source of ppsmp - to have better control of its future
development, as advised here: http://en.wikipedia.org/wiki/Freeware :-)
Despite various probable reactions from people who will claim that
they're comfortable with binary-only products from a single vendor, I
think more people would be inclined to look at your software if you did
distribute the source code, even if they then disregarded what you've
done. My own experience with regard to releasing software is that even
with an open source licence, most people are more likely to ignore your
projects than to suddenly jump on board and take control, and even if
your project somehow struck a chord and attracted a lot of interested
developers, would it really be such a bad thing? Many developers have
different experiences and insights which can only make your project
better, anyway.

Related to your work, I've released a parallel execution solution
called parallel/pprocess [1] under the LGPL and haven't really heard
of anyone doing anything with it, let alone forking it and
showing my original efforts in a bad light. Perhaps most of the
downloaders believe me to be barking up the wrong tree (or just
barking) with the approach I've taken, but I think the best thing is to
abandon any fears of not doing things the best possible way and just be
open to improvements and suggestions.

Paul

[1] http://www.python.org/pypi/parallel

Jan 12 '07 #30


In article <11**********************@s34g2000cwa.googlegroups .com>,
"Paul Boddie" <pa**@boddie.org.uk> writes:
|parallelpyt...@gmail.com wrote:
|
| The main difference between MPI python solutions and ppsmp is that with
| MPI you have to organize both computations
| {MPI_Comm_rank(MPI_COMM_WORLD, &id); if id==1 then ... else ....} and
| data distribution (MPI_Send / MPI_Recv) by yourself. While with ppsmp
| you just submit a function with arguments to the execution server and
| retrieve the results later.
|>
|Couldn't you just provide similar conveniences on top of MPI? Searching
|for "Python MPI" yields a lot of existing work (as does "Python PVM"),
|so perhaps someone has already done so.

Yes. No problem.

|Also, what about various grid toolkits?

If you can find one that is robust enough for real work by someone who
is not deeply into developing Grid software, I will be amazed.
Regards,
Nick Maclaren.
Jan 12 '07 #31

Paul Boddie wrote:
parallelpyt...@gmail.com wrote:
>The main difference between MPI python solutions and ppsmp is that with
MPI you have to organize both computations
{MPI_Comm_rank(MPI_COMM_WORLD, &id); if id==1 then ... else ....} and
data distribution (MPI_Send / MPI_Recv) by yourself. While with ppsmp
you just submit a function with arguments to the execution server and
retrieve the results later.

Couldn't you just provide similar conveniences on top of MPI? Searching
for "Python MPI" yields a lot of existing work (as does "Python PVM"),
so perhaps someone has already done so. Also, what about various grid
toolkits?

[...]
>Overall ppsmp is still work in progress and there are other interesting
features which I would like to implement. This is the main reason why I
do not open the source of ppsmp - to have better control of its future
development, as advised here: http://en.wikipedia.org/wiki/Freeware :-)

Despite various probable reactions from people who will claim that
they're comfortable with binary-only products from a single vendor, I
think more people would be inclined to look at your software if you did
distribute the source code, even if they then disregarded what you've
done. My own experience with regard to releasing software is that even
with an open source licence, most people are more likely to ignore your
projects than to suddenly jump on board and take control, and even if
your project somehow struck a chord and attracted a lot of interested
developers, would it really be such a bad thing? Many developers have
different experiences and insights which can only make your project
better, anyway.

Related to your work, I've released a parallel execution solution
called parallel/pprocess [1] under the LGPL and haven't really heard
of anyone doing anything with it, let alone forking it and
showing my original efforts in a bad light. Perhaps most of the
downloaders believe me to be barking up the wrong tree (or just
barking) with the approach I've taken, but I think the best thing is to
abandon any fears of not doing things the best possible way and just be
open to improvements and suggestions.

Paul

[1] http://www.python.org/pypi/parallel
I'd be interested in an overview.
For ease of use, a major criterion for me would be a pure-python
solution which also does the job of starting and controlling the
other process(es) automatically and correctly (by default) on common
platforms.
Which of the existing (RPC) solutions are that nice?
Robert
Jan 12 '07 #32

pa************@gmail.com wrote:
Has anybody tried to run parallel python applications?
It appears that if your application is computation-bound using 'thread'
or 'threading' modules will not get you any speedup. That is because
python interpreter uses GIL(Global Interpreter Lock) for internal
bookkeeping. The latter allows only one python byte-code instruction to
be executed at a time even if you have a multiprocessor computer.
To overcome this limitation, I've created ppsmp module:
http://www.parallelpython.com
It provides an easy way to run parallel python applications on smp
computers.
I would appreciate any comments/suggestions regarding it.
Thank you!
Looks interesting, but is there any way to use this for a cluster of
machines over a network (not smp)?

Jan 12 '07 #33

robert wrote:
Paul Boddie wrote:

[1] http://www.python.org/pypi/parallel

I'd be interested in an overview.
I think we've briefly discussed the above solution before, and I don't
think you're too enthusiastic about anything using interprocess
communication, which is what the above solution uses. Moreover, it's
intended as a threading replacement for SMP/multicore architectures
where one actually gets parallel execution (since it uses processes).
For ease of use a major criterion for me would be a pure python
solution, which also does the job of starting and controlling the
other process(es) automatically right (by default) on common
platforms.
Which of the existing (RPC) solutions are that nice?
Many people have nice things to say about Pyro, and there seem to be
various modules attempting parallel processing, or at least some kind
of job control, using that technology. See Konrad Hinsen's
ScientificPython solution for an example of this - I'm sure I've seen
others, too.

Paul

Jan 12 '07 #34

On Jan 12, 2007, at 11:21, Paul Boddie wrote:
done. My own experience with regard to releasing software is that even
with an open source licence, most people are more likely to ignore your
projects than to suddenly jump on board and take control, and even if
My experience is exactly the same. And looking into the big world of
Open Source programs, the only case I ever heard of in which a
project was forked by someone else is the Emacs/XEmacs split. I'd be
happy if any of my projects ever reached that level of interest.
Related to your work, I've released a parallel execution solution
called parallel/pprocess [1] under the LGPL and haven't really heard
of anyone doing anything with it, let alone forking it and
That's one more project... It seems that there is significant
interest in parallel computing in Python. Perhaps we should start a
special interest group? Not so much in order to work on a single
project; I believe that at the current state of parallel computing we
still need many different approaches to be tried. But an exchange of
experience could well be useful for all of us.

Konrad.
Jan 12 '07 #35

Konrad Hinsen wrote:
>
That's one more project... It seems that there is significant
interest in parallel computing in Python. Perhaps we should start a
special interest group? Not so much in order to work on a single
project; I believe that at the current state of parallel computing we
still need many different approaches to be tried. But an exchange of
experience could well be useful for all of us.
I think a special interest group might be productive, but I've seen
varying levels of special interest in the different mailing lists
associated with such groups: the Web-SIG list started with enthusiasm,
produced a cascade of messages around WSGI, then dried up; the XML-SIG
list seems to be a sorry indication of how Python's XML scene has
drifted onto other matters; other such groups have also lost their
momentum.

It seems to me that a more useful first step would be to create an
overview of the different modules and put it on the python.org Wiki:

http://wiki.python.org/moin/FrontPage
http://wiki.python.org/moin/UsefulModules (a reasonable entry point)

If no-one beats me to it, I may write something up over the weekend.

Paul

Jan 12 '07 #36

On Jan 12, 2007, at 15:08, Paul Boddie wrote:
It seems to me that a more useful first step would be to create an
overview of the different modules and put it on the python.org Wiki:

http://wiki.python.org/moin/FrontPage
http://wiki.python.org/moin/UsefulModules (a reasonable entry point)

If no-one beats me to it, I may write something up over the weekend.
That sounds like a good idea. I won't beat you to it, but I'll have a
look next week and perhaps add information that I have.

Konrad.
Jan 12 '07 #37

P: n/a

Konrad Hinsen wrote:
..... Perhaps we should start a
special interest group? Not so much in order to work on a single
project; I believe that at the current state of parallel computing we
still need many different approaches to be tried. But an exchange of
experience could well be useful for all of us.
+ 1

-Mark

Jan 12 '07 #38

Looks interesting, but is there any way to use this for a cluster of
machines over a network (not smp)?
Networking capabilities will be included in the next release of
Parallel Python software (http://www.parallelpython.com), which is
coming soon.

Couldn't you just provide similar conveniences on top of MPI? Searching
for "Python MPI" yields a lot of existing work (as does "Python PVM"),
so perhaps someone has already done so.
Yes, it's possible to do it on top of any environment which
supports IPC.
That's one more project... It seems that there is significant
interest in parallel computing in Python. Perhaps we should start a
special interest group? Not so much in order to work on a single
project; I believe that at the current state of parallel computing we
still need many different approaches to be tried. But an exchange of
experience could well be useful for all of us.
Well, I may just add that everybody is welcome to start discussion
regarding any parallel python project or idea in this forum:
http://www.parallelpython.com/compon...d,29/board,2.0

Jan 13 '07 #39

On 2007-01-12, robert <no*****@no-spam-no-spam.invalid> wrote:
>>
[1] http://www.python.org/pypi/parallel

I'd be interested in an overview.
For ease of use a major criterion for me would be a pure python
solution, which also does the job of starting and controlling the
other process(es) automatically right (by default) on common
platforms.
Let me add a few cents to the discussion with this announcement:

About three years ago, I wrote two Python modules, one called 'exec_proxy',
which uses ssh to run another exec_proxy instance at a remote machine, thus
providing light-weight transparent access to a machine across a network.

The idea behind this module was/is that by just using ssh you have network
transparency, much more light-weight than most other distributed modules where
you have to start daemons on all the machines.
Recently, the 'rthread' module was announced, which takes the same approach (it
seems, from the announcement). I have not compared the two modules with each other.
The more interesting Python module called 'batchlib' lies on top of the former
(or any other module that provides transparency across the network). It
handles distribution of computation jobs in the form of a 'start-computation'
and 'get-results' pair of functions.

That is, you give it a set of machines it may use, you say to the entry-point,
compute for me this-and-this function with this-and-this parameters, and
batchlib does the rest.
(That is, it finds a free machine, copies the parameters over the network, runs
the job, the result is transported back, and you can get the result of a
computation by using the same (unique) identification you gave when the job
was handed to batchlib.)

We used it as computation backend for optimization problems, but since
'computation job' may mean anything, the module should be very generically
applicable.
Compared to most other parallel/distributed modules, I think that the other
modules more-or-less compare with exec_proxy (that is, they stop at
transparent network access), whereas exec_proxy was designed to have minimal
impact on the required infrastructure (i.e. just ssh or rsh, which is generally
already available) and thus lacks many of the features available from the
other modules.

Batchlib starts where exec_proxy ends, namely lifting network primitives to the
level of providing a simple way of doing distributed computations (in the case
of exec_proxy, without adding network infrastructure such as daemons).


Until now, both modules were used in-house, and it was not clear what we wanted
to do further with the software. Recently, we decided that we have no
further use for this software (we think we want to move in a different
direction), clearing the way to release it to the community.

You can get the software from my home page http://seweb.se.wtb.tue.nl/~hat
Both packages can be downloaded, and include documentation and an example.
The bad news is that I will not be able to do further development of these
modules. The code is 'end-of-life' for us.
Maybe you find the software useful,
Albert
Jan 17 '07 #40

A.T.Hofkamp wrote:
>
Let me add a few cents to the discussion with this announcement:
[Notes about exec_proxy, batchlib and rthread]

I've added entries for these modules, along with py.execnet, to the
parallel processing solutions page on the python.org Wiki:

http://wiki.python.org/moin/ParallelProcessing

Thanks for describing your work to us!

Paul

Jan 17 '07 #41

Paul Boddie wrote:
A.T.Hofkamp wrote:
>Let me add a few cents to the discussion with this announcement:

[Notes about exec_proxy, batchlib and rthread]

I've added entries for these modules, along with py.execnet, to the
parallel processing solutions page on the python.org Wiki:

http://wiki.python.org/moin/ParallelProcessing

Thanks for describing your work to us!

Paul
As many libs are restricted to certain OSes and/or need/rely on
extension modules, a few tags would probably improve it: OSes,
pure-python, dependencies.
Robert
Jan 17 '07 #42

robert wrote:
>
As many libs are restricted to certain OSes and/or need/rely on
extension modules, a few tags would probably improve it: OSes,
pure-python, dependencies.
I've added some platform notes, although the library dependencies of
various MPI and PVM solutions are sort of obvious, but I'll get round
to adding those at some point unless you beat me to it. ;-)

Paul

Jan 18 '07 #43

On Jan 12, 11:52 am, Neal Becker <ndbeck...@gmail.com> wrote:
parallelpyt...@gmail.com wrote:
Has anybody tried to run parallel python applications?
It appears that if your application is computation-bound, using the 'thread'
or 'threading' modules will not get you any speedup. That is because the
python interpreter uses the GIL (Global Interpreter Lock) for internal
bookkeeping. The latter allows only one python byte-code instruction to
be executed at a time even if you have a multiprocessor computer.
To overcome this limitation, I've created the ppsmp module:
http://www.parallelpython.com
It provides an easy way to run parallel python applications on smp
computers.
I would appreciate any comments/suggestions regarding it.
Thank you!

Looks interesting, but is there any way to use this for a cluster of
machines over a network (not smp)?
There are 2 major updates regarding Parallel Python:
http://www.parallelpython.com

1) Now (since version 1.2) the parallel python software can be used for
cluster-wide parallelization (or even Internet-wide). It's also
renamed accordingly: pp (the module is backward compatible with ppsmp)

2) Parallel Python became open source (under the BSD license):
http://www.parallelpython.com/content/view/18/32/

Feb 5 '07 #44

This discussion thread is closed

Replies have been disabled for this discussion.