Parallel Python

Has anybody tried to run parallel Python applications?
It appears that if your application is computation-bound, using the 'thread'
or 'threading' modules will not get you any speedup. That is because the
Python interpreter uses the GIL (Global Interpreter Lock) for internal
bookkeeping. The latter allows only one Python byte-code instruction to
be executed at a time, even if you have a multiprocessor computer.
To overcome this limitation, I've created the ppsmp module:
http://www.parallelpython.com
It provides an easy way to run parallel Python applications on SMP
computers.
I would appreciate any comments/suggestions regarding it.
Thank you!

Jan 6 '07 #1
pa************@gmail.com wrote:
> Has anybody tried to run parallel Python applications?
> It appears that if your application is computation-bound, using the 'thread'
> or 'threading' modules will not get you any speedup. That is because the
> Python interpreter uses the GIL (Global Interpreter Lock) for internal
> bookkeeping. The latter allows only one Python byte-code instruction to
> be executed at a time, even if you have a multiprocessor computer.
> To overcome this limitation, I've created the ppsmp module:
> http://www.parallelpython.com
> It provides an easy way to run parallel Python applications on SMP
> computers.
> I would appreciate any comments/suggestions regarding it.
I always thought that if you use multiple processes (e.g. os.fork) then
Python can take advantage of multiple processors. I think the GIL locks
one processor only. The problem is that one interpreter can run on
one processor only. Am I not right? Does your ppsmp module run the same
interpreter on multiple processors? That would be very interesting, and
something new.
Or does it start multiple interpreters? Another way to do this is to
start multiple processes and let them communicate through IPC or a local
network.
Laszlo

Jan 8 '07 #2
Laszlo Nagy <ga*****@designaproduct.biz> wrote:
> pa************@gmail.com wrote:
>> Has anybody tried to run parallel Python applications?
>> It appears that if your application is computation-bound, using the 'thread'
>> or 'threading' modules will not get you any speedup. That is because the
>> Python interpreter uses the GIL (Global Interpreter Lock) for internal
>> bookkeeping. The latter allows only one Python byte-code instruction to
>> be executed at a time, even if you have a multiprocessor computer.
>> To overcome this limitation, I've created the ppsmp module:
>> http://www.parallelpython.com
>> It provides an easy way to run parallel Python applications on SMP
>> computers.
>> I would appreciate any comments/suggestions regarding it.
> I always thought that if you use multiple processes (e.g. os.fork) then
> Python can take advantage of multiple processors. I think the GIL locks
> one processor only. The problem is that one interpreter can run on
> one processor only. Am I not right? Does your ppsmp module run the same
> interpreter on multiple processors? That would be very interesting, and
> something new.
The GIL locks all processors, but just for one process. So, yes, if you
spawn off multiple processes then Python will take advantage of this. For
example we run Zope on a couple of dual processor dual core systems, so we
use squid and pound to ensure that the requests are spread across 4
instances of Zope on each machine. That way we do get a fairly even cpu
usage.
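
A minimal sketch of that point, not taken from the thread: it uses the standard-library multiprocessing module (which was added to Python after this discussion took place) to spread a CPU-bound function over worker processes, each with its own interpreter and its own GIL. The function name count_primes and the chunk boundaries are made up for illustration.

```python
import multiprocessing


def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately CPU-bound)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical workload: four independent ranges.
    chunks = [(0, 25000), (25000, 50000), (50000, 75000), (75000, 100000)]
    # Each worker is a separate process, so one worker holding its GIL
    # does not stop the others from running on other processors.
    with multiprocessing.Pool(processes=4) as pool:
        partial_counts = pool.map(count_primes, chunks)
    print(sum(partial_counts))
```

Running the same function from four threads in one process would give the same answer, but the threads would take turns holding a single GIL.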

For some applications it is much harder to split the tasks across separate
processes rather than just separate threads, but there is a benefit once
you've done it since you can then distribute the processing across cpus on
separate machines.

The 'parallel python' site seems very sparse on the details of how it is
implemented but it looks like all it is doing is spawning some subprocesses
and using some simple ipc to pass details of the calls and results. I can't
tell from reading it what it is supposed to add over any of the other
systems which do the same.

Combined with the closed source 'no redistribution' license I can't really
see anyone using it.
Jan 8 '07 #3
Duncan Booth wrote:
> Laszlo Nagy <ga*****@designaproduct.biz> wrote:
> The 'parallel python' site seems very sparse on the details of how it is
> implemented but it looks like all it is doing is spawning some subprocesses
> and using some simple ipc to pass details of the calls and results. I can't
> tell from reading it what it is supposed to add over any of the other
> systems which do the same.
>
> Combined with the closed source 'no redistribution' license I can't really
> see anyone using it.

That's true. IPC through sockets or (somewhat faster) shared memory - with cPickle at least - is usually the maximum of such approaches.
See http://groups.google.de/group/comp.l...22ec289f30b26a

For tasks really requiring threading one can consider IronPython.
The most advanced technique I've seen for CPython is posh: http://poshmodule.sourceforge.net/

I'd say Py3K should just do the locking job for dicts / collections, obmalloc and refcount (or drop the refcount mechanism) and do the other minor things in order to enable free threading. Or at least enable careful sharing of Py-Objects between multiple separated interpreter instances of one process.
.NET and Java have shown that the speed costs for this technique are not so extreme. I guess less than 10%.
And Python is a VHLL with less focus on speed anyway.
Also see discussions in http://groups.google.de/group/comp.l...22ec289f30b26a .
Robert
Jan 8 '07 #4
> I always thought that if you use multiple processes (e.g. os.fork) then
> Python can take advantage of multiple processors. I think the GIL locks
> one processor only. The problem is that one interpreter can run on
> one processor only. Am I not right? Does your ppsmp module run the same
> interpreter on multiple processors? That would be very interesting, and
> something new.
> Or does it start multiple interpreters? Another way to do this is to
> start multiple processes and let them communicate through IPC or a local
> network.
That's right. ppsmp starts multiple interpreters in separate
processes and organizes communication between them through IPC.
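
The general pattern (separate interpreters exchanging pickled requests and results over pipes) can be sketched as below. This is only an illustration of the idea using the standard library, not ppsmp's actual implementation, which is not shown in this thread; the names WORKER_SOURCE, start_call, collect and sum_squares are hypothetical.

```python
import pickle
import subprocess
import sys

# Code run by each child interpreter: read a pickled (function name, args)
# request from stdin, compute, and write the pickled result to stdout.
WORKER_SOURCE = r"""
import pickle, sys
name, args = pickle.load(sys.stdin.buffer)
functions = {"sum_squares": lambda n: sum(i * i for i in range(n))}
pickle.dump(functions[name](*args), sys.stdout.buffer)
"""


def start_call(name, *args):
    """Start a fresh interpreter process and send it one request."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    proc.stdin.write(pickle.dumps((name, args)))
    proc.stdin.close()  # flushes the request and signals end of input
    return proc


def collect(proc):
    """Block until the child has written its pickled result, then return it."""
    result = pickle.load(proc.stdout)
    proc.stdout.close()
    proc.wait()
    return result


if __name__ == "__main__":
    # Both children run at the same time, each in its own interpreter.
    procs = [start_call("sum_squares", n) for n in (1000, 2000)]
    print([collect(p) for p in procs])
```

Starting an interpreter per call is wasteful; a real system would presumably keep a pool of long-lived workers and send them many requests.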

Originally ppsmp was designed to speed up an existing application
which is written in pure Python but is quite computationally expensive
(the other ways to optimize it were used too). It was also required
that the application would run out of the box on the most standard Linux
distributions (they all contain CPython).

Jan 10 '07 #5

robert wrote:
> That's true. IPC through sockets or (somewhat faster) shared memory - with cPickle at least - is usually the maximum of such approaches.
> See http://groups.google.de/group/comp.l...22ec289f30b26a
>
> For tasks really requiring threading one can consider IronPython.
> The most advanced technique I've seen for CPython is posh: http://poshmodule.sourceforge.net/

In SciPy there is an MPI-binding project, mpi4py.

MPI is becoming the de facto standard for high-performance parallel
computing, both on shared memory systems (SMPs) and clusters. Spawning
threads or processes is not a recommended way to do numerical parallel
computing. Threading makes programming certain tasks more convenient
(particularly GUI and I/O, for which the GIL does not matter anyway),
but is not a good paradigm for dividing CPU-bound computations between
multiple processors. MPI is a high-level API based on a concept of
"message passing", which allows the programmer to focus on solving the
problem instead of on irrelevant distractions such as thread management
and synchronization.

Although MPI has standard APIs for C and Fortran, it may be used with
any programming language. For Python, an additional advantage of using
MPI is that the GIL has no practical consequence for performance. The
GIL can lock a process but not prevent MPI from using multiple
processors as MPI is always using multiple processes. For IPC, MPI will
e.g. use shared-memory segments on SMPs and tcp/ip on clusters, but all
these details are hidden.
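
To give a feel for the message-passing style without requiring an MPI installation, here is a rough sketch that uses only the standard-library multiprocessing module (which postdates this thread). It is not MPI and not mpi4py code; the terms "rank" and "size" are borrowed from MPI vocabulary, and the functions partial_sum and worker are made up for illustration.

```python
import multiprocessing


def partial_sum(rank, size, n):
    """Sum the integers below n assigned to this rank (round-robin split)."""
    return sum(range(rank, n, size))


def worker(rank, size, n, conn):
    # Each worker computes its share and sends a single message to the root.
    conn.send((rank, partial_sum(rank, size, n)))
    conn.close()


if __name__ == "__main__":
    size, n = 4, 1000
    # One one-way pipe per worker: (receiving end, sending end).
    pipes = [multiprocessing.Pipe(duplex=False) for _ in range(size)]
    procs = [
        multiprocessing.Process(target=worker, args=(rank, size, n, send_end))
        for rank, (_, send_end) in enumerate(pipes)
    ]
    for p in procs:
        p.start()
    # The root receives one message per rank and adds them up, much like
    # a gather followed by a sum.
    total = sum(recv_end.recv()[1] for recv_end, _ in pipes)
    for p in procs:
        p.join()
    print(total == sum(range(n)))
```

The workers share no data; everything they know arrives as arguments and everything they produce leaves as a message, which is the property that lets the same program structure move from one SMP machine to a cluster.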

It seems like 'ppsmp' of parallelpython.com is just a reinvention of a
small portion of MPI.
http://mpi4py.scipy.org/
http://en.wikipedia.org/wiki/Message_Passing_Interface

Jan 10 '07 #6

parallelpyt...@gmail.com wrote:
> That's right. ppsmp starts multiple interpreters in separate
> processes and organizes communication between them through IPC.
Thus you are basically reinventing MPI.
http://mpi4py.scipy.org/
http://en.wikipedia.org/wiki/Message_Passing_Interface

Jan 10 '07 #7

In article <11**********************@p59g2000hsd.googlegroups.com>,
"sturlamolden" <st**********@yahoo.no> writes:
|>
|MPI is becoming the de facto standard for high-performance parallel
|computing, both on shared memory systems (SMPs) and clusters.

It has been for some time, and is still gaining ground.

|Spawning
|threads or processes is not a recommended way to do numerical parallel
|computing.

Er, MPI works by getting SOMETHING to spawn processes, which then
communicate with each other.

|Threading makes programming certain tasks more convenient
|(particularly GUI and I/O, for which the GIL does not matter anyway),
|but is not a good paradigm for dividing CPU-bound computations between
|multiple processors. MPI is a high-level API based on a concept of
|"message passing", which allows the programmer to focus on solving the
|problem instead of on irrelevant distractions such as thread management
|and synchronization.

Grrk. That's not quite it.

The problem is that the current threading models (POSIX threads and
Microsoft's equivalent) were intended for running large numbers of
semi-independent, mostly idle, threads: Web servers and similar.
Everything about them, including their design (such as it is), their
interfaces and their implementations, is unsuitable for parallel HPC
applications. One can argue whether that is insoluble, but let's not,
at least not here.

Now, Unix and Microsoft processes are little better but, because they
are more separate (and, especially, because they don't share memory)
are MUCH easier to run effectively on shared memory multi-CPU systems.
You still have to play administrator tricks, but they aren't as foul
as the ones that you have to play for threaded programs. Yes, I know
that it is a bit Irish for the best way to use a shared memory system
to be to not share memory, but that's how it is.
Regards,
Nick Maclaren.
Jan 10 '07 #8

Nick Maclaren wrote:
> as the ones that you have to play for threaded programs. Yes, I know
> that it is a bit Irish for the best way to use a shared memory system
> to be to not share memory, but that's how it is.
Thank you for clearing that up.

In any case, this means that Python can happily keep its GIL, as the
CPU bound 'HPC' tasks for which the GIL does matter should be done
using multiple processes (not threads) anyway. That leaves threads as a
tool for programming certain i/o tasks and maintaining 'responsive'
user interfaces, for which the GIL incidentally does not matter.
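
A small sketch of why the GIL does not get in the way of I/O-style work: a blocked thread releases the GIL while it waits, so several waits overlap. Here time.sleep stands in for a blocking I/O call; the names fake_io and run_concurrently are invented for this example.

```python
import threading
import time


def fake_io(delay, results, index):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    results[index] = delay


def run_concurrently(delays):
    """Run one thread per delay; return the results and the elapsed time."""
    results = [None] * len(delays)
    threads = [
        threading.Thread(target=fake_io, args=(d, results, i))
        for i, d in enumerate(delays)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_concurrently([0.2, 0.2, 0.2, 0.2])
    # Elapsed time should be close to one delay, not the sum of all four.
    print(results, round(elapsed, 2))
```

Replacing the sleep with a pure-Python computation would remove the overlap, since only one thread at a time can execute byte-code.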

I wonder if too much emphasis is put on thread programming these days.
Threads may be nice for programming web servers and the like, but not
for numerical computing. Reading books about thread programming, one
can easily get the impression that it is 'the' way to parallelize
numerical tasks on computers with multiple CPUs (or multiple CPU
cores). But if threads are inherently designed and implemented to stay
idle most of the time, that is obviously not the case.

I like MPI. Although it is a huge API with lots of esoteric functions,
I only need to know a handful to cover my needs. Not to mention the
fact that I can use MPI with Fortran, which is frowned upon by computer
scientists but loved by scientists and engineers specialized in any
other field.

Jan 10 '07 #9
nm**@cus.cam.ac.uk (Nick Maclaren) writes:
> Yes, I know that it is a bit Irish for the best way to use a shared
> memory system to be to not share memory, but that's how it is.
But I thought serious MPI implementations use shared memory if they
can. That's the beauty of it, you can run your application on SMP
processors getting the benefit of shared memory, or split it across
multiple machines using Ethernet or InfiniBand or whatever, without
having to change the app code.
Jan 10 '07 #10
