Bytes IT Community

threading and multicores, pros and cons

This is a recurrent problem I encounter when I try to sell Python solutions to
my customers. I'm aware that this problem is sometimes overlooked, but such is
the law of the market.

I've heard a bunch of arguments defending Python's choice of a GIL, but I'm
not quite sure of their technical background, nor of what is really important
and what is not. These discussions often end in a prudent "Python has made a
choice among others"... which is not really convincing.

If some guru has a good recipe, or wants to summarize the main points, it
would be really appreciated.

regards,

--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
Mobile: +33 632 77 00 21
Feb 14 '07 #1
24 Replies


Maric Michaud <ma***@aristote.info> writes:
If some guru has a good recipe, or wants to summarize the main points, it
would be really appreciated.
Basically Python applications are usually not too CPU-intensive; there
are some ways you can get parallelism with reasonable extra effort;
and for most of Python's history, multi-CPU systems have been rather
exotic so the GIL didn't create too big a problem. Right now it is
starting to become more of a problem than before, but it's not yet
intolerable. Obviously something will have to be done about it in the
long run, maybe with PyPy.
Feb 14 '07 #2

On Wednesday 14 February 2007 at 05:49, Paul Rubin wrote:
Basically Python applications are usually not too CPU-intensive; there
are some ways you can get parallelism with reasonable extra effort;
Basically, while not CPU intensive, an application server needs to get the
benefit of all the hardware's resources.
When a customer comes in with his beautiful new dual-core server and gets a
basic Plone install up and running, he will immediately compare it to J2EE and
wonder why he should pay a consultant to make it work properly.
At that point, it's not easy to explain to him that Python is not flawed
compared to Java, and that he will not regret his choice in the future.
First impressions may be decisive.

The historical explanation is likely to be ineffective here, I'm afraid. What
about the argument that multithreading is not so good for parallelism anyway?
Is it strong enough?

Feb 14 '07 #3

Maric Michaud <ma***@aristote.info> writes:
On Wednesday 14 February 2007 at 05:49, Paul Rubin wrote:
Basically Python applications are usually not too CPU-intensive; there
are some ways you can get parallelism with reasonable extra effort;
Basically, while not CPU intensive, an application server needs to get
the benefit of all the hardware's resources.
But this is impossible--if the application is not CPU intensive, by
definition it leaves a lot of the available CPU cycles unused.
When a customer comes in with his beautiful new dual-core server and
gets a basic Plone install up and running, he will immediately
compare it to J2EE and wonder why he should pay a consultant to make
it work properly. At that point, it's not easy to explain to him that
Python is not flawed compared to Java, and that he will not regret
his choice in the future. First impressions may be decisive.
That is true, parallelism is an area where Java is ahead of us.
The historical explanation is likely to be ineffective here, I'm
afraid. What about the argument that multithreading is not so good
for parallelism anyway? Is it strong enough?
It's not much good for parallelism in the typical application that
spends most of its time blocked waiting for I/O. That is many
applications. It might even be most applications. But there are
still such things as CPU-intensive applications which can benefit from
parallelism, and Python has a weak spot there.
Feb 14 '07 #4

On Feb 13, 9:07 pm, Maric Michaud <m...@aristote.info> wrote:
I've heard a bunch of arguments defending Python's choice of a GIL, but I'm
not quite sure of their technical background, nor of what is really important
and what is not. These discussions often end in a prudent "Python has made a
choice among others"... which is not really convincing.
Well, INAG (I'm not a Guru), but we recently had training from a Guru.
When we brought up this question, his response was fairly simple.
Paraphrased for inaccuracy:

"Some time back, a group did remove the GIL from the Python core and
implemented locks on the core code to make it thread-safe. The problem
was that while it worked, the necessary locks made single-threaded
code take significantly longer to execute."

He then proceeded to show us how to achieve the same effect
(multithreading python for use on multi-core computers) using popen2
and stdio pipes.
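popen2 is long gone from current Python, but the same pipes-between-interpreters idea can be sketched with the subprocess module. The worker count and the toy CPU-bound job below are made up for illustration; each child interpreter has its own GIL, which is the whole point:

```python
import subprocess
import sys

# A CPU-bound job handed to a separate interpreter process; each child
# has its own GIL, so the work can run genuinely in parallel on multicore.
JOB = "print(sum(i * i for i in range(100000)))"

# Launch a few workers, talking to each one over stdio pipes.
workers = [
    subprocess.Popen([sys.executable, "-c", JOB],
                     stdout=subprocess.PIPE, text=True)
    for _ in range(4)
]

# Collect one result line from each worker's stdout pipe.
results = [int(p.communicate()[0]) for p in workers]
print(results)
```

The pipes carry only bytes, so anything structured has to be serialized by hand; that is the price of this approach compared to shared-memory threads.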

FWIW, ~G

Feb 14 '07 #5

On Feb 14, 1:33 am, Maric Michaud <m...@aristote.info> wrote:
At that point, it's not easy to explain to him that Python
is not flawed compared to Java, and that he will not
regret his choice in the future.
Database adaptors such as psycopg do release the GIL while connecting
and exchanging data. Apache's MPM (multi-processing module) can run
mod_python and, with that, multiple Python instances as separate
processes, thus avoiding the global lock as well.
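The point about adaptors releasing the GIL during I/O can be seen with plain threads: a blocking call such as time.sleep (standing in here for a database round-trip) drops the GIL while it waits, so the waits overlap instead of queueing. A minimal sketch:

```python
import threading
import time

# time.sleep, like a database adaptor waiting on the network, releases
# the GIL while it blocks, so these ten threads overlap their waits.
def blocking_io():
    time.sleep(0.5)

start = time.time()
threads = [threading.Thread(target=blocking_io) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Total wall time is close to one wait (0.5s), not ten waits (5s).
elapsed = time.time() - start
print(round(elapsed, 1))
```

This is exactly why I/O-bound servers get away with the GIL; the lock only serializes the time spent executing Python bytecode.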
Plone install up and running, he will immediately compare it to
J2EE and wonder why he should pay a consultant to make it work properly.
I really doubt that any performance difference will be due to the
global interpreter lock. That is not how things work. You most certainly
have far more substantial bottlenecks in each application.

i.

Feb 14 '07 #6

In article <ma***************************************@python.org>,
Maric Michaud <ma***@aristote.info> wrote:
This is a recurrent problem I encounter when I try to sell Python solutions
to my customers. I'm aware that this problem is sometimes overlooked, but
such is the law of the market.

I've heard a bunch of arguments defending Python's choice of a GIL, but I'm
not quite sure of their technical background, nor of what is really important
and what is not. These discussions often end in a prudent "Python has made a
choice among others"... which is not really convincing.

If some guru has a good recipe, or wants to summarize the main points, it
would be really appreciated.
When designing a new Python application I read a fair amount about the
implications of multiple cores for using threads versus processes, and
decided that using multiple processes was the way to go for me. On that
note, there is a (sort of) new module available that allows interprocess
communication via shared memory and semaphores with Python. You can find
it here:
http://NikitaTheSpider.com/python/shm/
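For readers without that module, the same shared-memory-plus-semaphores pattern can be sketched with the multiprocessing package that later entered the stdlib. This assumes a Unix fork start method; the counter type, worker count, and iteration count are arbitrary illustration values:

```python
from multiprocessing import Process, Semaphore, Value

# A shared integer living in shared memory, guarded by a semaphore --
# the same two primitives (shm + semaphores) the module above wraps.
counter = Value('i', 0)
sem = Semaphore(1)

def bump(n):
    # Each worker process increments the shared counter n times,
    # taking the semaphore around each read-modify-write.
    for _ in range(n):
        with sem:
            counter.value += 1

procs = [Process(target=bump, args=(1000,)) for _ in range(4)]
for p in procs:
    p.start()
for p in procs:
    p.join()

print(counter.value)  # 4 workers x 1000 increments
```

Without the semaphore the `+= 1` would race across processes, exactly as it would across threads without the GIL.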

Hope this helps

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
Feb 14 '07 #7

On Feb 14, 1:44 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
When a customer comes in with his beautiful new dual-core server and
gets a basic Plone install up and running, he will immediately
compare it to J2EE and wonder why he should pay a consultant to make
it work properly. At that point, it's not easy to explain to him that
Python is not flawed compared to Java, and that he will not regret
his choice in the future. First impressions may be decisive.

That is true, parallelism is an area where Java is ahead of us.
Java's traditionally been ahead in one case, but well behind in
general.

Java has historically had no support at all for real multiple-process
solutions (akin to fork() or ZwCreateProcess() with a NULL
SectionHandle), which should make up the vast majority of parallel
programs (basically all of them except where you don't want memory
protection).

Has this changed in recent Java releases? Is there a way to use
efficient copy-on-write multiprocess architectures?
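On POSIX systems the copy-on-write process creation being discussed is easy to demonstrate from Python itself. A Unix-only sketch (the exit-status channel is just an illustrative way for the child to report back):

```python
import os

# fork() gives the child a copy-on-write view of the parent's memory:
# cheap to create, but any writes stay private to the writing process.
data = [0] * 1000

pid = os.fork()
if pid == 0:
    # Child: mutate its private copy, report the value via exit status.
    data[0] = 42
    os._exit(data[0])
else:
    # Parent: wait for the child; the parent's copy is untouched.
    _, status = os.waitpid(pid, 0)
    child_saw = os.WEXITSTATUS(status)
    print(child_saw, data[0])
```

The child sees its own modified page while the parent still sees the original, which is what makes fork-based workers both cheap (pages are shared until written) and safe (writes never leak between processes).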

Feb 14 '07 #8

"sj*******@yahoo.com" <sj*******@yahoo.com> writes:
Java has historically had no support at all for real multiple process
solutions (akin to fork() or ZwCreateProcess() with NULL
SectionHandle), which should make up the vast majority of parallel
programs (basically all of those except where you don't want memory
protection).
I don't know what ZwCreateProcess is (sounds like a Windows-ism) but I
remember using popen() under Java 1.1 in Solaris. That at least
allows launching a new process and communicating with it. I don't
know if there was anything like mmap. I think this is mostly a
question of library functions--you could certainly write JNI
extensions for that stuff.
Has this changed in recent Java releases? Is there a way to use
efficient copy-on-write multiprocess architectures?
I do think they've been adding more stuff for parallelism in general.
Feb 14 '07 #9

On Feb 14, 4:37 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
"sjdevn...@yahoo.com" <sjdevn...@yahoo.com> writes:
Java has historically had no support at all for real multiple process
solutions (akin to fork() or ZwCreateProcess() with NULL
SectionHandle), which should make up the vast majority of parallel
programs (basically all of those except where you don't want memory
protection).

I don't know what ZwCreateProcess is (sounds like a Windows-ism)
Yeah, it's the Windows equivalent of fork. It does true copy-on-write, so
you can do efficient multiprocess work.
but I
remember using popen() under Java 1.1 in Solaris. That at least
allows launching a new process and communicating with it.
Yep. That's okay for limited kinds of applications.
I don't know if there was anything like mmap.
That would be important as well.
I think this is mostly a
question of library functions--you could certainly write JNI
extensions for that stuff.
Sure. If you're writing extensions you can work around the GIL, too.
Has this changed in recent Java releases? Is there a way to use
efficient copy-on-write multiprocess architectures?

I do think they've been adding more stuff for parallelism in general.
Up through 1.3/1.4 or so they were pretty staunchly in the "threads
for everything!" camp, but they've added a select/poll-style call a
couple versions back. That was a pretty big sticking point previously.

Feb 14 '07 #10

"sj*******@yahoo.com" <sj*******@yahoo.com> writes:
question of library functions--you could certainly write JNI
extensions for that stuff [access to mmap, etc.]
Sure. If you're writing extensions you can work around the GIL, too.
I don't think that's comparable--if you have extensions turning off
the GIL, they can't mess with Python data objects, which generally
assume the GIL's presence. Python's mmap module can't do that either.
Up through 1.3/1.4 or so they were pretty staunchly in the "threads
for everything!" camp, but they've added a select/poll-style call a
couple versions back. That was a pretty big sticking point previously.
They've gone much further now and they actually have some STM features:

http://www-128.ibm.com/developerwork...ry/j-jtp11234/
Feb 14 '07 #11

P: n/a
On Feb 14, 3:24 pm, garri...@gmail.com wrote:
On Feb 13, 9:07 pm, Maric Michaud <m...@aristote.info> wrote:
I've heard a bunch of arguments defending Python's choice of a GIL, but I'm
not quite sure of their technical background, nor of what is really important
and what is not. These discussions often end in a prudent "Python has made a
choice among others"... which is not really convincing.

Well, INAG (I'm not a Guru), but we recently had training from a Guru.
When we brought up this question, his response was fairly simple.
Paraphrased for inaccuracy:

"Some time back, a group did remove the GIL from the Python core and
implemented locks on the core code to make it thread-safe. The problem
was that while it worked, the necessary locks made single-threaded
code take significantly longer to execute."

He then proceeded to show us how to achieve the same effect
(multithreading python for use on multi-core computers) using popen2
and stdio pipes.
Hmm. I wonder whether it would be possible to have a pair of Python
cores, one for single-threaded code (no locks necessary) and the other
for multi-threaded code. When the Python program went from single-
threaded to multi-threaded, or from multi-threaded to single-threaded,
there would be a switch from one core to the other.

Feb 14 '07 #12

On Wednesday 14 February 2007 at 16:24, ga******@gmail.com wrote:
"Some time back, a group did remove the GIL from the Python core and
implemented locks on the core code to make it thread-safe. The problem
was that while it worked, the necessary locks made single-threaded
code take significantly longer to execute."
Very interesting point, this is exactly the sort of thing I'm looking for.
Any valuable link on this?

Feb 15 '07 #13

Maric Michaud <ma***@aristote.info> writes:
"Some time back, a group did remove the GIL from the Python core and
implemented locks on the core code to make it thread-safe. The problem
was that while it worked, the necessary locks made single-threaded
code take significantly longer to execute."

Very interesting point, this is exactly the sort of thing I'm
looking for. Any valuable link on this?
I think it was a long time ago, Python 1.5.2 or something. However, it
really wasn't that useful since, as Garrick said, it slowed Python
down. The reason was that CPython's structures weren't designed for thread
safety, so it needed a huge amount of locking and releasing. For example,
adjusting any reference count required setting and releasing a lock,
and CPython does this all the time. Getting rid of the GIL in a
serious way requires radically changing the interpreter, not just
sticking some locks here and there.
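The refcount churn is visible from pure Python: merely binding another name to an object bumps its count, and in a GIL-free interpreter each such bump would need a lock or atomic operation. A small illustration:

```python
import sys

# sys.getrefcount reports the count including the temporary reference
# created by passing the object in as an argument.
x = []
before = sys.getrefcount(x)

y = x  # binding one more name to the same object bumps the count by one
after = sys.getrefcount(x)

print(before, after)
```

Every function call, tuple pack, and name binding does the same thing behind the scenes, which is why per-refcount locking was so costly in that experiment.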
Feb 15 '07 #14

P: n/a
If locking is expensive on x86, it's implemented wrong.
It's done right in QNX, with inline code for the non-blocking
case. I'm not sure about the current libraries for Linux, but
by now somebody should have gotten this right.

John Nagle

Paul Rubin wrote:
Maric Michaud <ma***@aristote.info> writes:
>"Some time back, a group did remove the GIL from the Python core and
>implemented locks on the core code to make it thread-safe. The problem
>was that while it worked, the necessary locks made single-threaded
>code take significantly longer to execute."
Feb 15 '07 #15

John Nagle <na***@animats.com> writes:
If locking is expensive on x86, it's implemented wrong.
It's done right in QNX, with inline code for the non-blocking case.
Acquiring the lock still takes an expensive instruction, LOCK XCHG or
whatever. I think QNX is usually run on embedded CPUs with less
extensive caching than these multicore x86s, so the lock prefix may be
less expensive on the QNX systems.
Feb 15 '07 #16

Paul Rubin wrote:
John Nagle <na***@animats.com> writes:
> If locking is expensive on x86, it's implemented wrong.
It's done right in QNX, with inline code for the non-blocking case.


Acquiring the lock still takes an expensive instruction, LOCK XCHG or
whatever. I think QNX is usually run on embedded CPUs with less
extensive caching than these multicore x86s, so the lock prefix may be
less expensive on the QNX systems.
That's not so bad. See

http://lists.freebsd.org/pipermail/f...st/033462.html

But there are dumb thread implementations that make
a system call for every lock.

John Nagle
Feb 15 '07 #17

John Nagle <na***@animats.com> writes:
But there are dumb thread implementations that make
a system call for every lock.
Yes, a syscall on each lock access would really be horrendous. But I
think that on a modern CPU, LOCK XCHG costs as much as hundreds of
regular instructions. Doing that on every adjustment of a Python
reference count is enough to impact the interpreter significantly.
It's not just mutating user data; every time you use an integer, or
call a function and make an arg tuple and bind the function's locals
dictionary, you're touching refcounts.

The preferred locking scheme in Linux these days is called futex,
which avoids system calls in the uncontended case--see the docs.
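A rough way to get a feel for the uncontended cost is to time an acquire/release pair on a threading.Lock (which rides the futex fast path on Linux, no syscall when uncontended) against an empty loop. Absolute numbers are machine-dependent, so treat this only as a sketch:

```python
import threading
import time

N = 100_000
lock = threading.Lock()

# Baseline: an empty loop of N iterations.
t0 = time.perf_counter()
for _ in range(N):
    pass
empty = time.perf_counter() - t0

# Same loop, but with one uncontended acquire/release per iteration.
t0 = time.perf_counter()
for _ in range(N):
    with lock:
        pass
locked = time.perf_counter() - t0

print(f"empty loop: {empty:.4f}s, locked loop: {locked:.4f}s")
```

Even with the fast path, multiply the per-iteration difference by the number of refcount adjustments an interpreter makes per second and the overhead of per-refcount locking becomes clear.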
Feb 15 '07 #18

On Feb 14, 4:30 pm, "MRAB" <goo...@mrabarnett.plus.com> wrote:
Hmm. I wonder whether it would be possible to have a pair of python
cores, one for single-threaded code (no locks necessary) and the other
for multi-threaded code. When the Python program went from single-
threaded to multi-threaded or multi-threaded to single-threaded there
would be a switch from one core to the other.
I have explored this option (and some simpler variants). Essentially,
you end up rewriting a massive amount of CPython's codebase to change
the refcount API. Even all the C extension types assume the refcount
can be statically initialized (which may not be true if you're trying
to make it efficient on multiple CPUs.)

Once you realize the barrier for entry is so high you start
considering alternative implementations. Personally, I'm watching
PyPy to see if they get reasonable performance using JIT. Then I can
start hacking on it.

--
Adam Olsen, aka Rhamphoryncus

Feb 15 '07 #19

On 15 Feb, 00:14, "sjdevn...@yahoo.com" <sjdevn...@yahoo.com> wrote:
Yeah, it's the Windows equivalent of fork. It does true copy-on-write, so
you can do efficient multiprocess work.
Aside from some code floating around the net, which possibly originates
from some book on Windows systems programming, is there any reference
material on ZwCreateProcess? Is anyone actually using it as "fork on
Windows", and would it be in any way suitable for an implementation of
os.fork in the Python standard library? I only ask because there's a
lot of folklore about this particular function (everyone seems to
repeat more or less what you've just said), but aside from various
Cygwin mailing-list threads where they reject its usage, there's
precious little information of substance.

Not that I care about Windows, but it would be useful to be able to
offer fork-based multiprocessing solutions to people using that
platform. Although the python-dev people currently seem more intent on
considering (and now hopefully rejecting) yet more syntactic sugar [1],
it'd be nice to consider matters seemingly below the python-dev
threshold of consideration and offer some kind of roadmap for
convenient parallel processing.

Paul

[1] http://mail.python.org/pipermail/pyt...ry/070939.html

Feb 15 '07 #20

Maric> On Wednesday 14 February 2007 at 16:24, ga******@gmail.com wrote:
Maric> >"Some time back, a group did remove the GIL from the Python core and
Maric> >implemented locks on the core code to make it thread-safe. The problem
Maric> >was that while it worked, the necessary locks made single-threaded
Maric> >code take significantly longer to execute."
Maric> Very interesting point, this is exactly the sort of thing I'm
Maric> looking for. Any valuable link on this?

Google for "python free threading stein" then click the first link.

Skip
Feb 15 '07 #21

In article <ma***************************************@python.org>,
Maric Michaud <ma***@aristote.info> wrote:
>
This is a recurrent problem I encounter when I try to sell Python
solutions to my customers. I'm aware that this problem is sometimes
overlooked, but such is the law of the market.
Could you expand more on what exactly the problem is?
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"I disrespectfully agree." --SJM
Feb 18 '07 #22

Nikita the Spider <Ni*************@gmail.com> writes:
note, there is a (sort of) new module available that allows interprocess
communication via shared memory and semaphores with Python. You can find
it here:
http://NikitaTheSpider.com/python/shm/
This is from the old shm module that was floating around several years
ago? Cool, I remember trying to find it recently and it seemed to
have disappeared; the original URL was dead and it wasn't mirrored
anywhere. How about putting it in the Cheese Shop or some other such
repository? Having it in the stdlib would be even better, of course.
Feb 20 '07 #23

In article <7x************@ruckus.brouhaha.com>,
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
Nikita the Spider <Ni*************@gmail.com> writes:
note, there is a (sort of) new module available that allows interprocess
communication via shared memory and semaphores with Python. You can find
it here:
http://NikitaTheSpider.com/python/shm/

This is from the old shm module that was floating around several years
ago? Cool, I remember trying to find it recently and it seemed to
have disappeared; the original URL was dead and it wasn't mirrored
anywhere.
Yes, this is almost certainly the one you remember. I had a hard
time finding it myself, but it's still shipped with a few Linux distros
that have their SVN repositories online and indexed by Google.

FYI, I fixed a few bugs in the original and added some small features and a
wrapper module. If you're compiling for Linux you might need to remove
the HAVE_UNION_SEMUN definition from setup.py. (I just learned this
yesterday thanks to Eric J., and I haven't updated the documentation yet.)
How about putting it in CheeseShop or some other such repository?
Hmmm, I hadn't thought about that, since I've never used the Cheese Shop
myself. <honestly-not-being-grouchy-just-naive>What benefits does the
Cheese Shop confer on someone looking for a
package?</honestly-not-being-grouchy-just-naive> I ask because from my
perspective it just adds overhead to package maintenance.
Having it in the stdlib would be even better, of course.
That'd be fine with me!

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
Feb 20 '07 #24

Nikita the Spider wrote:
>
Hmmm, I hadn't thought about that, since I've never used the Cheese Shop
myself. <honestly-not-being-grouchy-just-naive>What benefits does the
Cheese Shop confer on someone looking for a
package?</honestly-not-being-grouchy-just-naive> I ask because from my
perspective it just adds overhead to package maintenance.
The Python Package Index, as I prefer to call it (but we're talking
about the same thing), doesn't really make any special demands on
distribution or maintenance: you just need to register yourself and
add an entry for the package, filling in a few fields such as the
homepage and perhaps the download link; you can also upload archives
if you'd prefer. If you have a PKG-INFO file, you can upload
that in order to get fields filled out more conveniently (as long as
the Package Index likes the file), and if you have a setup.py script
you might be able to use the upload feature with that (and the
PKG-INFO file, I suppose).
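For concreteness, a minimal period-style setup.py carrying the kind of fields mentioned above might look like this. The name, version, module list, and author field are hypothetical placeholders, not the shm package's real metadata:

```python
# setup.py -- the handful of fields the Package Index reads from PKG-INFO.
# All values below are illustrative placeholders, not real package metadata.
from distutils.core import setup

setup(
    name="shm",
    version="1.0",
    description="Shared memory and semaphore IPC for Python",
    author="Philip",
    url="http://NikitaTheSpider.com/python/shm/",
    py_modules=["shm_wrapper"],
)
```

With a script like this in place, `python setup.py sdist` produces the archive and generates the PKG-INFO file the index consumes.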

Don't be confused by all the setuptools extras and any insistence that
the Package Index works best with things that are packaged as Python
Eggs: whilst that might confer certain benefits, mostly to users who
rely on Egg dependencies, it's peripheral to the purpose of the
Package Index itself.

Paul

Feb 20 '07 #25
