2.6, 3.0, and truly independent interpreters

Dear Python dev community,

I'm CTO at a small software company that makes music visualization
software (you can check us out at www.soundspectrum.com). About two
years ago we made the decision to use embedded python in a couple of
our new products, given all the great things about python. We were
close to using lua but for various reasons we decided to go with
python. However, over the last two years, there's been one area of
grief that sometimes makes me think twice about our decision to go
with python...

Some background first... Our software is used for entertainment and
centers around real time, high-performance graphics, so python's
performance, embedded flexibility, and stability are the most
important issues for us. Our software targets a large cross section
of hardware and we currently ship products for Win32, OS X, and the
iPhone and since our customers are end users, our products have to be
robust, have a tidy install footprint, and be foolproof. Basically,
we use embedded python and use it to wrap our high performance C++
class set which wraps OpenGL, DirectX and our own software renderer.
In addition to wrapping our C++ frameworks, we use python to perform
various "worker" tasks on worker thread (e.g. image loading and
processing). However, we require *true* thread/interpreter
independence so python 2 has been frustrating at time, to say the
least. Please don't start with "but really, python supports multiple
interpreters" because I've been there many many times with people.
And, yes, I'm aware of the multiprocessing module added in 2.6, but
that stuff isn't lightweight and isn't suitable at all for many
environments (including ours). The bottom line is that if you want to
perform independent processing (in python) on different threads, using
the machine's multiple cores to the fullest, then you're out of luck
under python 2.

Sadly, the only way we could get truly independent interpreters was to
put python in a dynamic library, have our installer make a *duplicate*
copy of it during the installation process (e.g. python.dll/.bundle ->
python2.dll/.bundle) and load each one explicitly in our app. In
other words, we load a
fresh dynamic lib for each thread-independent interpreter (you can't
reuse the same dynamic library because the OS will just reference the
already-loaded one).
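
To make that hack concrete, here's a stripped-down sketch (Win32 flavor,
error handling trimmed) of what loading the two physical copies looks like
from the host app:

    /* Sketch only: load two *renamed copies* of the Python DLL so the OS
       maps each as a separate image, giving each its own globals, its own
       GIL, its own everything. */
    #include <windows.h>
    #include <stdio.h>

    typedef void (*Py_Initialize_t)(void);
    typedef int  (*PyRun_SimpleString_t)(const char *);
    typedef void (*Py_Finalize_t)(void);

    typedef struct {
        HMODULE              dll;
        Py_Initialize_t      init;
        PyRun_SimpleString_t run;
        Py_Finalize_t        fini;
    } EmbeddedPython;

    static int load_python(EmbeddedPython *py, const char *dll_name)
    {
        py->dll = LoadLibraryA(dll_name);
        if (!py->dll)
            return -1;
        py->init = (Py_Initialize_t)GetProcAddress(py->dll, "Py_Initialize");
        py->run  = (PyRun_SimpleString_t)GetProcAddress(py->dll, "PyRun_SimpleString");
        py->fini = (Py_Finalize_t)GetProcAddress(py->dll, "Py_Finalize");
        return (py->init && py->run && py->fini) ? 0 : -1;
    }

    int main(void)
    {
        EmbeddedPython a, b;
        /* Two physically distinct files: the OS treats them as unrelated DLLs. */
        if (load_python(&a, "python.dll") != 0 ||
            load_python(&b, "python2.dll") != 0) {
            fprintf(stderr, "failed to load a python DLL\n");
            return 1;
        }
        a.init();
        b.init();
        a.run("print 'hello from interpreter A'");   /* runs inside copy A */
        b.run("print 'hello from interpreter B'");   /* runs inside copy B */
        b.fini();
        a.fini();
        return 0;
    }

In the real app each copy lives on its own worker thread; since the two
copies share nothing, there's no GIL contention between them--and also no
way to pass Python objects between them except by serializing.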

From what I gather from the python community, the basis for not
offering "real" muti-threaded support is that it'd add to much
internal overhead--and I couldn't agree more. As a high performance C
and C++ guy, I fully agree that thread safety should be at the high
level, not at the low level. BUT, the lack of truly independent
interpreters is what ultimately prevents using python in cool,
powerful ways. This shortcoming alone has caused game developers--
both large and small--to choose other embedded interpreters over
python (e.g. Blizzard chose lua over python). For example, Apple's
QuickTime API is powerful in that high-level instance objects can
leverage performance gains associated with multi-threaded processing.
Meanwhile, the QuickTime API simply lists the responsibilities of the
caller regarding thread safety, and that's all it needs to do. In
other words, CPython doesn't need to step in and provide a thread-safe
environment; it just needs to establish the rules and make sure that
its own implementation supports those rules.
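
To sketch what I mean (a made-up renderer API, not QuickTime's actual
headers): the interface states the rules up front, and the implementation
does no locking of its own:

    /* Hypothetical API in that spirit: the threading contract is documented,
       not enforced with internal locks. */
    typedef struct Renderer Renderer;   /* opaque instance object */

    /*
     * Threading rules (the caller's responsibility):
     *   - A Renderer may be used from any thread, but never from two
     *     threads at the same time.
     *   - Two different Renderer instances may be used concurrently from
     *     different threads without any locking.
     */
    Renderer *renderer_create(void);
    void      renderer_draw_frame(Renderer *r, double time_seconds);
    void      renderer_destroy(Renderer *r);

The burden of staying inside those rules falls on the caller, and in
exchange the implementation never pays for locks it doesn't need.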

More than once, I had actually considered expending company resources
to develop a high performance, truly independent interpreter
implementation of the python core language and modules but in the end
estimated that the size of that project would just be too much, given
our company's current resources. Should such an implementation ever
be developed, it would be very attractive for companies to support,
fund, and/or license. The truth is, we just love python as a
language, but its lack of true interpreter independence (in an
interpreter sense as well as in a thread sense) remains a *huge* liability.

So, my question becomes: is python 3 ready for true multithreaded
support?? Can we finally abandon our Frankenstein approach of loading
multiple identical dynamic libs to achieve truly independent
interpreters?? I've reviewed all the new python 3 C API module stuff,
and all I have to say is: whew--better late than never!! So, although
that solves modules offering truly independent interpreter support,
the following questions remain:

- In python 3, the C module API now supports true interpreter
independence, but have all the modules in the python codebase been
converted over? Are they all now truly compliant? It will only take
a single static/global state variable in a module to potentially cause
no end of pain in a multiple interpreter environment! Yikes!
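
To show the kind of landmine I'm talking about, here's a tiny made-up
extension fragment (not from any real module in the codebase) that keeps
its cache in a C static:

    #include <Python.h>

    /* Hypothetical hazard: one process-wide static, shared by EVERY
       interpreter that imports this module. */
    static PyObject *cached_table = NULL;

    static PyObject *
    get_table(PyObject *self, PyObject *unused)
    {
        if (cached_table == NULL) {
            cached_table = PyDict_New();   /* owned by whichever interpreter got here first */
            if (cached_table == NULL)
                return NULL;
        }
        Py_INCREF(cached_table);
        return cached_table;               /* handed out to every interpreter */
    }

    static PyMethodDef badcache_methods[] = {
        {"get_table", get_table, METH_NOARGS, "Return the (process-wide!) cache."},
        {NULL, NULL, 0, NULL}
    };

    PyMODINIT_FUNC
    initbadcache(void)                      /* python 2 style init; module name made up */
    {
        Py_InitModule("badcache", badcache_methods);
    }

Every interpreter that imports that module shares the one pointer (and its
refcount), so objects created under one interpreter leak into the others--
exactly the sort of thing that has to be audited module by module.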

- How close is python 3 really to true multithreaded use? The
assumption here is that the caller ensures safety (e.g. ensuring that
neither interpreter is in use when serializing data from one to
another).

I believe that true python independent thread/interpreter support is
paramount and should become the top priority because this is the key
consideration used by developers when they're deciding which
interpreter to embed in their app. Until there's a hello world that
demonstrates running independent python interpreters on multiple app
threads, lua will remain the clear choice over python. Python 3 needs
true interpreter independence and multi-threaded support!
Thanks,
Andy O'Meara
Oct 22 '08
On Oct 22, 10:32 am, Andy <and...@gmail.com> wrote:
> [quote of the original post snipped]

What you describe, truly independent interpreters, is not threading at
all: it is processes, emulated at the application level, with all the
memory cost and none of the OS protections. True threading would
involve sharing most objects.

Your solution depends on what you need:
* Killable "threads" -> OS processes
* multicore usage (GIL removal) -> OS processes or alternative Python
implementations (PyPy/Jython/IronPython)
* Sane shared objects -> safethread
Oct 22 '08 #11
>
What you describe, truly independent interpreters, is not threading at
all: it is processes, emulated at the application level, with all the
memory cost and none of the OS protections. True threading would
involve sharing most objects.

Your solution depends on what you need:
* Killable "threads" -> OS processes
* multicore usage (GIL removal) -> OS processes or alternative Python
implementations (PyPy/Jython/IronPython)
* Sane shared objects -> safethread

I realize what you're saying, but it's better to say there are two issues
at hand:

1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis, but FAR from
being carried through in modules, as he pointed out; see the sketch
after point 2 below). As you point
out, this doesn't directly relate to multi-threading, BUT it is
intimately tied to the issue because if, in principle, every module
used instance data (rather than static data), then python would be
WELL on its way to "free threading" (as Jesse Noller calls it), or as
I was calling it, "true multi-threading".

2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps (see my QuickTime API example). Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules. Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)
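
To make point (1) concrete, here's roughly what PEP 3121-style per-module
state looks like under the new Python 3 C module API, as I read it (the
module name, state contents, and method below are invented for
illustration; a real module would also fill in m_traverse/m_clear so the
cached object participates in GC):

    #include <Python.h>

    /* Per-interpreter state: every interpreter that imports the module
       gets its own copy of this struct instead of sharing C statics. */
    typedef struct {
        PyObject *default_codec;    /* illustrative cached object */
    } worker_state;

    static PyObject *
    worker_ping(PyObject *module, PyObject *unused)
    {
        worker_state *st = (worker_state *)PyModule_GetState(module);
        if (st->default_codec == NULL) {
            st->default_codec = PyUnicode_FromString("utf-8");
            if (st->default_codec == NULL)
                return NULL;
        }
        Py_INCREF(st->default_codec);
        return st->default_codec;
    }

    static PyMethodDef worker_methods[] = {
        {"ping", worker_ping, METH_NOARGS, "Touch this interpreter's private state."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef worker_def = {
        PyModuleDef_HEAD_INIT,
        "worker",                   /* hypothetical module name */
        "PEP 3121 per-interpreter state sketch",
        sizeof(worker_state),       /* m_size > 0: state is per module instance */
        worker_methods,
        NULL, NULL, NULL, NULL      /* m_reload, m_traverse, m_clear, m_free */
    };

    PyMODINIT_FUNC
    PyInit_worker(void)
    {
        PyObject *m = PyModule_Create(&worker_def);
        if (m != NULL)
            ((worker_state *)PyModule_GetState(m))->default_codec = NULL;
        return m;
    }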

Anyway, I've been at this issue for quite a while now (we're
approaching our 3rd release cycle), so I'm pretty comfortable with the
principles at hand. I'd say your comments share a common theme with
others here, so perhaps consider where end-user software
houses (like us) are coming from. Specifically, developing commercial
software for end users imposes some restrictions that open source
development communities aren't often as sensitive to, namely:

- Performance -- emulation is a no-go (e.g. Jython)
- Maturity and Licensing -- experimental/academic projects are a no-go
(PyPy)
- Cross platform support -- love it or hate it, Win32 and OS X are all
that matter when you're talking about selling (and supporting)
software to the masses. I'm just the messenger here (ie. this is NOT
flamebait). We publish for OS X, so IronPython is therefore out.

Basically, our company is at a crossroads where we really need light,
clean "free threading" as Jesse calls it (e.g. on the iPhone, using
our python drawing wrapper to do primary drawing while running python
jobs on another thread doing image decoding and processing). In our
current iPhone app, we achieve this by using two python bundles
(dynamic libs) in the way I described in my initial post. Sure, this
solves our problem, but it's pretty messy, sucks up resources, and has
been a pain to maintain.

Moving forward, please understand my posts here are also intended to
give the CPython dev community a glimpse of the issues that may not be
as visible to you guys (as they are for dev houses like us). For
example, it'd be pretty cool if Blizzard went with python instead of
lua, wouldn't you think? But some of the issues I've raised here no
doubt factor into why end-user dev houses ultimately may have to pass
up python in favor of another interpreted language.

Bottom line: why give prospective devs any reason to turn down python--
there are just so many great things about python!

Regards,
Andy


Oct 23 '08 #12
Jesse, Terry, Martin -

First off, thanks again for your time and interest in this matter.
It's definitely encouraging to know that time and real effort is being
put into the matter, and I hope my posts on this subject are an
informative data point for everyone here.

Thanks for that link to Adam Olsen's work, Jesse--I'll definitely look
more closely at it. As I mentioned in my previous post, end-user devs
like me are programmed to get nervous around new mods, but at first
glance it definitely seems interesting. My initial reaction,
as interesting as the project is, goes back to my previous post about
putting all the object safety responsibility on the shoulders of the
API client. That way, one gets the best of both worlds: free
threading and no unnecessary object locking/blocking (ie. the API
client manages the synchronization req'd to move objects
from one interpreter to another). I could have it wrong, but it seems
like safethread inserts some thread-safety features but they come at
the cost of performance. I know I keep mentioning it, but I think the
QuickTime API (and its documentation) is a great model for how any API
should approach threading. Check out their docs to see how they
address it; conceptually speaking, there's not a single line of thread
safety in QuickTime:

http://developer.apple.com/technotes/tn/tn2125.html

In short: multithreading is tricky; it's the responsibility of the
API client to not do hazardous things.

And for the record: the multiprocessing module is a totally great answer
for python-level MP stuff--very nice work, Jesse!

I'd like to post and discuss more, but I'll pick it up tomorrow...
All this stuff is fun and interesting to talk about, but I have to get
to some other things and it unfortunately comes down to cost
analysis. Sadly, the way I look at it is that I can allocate 2-3 man months (~
$40k) to build our own basic python interpreter implementation that
solves our need for free threading and increased performance (we've
built various internal interpreters over the years, so we have good
experience in house, our tools are high performance, and we only use a
pretty small subset of python). Or, there's the more attractive
approach of working with the python dev community and putting that dev
expenditure into a form everyone can benefit from.
Regards,
Andy


On Oct 22, 5:21 pm, "Jesse Noller" <jnol...@gmail.com> wrote:
On Wed, Oct 22, 2008 at 12:32 PM, Andy <and...@gmail.com> wrote:
And, yes, I'm aware of the multiprocessing module added in 2.6, but
that stuff isn't lightweight and isn't suitable at all for many
environments (including ours). The bottom line is that if you want to
perform independent processing (in python) on different threads, using
the machine's multiple cores to the fullest, then you're out of luck
under python 2.

So, as the guy-on-the-hook for multiprocessing, I'd like to know what
you might suggest for it to make it more apt for your - and other
environments.

Additionally, have you looked at: https://launchpad.net/python-safethr...ethread/w/list
(By Adam Olsen)

-jesse
Oct 23 '08 #13

You seem confused. PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.
Please reread my points--inherently isolated interpreters (ie. the top
level object) are indirectly linked to thread independence. I don't
want to argue, but you seem hell-bent on not hearing what I'm trying
to say here.
>
Got some real benchmarks to back that up? How about testing it on a
16 core (or more) box and seeing how it scales?
I don't care to argue with you, and you'll have to take it on faith
that I'm not spouting hot air. But just to put this to rest, I'll
make it clear in this Jython case:

You can't sell software to end users and expect them to have a recent,
working java distro. Look around you: no real commercial software
title that sells to soccer moms and gamers uses java. There's method
to commercial software production, so please don't presume that you
know my job, product line, and customers better than me, ok?

Just to put things in perspective, I already have exposed my company
to more support and design liability than I knew I was getting into by
going with python (as a result of all this thread safety and
interpreter independence business). I'd love to go into that one, but
it's frankly just not a good use of my time right now. Please just
accept that when someone says an option is a deal breaker, then it's a
deal breaker. This isn't some dude's masters thesis project here--we
pay our RENT and put our KIDS through school because we sell and ship
software that works and entertains people.
>
I'd like to see python used more, but fixing these things properly is
not as easy as believed. Those in the user community see only their
immediate problem (threads don't use multicore). People like me see
much bigger problems. We need consensus on the problems, and how to
solve it, and a commitment to invest what's required.
Well, you seem to come down pretty hard on people at your
doorstep saying they're WILLING and INTERESTED in supporting python
development. And, you're exactly right: users see only their
immediate problem--but that's the definition of being a user. If
users saw the whole picture from the dev side, then they'd be
developers, not users.

Please consider that you're representing the python dev community
here; I'm your friend here, not your enemy.

Andy


Oct 23 '08 #14
On Oct 22, 10:31 pm, Andy <and...@gmail.com> wrote:
You seem confused. PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.

Please reread my points--inherently isolated interpreters (ie. the top
level object) are indirectly linked to thread independence. I don't
want to argue, but you seem hell-bent on not hearing what I'm trying
to say here.
I think the confusion is a matter of context. Your app, written in C
or some other non-python language, shares data between the threads and
thus treats them as real threads. However, from python's perspective
nothing is shared, and thus it is processes.

Although this contradiction is fine for embedding purposes, python is
a general purpose language, and needs to be capable of directly
sharing objects. Imagine you wanted to rewrite the bulk of your app
in python, with only a relatively small portion left in a C extension
module.

Got some real benchmarks to back that up? How about testing it on a
16 core (or more) box and seeing how it scales?

I don't care to argue with you, and you'll have to take it on faith
that I'm not spouting hot air. But just to put this to rest, I'll
make it clear in this Jython case:

You can't sell software to end users and expect them to have a recent,
working java distro. Look around you: no real commercial software
title that sells to soccer moms and gamers uses java. There's method
to commercial software production, so please don't presume that you
know my job, product line, and customers better than me, ok?

Just to put things in perspective, I already have exposed my company
to more support and design liability than I knew I was getting into by
going with python (as a result of all this thread safety and
interpreter independence business). I'd love to go into that one, but
it's frankly just not a good use of my time right now. Please just
accept that when someone says an option is a deal breaker, then it's a
deal breaker. This isn't some dude's masters thesis project here--we
pay our RENT and put our KIDS through school because we sell and ship
software that works and entertains people.
Consider it accepted. I understand that PyPy/Jython/IronPython don't
fit your needs. Likewise though, CPython cannot fit my needs. What
we both need simply does not exist today.

I'd like to see python used more, but fixing these things properly is
not as easy as believed. Those in the user community see only their
immediate problem (threads don't use multicore). People like me see
much bigger problems. We need consensus on the problems, and how to
solve it, and a commitment to invest what's required.

Well, you seem to come down pretty hard on people at your
doorstep saying they're WILLING and INTERESTED in supporting python
development. And, you're exactly right: users see only their
immediate problem--but that's the definition of being a user. If
users saw the whole picture from the dev side, then they'd be
developers, not users.

Please consider that you're representing the python dev community
here; I'm your friend here, not your enemy.
I'm sorry if I came across harshly. My intent was merely to push you
towards supporting long-term solutions, rather than short-term ones.
Oct 23 '08 #15
Andy wrote:
2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps (see my QuickTime API example). Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules. Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)
I'm very - not absolute, but very - sure that Guido and the initial
designers of Python would have added the GIL anyway. The GIL makes
Python faster on single core machines and more stable on multi core
machines. Other language designers think the same way. Ruby recently got
a GIL. The article
http://www.infoq.com/news/2007/05/ru...eading-futures explains the
rationales for a GIL in Ruby. The article also contains a quote from Guido
about threading in general.

Several people inside and outside the Python community think that
threads are dangerous and don't scale. The paper
http://www.eecs.berkeley.edu/Pubs/Te...ECS-2006-1.pdf sums it
up nicely. It explains why modern processors are going to cause more and
more trouble with the Java approach to threads, too.

Python *must* gain means of concurrent execution of CPU bound code
eventually to survive on the market. But it must get the right means or
we are going to suffer the consequences.

Christian
Oct 23 '08 #16
On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
On approximately 10/23/2008 12:24 AM, came the following characters from
the keyboard of Christian Heimes:
Andy wrote:
2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps (see my QuickTime API example). Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules. Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)

I've been following this discussion with interest, as it certainly seems
that multi-core/multi-CPU machines are the coming thing, and many
applications will need to figure out how to use them effectively.
I'm very - not absolute, but very - sure that Guido and the initial
designers of Python would have added the GIL anyway. The GIL makes
Python faster on single core machines and more stable on multi core
machines. Other language designers think the same way. Ruby recently
got a GIL. The article
http://www.infoq.com/news/2007/05/ru...utures explains the
rationales for a GIL in Ruby. The article also holds a quote from
Guido about threading in general.
Several people inside and outside the Python community think that
threads are dangerous and don't scale. The paper
http://www.eecs.berkeley.edu/Pubs/Te...2006-1.pdf sums
it up nicely, It explains why modern processors are going to cause
more and more trouble with the Java approach to threads, too.

Reading this PDF paper is extremely interesting (albeit somewhat
dependent on understanding abstract theories of computation; I have
enough math background to follow it, sort of, and most of the text can
be read even without fully understanding the theoretical abstractions).

I have already heard people talking about "Java applications are
buggy". I don't believe that general sequential programs written in
Java are any buggier than programs written in other languages... so I
had interpreted that to mean (based on some inquiry) that complex,
multi-threaded Java applications are buggy. And while I also don't
believe that complex, multi-threaded programs written in Java are any
buggier than complex, multi-threaded programs written in other
languages, it does seem to be true that Java is one of the currently
popular languages in which to write complex, multi-threaded programs,
because of its language support for threads and concurrency primitives.
These reports were from people who are not programmers but field
IT people, who have bought and/or support software and/or hardware with
drivers that are written in Java and seem to have non-ideal behavior,
(apparently only) curable by stopping/restarting the application or
driver, or sometimes requiring a reboot.

The paper explains many traps that lead to complex, multi-threaded
programs being buggy, and being hard to test. I have worked with
parallel machines, applications, and databases for 25 years, and can
appreciate the succinct expression of the problems explained within the
paper, and can, from experience, agree with its premises and
conclusions. Parallel applications only have been commercial successes
when the parallelism is tightly constrained to well-controlled patterns
that could be easily understood. Threads, especially in "cooperation"
with languages that use memory pointers, have the potential to get out
of control, in inexplicable ways.
Although the paper is correct in many ways, I find it fails to
distinguish the core of the problem from the chaff surrounding it, and
thus is used to justify poor language designs.

For example, the amount of interaction may be seen as a spectrum: at
one end is C or Java threads, with complicated memory models, and a
tendency to just barely control things using locks. At the other end
would be completely isolated processes with no form of IPC. The former
is considered the worst possible, while the latter is the best
possible (purely sequential).

However, the latter is too weak for many uses. At a minimum we'd like
some pipes to communicate. Helps, but it's still too weak. What if
you have a large amount of data to share, created at startup but
otherwise not modified? So we add some read only types and ways to
define your own read only types. A couple of those types need a
process associated with them, so we make sure process handles are
proper objects too.

What have we got now? It's more on the thread end of the spectrum
than the process end, but it's definitely not a C or Java thread, and
it's definitely not an OS process. What is it? Does it have the
problems in the paper? Only some? Which?

Another peeve I have is his characterization of the observer pattern.
The generalized form of the problem exists both in single-threaded
sequential programs (in the form of unexpected reentrancy) and in
message passing (as infinite CPU usage or an unbounded number of pending
messages).

Perhaps threading makes it much worse; I've heard many anecdotes that
would support that. Or perhaps it's the lack of automatic deadlock
detection, giving a clear and diagnosable error for you to fix.
Certainly, the mystery and extremeness of a deadlock could explain how
much it scares people. Either way the paper says nothing.

Python *must* gain means of concurrent execution of CPU bound code
eventually to survive on the market. But it must get the right means
or we are going to suffer the consequences.

This statement, after reading the paper, seems somewhat in line with the
author's premise that language acceptability requires that a language be
self-contained/monolithic, and potentially sufficient to implement
itself. That seems to also be one of the reasons that Java is used
today for threaded applications. It does seem to be true, given current
hardware trends, that _some mechanism_ must be provided to obtain the
benefit of multiple cores/CPUs to a single application, and that Python
must either implement or interface to that mechanism to continue to be a
viable language for large scale application development.

Andy seems to want an implementation of independent Python processes
implemented as threads within a single address space, that can be
coordinated by an outer application. This actually corresponds to the
model promulgated in the paper as being most likely to succeed. Of
course, it maps nicely into a model using separate processes,
coordinated by an outer process, also. The differences seem to be:

1) Most applications are historically perceived as corresponding to
single processes. Language features for multi-processing are rare, and
such languages are not in common use.

2) A single address space can be convenient for the coordinating outer
application. It does seem simpler and more efficient to simply "copy"
data from one memory location to another, rather than send it in a
message, especially if the data are large. On the other hand,
coordination of memory access between multiple cores/CPUs effectively
causes memory copies from one cache to the other, and if memory is
accessed from multiple cores/CPUs regularly, the underlying hardware
implements additional synchronization and copying of data, potentially
each time the memory is accessed. Being forced to do message passing of
data between processes can actually be more efficient than access to
shared memory at times. I should note that in my 25 years of parallel
development, all the systems created used a message passing paradigm,
partly because the multiple CPUs often didn't share the same memory
chips, much less the same address space, and that a key feature of all
the successful systems of that nature was an efficient inter-CPU message
passing mechanism. I should also note that Herb Sutter has a recent
series of columns in Dr Dobbs regarding multi-core/multi-CPU parallelism
and a variety of implementation pitfalls, that I found to be very
interesting reading.
Try looking at it on another level: when your CPU wants to read from a
bit of memory controlled by another CPU it sends them a message
requesting they get it for us. They send back a message containing
that memory. They also note we have it, in case they want to modify
it later. We also note where we got it, in case we want to modify it
(and not wait for them to do modifications for us).

Message passing vs shared memory isn't really a yes/no question. It's
about ratios, usage patterns, and tradeoffs. *All* programs will
share data, but in what way? If it's just the code itself you can
move the cache validation into software and simplify the CPU, making
it faster. If the shared data is a lot more than that, and you use it
to coordinate accesses, then it'll be faster to have it in hardware.

It's quite possible they'll come up with something that seems quite
different, but in reality is the same sort of rearrangement. Add
hardware support for transactions, move the caching partly into
software, etc.
>
I have noted the multiprocessing module that is new to Python 2.6/3.0
being feverishly backported to Python 2.5, 2.4, etc... indicating that
people truly find the model/module useful... seems that this is one way,
in Python rather than outside of it, to implement the model Andy is
looking for, although I haven't delved into the details of that module
yet, myself. I suspect that a non-Python application could load one
embedded Python interpreter, and then indirectly use the multiprocessing
module to control other Python interpreters in other processors. I
don't know that multithreading primitives such as described in the paper
are available in the multiprocessing module, but perhaps they can be
implemented in some manner using the tools that are provided; in any
case, some interprocess communication primitives are provided via this
new Python module.

There could be opportunity to enhance Python with process creation and
process coordination operations, rather than have it depend on
easy-to-implement-incorrectly coordination patterns or
easy-to-use-improperly libraries/modules of multiprocessing primitives
(this is not a slam of the new multiprocessing module, which appears to
be filling a present need in rather conventional ways, but just to point
out that ideas promulgated by the paper, which I suspect 2 years later
are still research topics, may be a better abstraction than the
conventional mechanisms).

One thing Andy hasn't yet explained (or I missed) is why any of his
application is coded in a language other than Python. I can think of a
number of possibilities:

A) (Historical) It existed, then the desire for extensions was seen, and
Python was seen as a good extension language.

B) Python is inappropriate (performance?) for some of the algorithms
(but should they be coded instead as Python extensions, with the core
application being in Python?)

C) Unavailability of Python wrappers for particularly useful 3rd-party
libraries

D) Other?
"It already existed" is definitely the original reason, but now it
includes single-threaded performance and multi-threaded scalability.
Although the idea of "just write an extension that releases the GIL"
is a common suggestion, it needs to be fairly coarse to be effective,
and ensure little of the CPU time is left in python. If the app
spreads its CPU time around, it is likely impossible to use python
effectively.
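
For what it's worth, this is roughly the shape such a coarse extension call
takes (the decode helper below is hypothetical): drop the GIL around one
big block of pure C work, and touch no Python API inside that block:

    #include <Python.h>

    /* Hypothetical pure-C worker; must not call back into the Python API. */
    extern int decode_image_file(const char *path, int *width, int *height);

    static PyObject *
    decode_image(PyObject *self, PyObject *args)
    {
        const char *path;
        int w = 0, h = 0, ok;

        if (!PyArg_ParseTuple(args, "s", &path))
            return NULL;

        Py_BEGIN_ALLOW_THREADS          /* release the GIL for the whole decode */
        ok = decode_image_file(path, &w, &h);
        Py_END_ALLOW_THREADS            /* reacquire it before touching Python again */

        if (!ok) {
            PyErr_SetString(PyExc_IOError, "could not decode image");
            return NULL;
        }
        return Py_BuildValue("(ii)", w, h);   /* a real binding would return pixels too */
    }

The coarser that middle section is, the more it buys you; if the work keeps
bouncing back into Python-level code, releasing the GIL gains almost
nothing, which is exactly the point above.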
Oct 23 '08 #17
Andy wrote:
1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis
Something like that is necessary for independent interpreters,
but not sufficient. There are also all the built-in constants
and type objects to consider. Most of these are statically
allocated at the moment.
2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps
No, it's there because it's necessary for acceptable performance
when multiple threads are running in one interpreter. Independent
interpreters wouldn't mean the absence of a GIL; it would only
mean each interpreter having its own GIL.
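
For reference, here's roughly what the existing sub-interpreter machinery
already gives you (a bare-bones embedding sketch, error handling omitted):
each Py_NewInterpreter() gets its own modules and its own __main__, but
everything still takes turns on the one shared GIL.

    #include <Python.h>

    int main(void)
    {
        PyThreadState *main_ts, *sub;

        Py_Initialize();
        PyEval_InitThreads();            /* make sure the GIL exists */
        main_ts = PyThreadState_Get();

        sub = Py_NewInterpreter();       /* isolated modules/__main__, same GIL */
        PyRun_SimpleString("x = 42\nprint('hello from the sub-interpreter')");
        Py_EndInterpreter(sub);          /* current thread state becomes NULL */

        PyThreadState_Swap(main_ts);     /* back to the main interpreter */
        PyRun_SimpleString("print('x' in dir())   # False: namespaces are isolated");
        Py_Finalize();
        return 0;
    }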

--
Greg
Oct 24 '08 #18
You seem confused. PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.
Just a small remark: this wasn't the primary objective of the PEP.
The primary objective was to support module cleanup in a reliable
manner, eventually allowing modules to be garbage-collected properly.
However, I also kept the isolated interpreters feature in mind there.

Regards,
Martin
Oct 24 '08 #19

Instead of "appdomains " (one interpreter per thread), or free
threading, you could use multiple processes. Take a look at the new
multiprocessing module in Python 2.6. It has roughly the same
interface as Python's threading and queue modules, but uses processes
instead of threads. Processes are scheduled independently by the
operating system. The objects in the multiprocessing module also tend
to have much better performance than their threading and queue
counterparts. If you have a problem with threads due to the GIL, the
multiprocessing module will most likely take care of it.

There is a fundamental problem with using homebrew loading of multiple
(but renamed) copies of PythonXX.dll that is easily overlooked. That
is, extension modules (.pyd) are DLLs as well. Even if required by two
interpreters, they will only be loaded into the process image once.
Thus you have to rename all of them as well, or you will get havoc
with refcounts. Not to speak of what will happen if a Windows HANDLE
is closed by one interpreter while still needed by another. It is
almost guaranteed to bite you, sooner or later.

There are other options as well:

- Use IronPython. It does not have a GIL.

- Use Jython. It does not have a GIL.

- Use pywin32 to create isolated outproc COM servers in Python. (I'm
not sure what the effect of inproc servers would be.)

- Use os.fork() if your platform supports it (Linux, Unix, Apple,
Cygwin, Windows Vista SUA). This is the standard posix way of doing
multiprocessing. It is almost unbeatable if you have a fast copy-on-
write implementation of fork (that is, all platforms except Cygwin).
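
At the C level that last option is just plain posix fork(), which is what
os.fork() wraps; here is a minimal sketch of how an embedding host might
farm work out to a forked child:

    #include <Python.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == -1)
            return 1;                  /* fork failed */
        if (pid == 0) {
            /* Child: a copy-on-write clone of the parent, so its interpreter
               state is completely independent from here on. */
            Py_Initialize();
            PyRun_SimpleString("print('decoding images in the child process')");
            Py_Finalize();
            _exit(0);
        }
        /* Parent: keep doing the real-time work, reap the worker when done. */
        waitpid(pid, NULL, 0);
        return 0;
    }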




Oct 24 '08 #20
