2.6, 3.0, and truly independent intepreters - Page 4

Andy

Dear Python dev community,

I'm CTO at a small software company that makes music visualization
software (you can check us out at www.soundspectrum.com). About two
years ago we went with decision to use embedded python in a couple of
our new products, given all the great things about python. We were
close to using lua but for various reasons we decided to go with
python. However, over the last two years, there's been one area of
grief that sometimes makes me think twice about our decision to go
with python...

Some background first... Our software is used for entertainment and
centers around real time, high-performance graphics, so python's
performance, embedded flexibility, and stability are the most
important issues for us. Our software targets a large cross section
of hardware and we currently ship products for Win32, OS X, and the
iPhone and since our customers are end users, our products have to be
robust, have a tidy install footprint, and be foolproof. Basically,
we use embedded python and use it to wrap our high performance C++
class set which wraps OpenGL, DirectX and our own software renderer.
In addition to wrapping our C++ frameworks, we use python to perform
various "worker" tasks on worker thread (e.g. image loading and
processing). However, we require *true* thread/interpreter
independence so python 2 has been frustrating at time, to say the
least. Please don't start with "but really, python supports multiple
interpreters" because I've been there many many times with people.
And, yes, I'm aware of the multiprocessing module added in 2.6, but
that stuff isn't lightweight and isn't suitable at all for many
environments (including ours). The bottom line is that if you want to
perform independent processing (in python) on different threads, using
the machine's multiple cores to the fullest, then you're out of luck
under python 2.

Sadly, the only way we could get truly independent interpreters was to
put python in a dynamic library, have our installer make a *duplicate*
copy of it during the installation process (e.g. python.dll/.bundle ->
python2.dll/.bundle) and load each one explicitly in our app, so we
can get truly independent interpreters. In other words, we load a
fresh dynamic lib for each thread-independent interpreter (you can't
reuse the same dynamic library because the OS will just reference the
already-loaded one).

From what I gather from the python community, the basis for not
offering "real" muti-threaded support is that it'd add to much
internal overhead--and I couldn't agree more. As a high performance C
and C++ guy, I fully agree that thread safety should be at the high
level, not at the low level. BUT, the lack of truly independent
interpreters is what ultimately prevents using python in cool,
powerful ways. This shortcoming alone has caused game developers--
both large and small--to choose other embedded interpreters over
python (e.g. Blizzard chose lua over python). For example, Apple's
QuickTime API is powerful in that high-level instance objects can
leverage performance gains associated with multi-threaded processing.
Meanwhile, the QuickTime API simply lists the responsibilitie s of the
caller regarding thread safety and that's all its needs to do. In
other words, CPython doesn't need to step in an provide a threadsafe
environment; it just needs to establish the rules and make sure that
its own implementation supports those rules.

More than once, I had actually considered expending company resources
to develop a high performance, truly independent interpreter
implementation of the python core language and modules but in the end
estimated that the size of that project would just be too much, given
our company's current resources. Should such an implementation ever
be developed, it would be very attractive for companies to support,
fund, and/or license. The truth is, we just love python as a
language, but it's lack of true interpreter independence (in a
interpreter as well as in a thread sense) remains a *huge* liability.

So, my question becomes: is python 3 ready for true multithreaded
support?? Can we finally abandon our Frankenstein approach of loading
multiple identical dynamic libs to achieve truly independent
interpreters?? I've reviewed all the new python 3 C API module stuff,
and all I have to say is: whew--better late then never!! So, although
that solves modules offering truly independent interpreter support,
the following questions remain:

- In python 3, the C module API now supports true interpreter
independence, but have all the modules in the python codebase been
converted over? Are they all now truly compliant? It will only take
a single static/global state variable in a module to potentially cause
no end of pain in a multiple interpreter environment! Yikes!

- How close is python 3 really to true multithreaded use? The
assumption here is that caller ensures safety (e.g. ensuring that
neither interpreter is in use when serializing data from one to
another).

I believe that true python independent thread/interpreter support is
paramount and should become the top priority because this is the key
consideration used by developers when they're deciding which
interpreter to embed in their app. Until there's a hello world that
demonstrates running independent python interpreters on multiple app
threads, lua will remain the clear choice over python. Python 3 needs
true interpreter independence and multi-threaded support!
Thanks,
Andy O'Meara

Oct 22 '08

Subscribe Reply

114

3911

« First
<
2
3
4
5
6
>
Last »

Jesse Noller

On Fri, Oct 24, 2008 at 3:17 PM, Andy O'Meara <an****@gmail.c omwrote:

I'm a lousy writer sometimes, but I feel bad if you took the time to
describe threads vs processes. The only reason I raised IPC with my
"messaging isn't very attractive" comment was to respond to Glenn
Linderman's points regarding tradeoffs of shared memory vs no.

I actually took the time to bring anyone listening in up to speed, and
to clarify so I could better understand your use case. Don't feel bad,
things in the thread are moving fast and I just wanted to clear it up.

Ideally, we all want to improve the language, and the interpreter.
However trying to push it towards a particular use case is dangerous
given the idea of "general use".

-jesse

Oct 24 '08 #31

Rhamphoryncus

On Oct 24, 1:02*pm, Glenn Linderman <v+pyt...@g.nev cal.comwrote:

On approximately 10/24/2008 8:42 AM, came the following characters from
the keyboard of Andy O'Meara:

Glenn, great post and points!

Thanks. I need to admit here that while I've got a fair bit of
professional programming experience, I'm quite new to Python -- I've not
learned its internals, nor even the full extent of its rich library. So
I have some questions that are partly about the goals of the
applications being discussed, partly about how Python is constructed,
and partly about how the library is constructed. I'm hoping to get a
better understanding of all of these; perhaps once a better
understanding is achieved, limitations will be understood, and maybe
solutions be achievable.

Let me define some speculative Python interpreters; I think the first is
today's Python:

PyA: Has a GIL. PyA threads can run within a process; but are
effectively serialized to the places where the GIL is obtained/released.
Needs the GIL because that solves lots of problems with non-reentrant
code (an example of non-reentrant code, is code that uses global (C
global, or C static) variables – note that I'm not talking about Python
vars declared global... they are only module global). In this model,
non-reentrant code could include pieces of the interpreter, and/or
extension modules.

PyB: No GIL. PyB threads acquire/release a lock around each reference to
a global variable (like "with" feature). Requires massive recoding of
all code that contains global variables. Reduces performance
significantly by the increased cost of obtaining and releasing locks.

PyC: No locks. Instead, recoding is done to eliminate global variables
(interpreter requires a state structure to be passed in). Extension
modules that use globals are prohibited... this eliminates large
portions of the library, or requires massive recoding. PyC threads do
not share data between threads except by explicit interfaces.

PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate
global variables, and each interpreter instance is provided a state
structure. There is still a GIL, however, because globals are
potentially still used by some modules. Code is added to detect use of
global variables by a module, or some contract is written whereby a
module can be declared to be reentrant and global-free. PyA threads will
obtain the GIL as they would today. PyC threads would be available to be
created. PyC instances refuse to call non-reentrant modules, but also
need not obtain the GIL... PyC threads would have limited module support
initially, but over time, most modules can be migrated to be reentrant
and global-free, so they can be used by PyC instances. Most 3rd-party
libraries today are starting to care about reentrancy anyway, because of
the popularity of threads.

PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable. A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability. Most other
shareable objects are immutable. Each thread is run in its own
private monitor, and thus protected from the normal threading memory
module nasties. Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.

Our software runs in real time (so performance is paramount),
interacts with other static libraries, depends on worker threads to
perform real-time image manipulation, and leverages Windows and Mac OS
API concepts and features. *Python's performance hits have generally
been a huge challenge with our animators because they often have to go
back and massage their python code to improve execution performance.
So, in short, there are many reasons why we use python as a part
rather than a whole.

[...]

As a python language fan an enthusiast, don't let lua win! *(I say
this endearingly of course--I have the utmost respect for both
communities and I only want to see CPython be an attractive pick when
a company is looking to embed a language that won't intrude upon their
app's design).

I agree with the problem, and desire to make python fill all niches,
but let's just say I'm more ambitious with my solution. ;)

Oct 24 '08 #32

Andy O'Meara

Another great post, Glenn!! Very well laid-out and posed!! Thanks for
taking the time to lay all that out.

>
Questions for Andy: is the type of work you want to do in independent
threads mostly pure Python? Or with libraries that you can control to
some extent? Are those libraries reentrant? Could they be made
reentrant? How much of the Python standard library would need to be
available in reentrant mode to provide useful functionality for those
threads? I think you want PyC

I think you've defined everything perfectly, and you're you're of
course correct about my love for for the PyC model. :^)

Like any software that's meant to be used without restrictions, our
code and frameworks always use a context object pattern so that
there's never and non-const global/shared data). I would go as far to
say that this is the case with more performance-oriented software than
you may think since it's usually a given for us to have to be parallel
friendly in as many ways as possible. Perhaps Patrick can back me up
there.

As to what modules are "essential" ... As you point out, once
reentrant module implementations caught on in PyC or hybrid world, I
think we'd start to see real effort to whip them into compliance--
there's just so much to be gained imho. But to answer the question,
there's the obvious ones (operator, math, etc), string/buffer
processing (string, re), C bridge stuff (struct, array), and OS basics
(time, file system, etc). Nice-to-haves would be buffer and image
decompression (zlib, libpng, etc), crypto modules, and xml. As far as
I can imagine, I have to believe all of these modules already contain
little, if any, global data, so I have to believe they'd be super easy
to make "PyC happy". Patrick, what would you see you guys using?

That's the rub... *In our case, we're doing image and video
manipulation--stuff not good to be messaging from address space to
address space. *The same argument holds for numerical processing with
large data sets. *The workers handing back huge data sets via
messaging isn't very attractive.

In the module multiprocessing environment could you not use shared
memory, then, for the large shared data items?

As I understand things, the multiprocessing puts stuff in a child
process (i.e. a separate address space), so the only to get stuff to/
from it is via IPC, which can include a shared/mapped memory region.
Unfortunately, a shared address region doesn't work when you have
large and opaque objects (e.g. a rendered CoreVideo movie in the
QuickTime API or 300 megs of audio data that just went through a
DSP). Then you've got the hit of serialization if you're got
intricate data structures (that would normally would need to be
serialized, such as a hashtable or something). Also, if I may speak
for commercial developers out there who are just looking to get the
job done without new code, it's usually always preferable to just a
single high level sync object (for when the job is complete) than to
start a child processes and use IPC. The former is just WAY less
code, plain and simple.
Andy

Oct 24 '08 #33

Jesse Noller

On Fri, Oct 24, 2008 at 4:51 PM, Andy O'Meara <an****@gmail.c omwrote:

>In the module multiprocessing environment could you not use shared
memory, then, for the large shared data items?

As I understand things, the multiprocessing puts stuff in a child
process (i.e. a separate address space), so the only to get stuff to/
from it is via IPC, which can include a shared/mapped memory region.
Unfortunately, a shared address region doesn't work when you have
large and opaque objects (e.g. a rendered CoreVideo movie in the
QuickTime API or 300 megs of audio data that just went through a
DSP). Then you've got the hit of serialization if you're got
intricate data structures (that would normally would need to be
serialized, such as a hashtable or something). Also, if I may speak
for commercial developers out there who are just looking to get the
job done without new code, it's usually always preferable to just a
single high level sync object (for when the job is complete) than to
start a child processes and use IPC. The former is just WAY less
code, plain and simple.

Are you familiar with the API at all? Multiprocessing was designed to
mimic threading in about every way possible, the only restriction on
shared data is that it must be serializable, but event then you can
override or customize the behavior.

Also, inter process communication is done via pipes. It can also be
done with messages if you want to tweak the manager(s).

-jesse

Oct 24 '08 #34

Rhamphoryncus

On Oct 24, 2:59*pm, Glenn Linderman <gl...@nevcal.c omwrote:

On approximately 10/24/2008 1:09 PM, came the following characters from
the keyboard of Rhamphoryncus:
PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable. *A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability. *Most other
shareable objects are immutable. *Each thread is run in its own
private monitor, and thus protected from the normal threading memory
module nasties. *Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.

Hmm. *So I think your PyE is an instance is an attempt to be more
explicit about what I said above in PyC: PyC threads do not share data
between threads except by explicit interfaces. *I consider your
definitions of shared data types somewhat orthogonal to the types of
threads, in that both PyA and PyC threads could use these new shared
data items.

Unlike PyC, there's a *lot* shared by default (classes, modules,
function), but it requires only minimal recoding. It's as close to
"have your cake and eat it too" as you're gonna get.

I think/hope that you meant that "many types are now only allowed to be
non-shareable"? *At least, I think that should be the default; they
should be within the context of a single, independent interpreter
instance, so other interpreters don't even know they exist, much less
how to share them. *If so, then I understand most of the rest of your
paragraph, and it could be a way of providing shared objects, perhaps.

There aren't multiple interpreters under my model. You only need
one. Instead, you create a monitor, and run a thread on it. A list
is not shareable, so it can only be used within the monitor it's
created within, but the list type object is shareable.

I've no interest in *requiring* a C/C++ extension to communicate
between isolated interpreters. Without that they're really no better
than processes.

Oct 24 '08 #35

Rhamphoryncus

On Oct 24, 3:02*pm, Glenn Linderman <v+pyt...@g.nev cal.comwrote:

On approximately 10/23/2008 2:24 PM, came the following characters from the
keyboard of Rhamphoryncus:
>>
On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nev cal.comwrote:

>>>
On approximately 10/23/2008 12:24 AM, came the following characters from
the keyboard of Christian Heimes

Andy wrote:
I'm very - not absolute, but very - sure that Guido and the initial
designers of Python would have added the GIL anyway. The GIL makes
Python faster on single core machines and more stable on multi core
machines.

Actually, the GIL doesn't make Python faster; it is a design decision that
reduces the overhead of lock acquisition, while still allowing use of global
variables.

Using finer-grained locks has higher run-time cost; eliminating the use of
global variables has a higher programmer-time cost, but would actually run
faster and more concurrently than using a GIL. Especially on a
multi-core/multi-CPU machine.

Those "globals" include classes, modules, and functions. You can't
have *any* objects shared. Your interpreters are entirely isolated,
much like processes (and we all start wondering why you don't use
processes in the first place.)

Or use safethread. It imposes safe semantics on shared objects, so
you can keep your global classes, modules, and functions. Still need
garbage collection though, and on CPython that means refcounting and
the GIL.

>Another peeve I have is his characterizatio n of the observer pattern.
The generalized form of the problem exists in both single-threaded
sequential programs, in the form of unexpected reentrancy, and message
passing, with infinite CPU usage or infinite number of pending
messages.

So how do you get reentrancy is a single-threaded sequential program? I
think only via recursion? Which isn't a serious issue for the observer
pattern. If you add interrupts, then your program is no longer sequential..

Sorry, I meant recursion. Why isn't it a serious issue for
single-threaded programs? Just the fact that it's much easier to
handle when it does happen?

>Try looking at it on another level: when your CPU wants to read from a
bit of memory controlled by another CPU it sends them a message
requesting they get it for us. They send back a message containing
that memory. They also note we have it, in case they want to modify
it later. We also note where we got it, in case we want to modify it
(and not wait for them to do modifications for us).

I understand that level... one of my degrees is in EE, and I started college
wanting to design computers (at about the time the first microprocessor chip
came along, and they, of course, have now taken over). But I was side-lined
by the malleability of software, and have mostly practiced software during
my career.

Anyway, that is the level that Herb Sutter was describing in the Dr Dobbs
articles I mentioned. And the overhead of doing that at the level of a cache
line is high, if there is lots of contention for particular memory locations
between threads running on different cores/CPUs. So to achieve concurrency,
you must not only limit explicit software locks, but must also avoid memory
layouts where data needed by different cores/CPUs are in the same cache
line.

I suspect they'll end up redesigning the caching to use a size and
alignment of 64 bits (or smaller). Same cache line size, but with
masking.

You still need to minimize contention of course, but that should at
least be more predictable. Having two unrelated mallocs contend could
suck.

>Message passing vs shared memory isn't really a yes/no question. It's
about ratios, usage patterns, and tradeoffs. *All* programs will
share data, but in what way? If it's just the code itself you can
move the cache validation into software and simplify the CPU, making
it faster. If the shared data is a lot more than that, and you use it
to coordinate accesses, then it'll be faster to have it in hardware.

I agree there are tradeoffs... unfortunately, the hardware architectures
vary, and the languages don't generally understand the hardware. So then it
becomes an OS API, which adds the overhead of an OS API call to the cost of
the synchronization ... It could instead be (and in clever applications is) a
non-portable assembly level function that wraps on OS locking or waiting
API.

In practice I highly doubt we'll see anything that doesn't extend
traditional threading (posix threads, whatever MS has, etc).

Nonetheless, while putting the shared data accesses in hardware might be
more efficient per unit operation, there are still tradeoffs: A software
solution can group multiple accesses under a single lock acquisition; the
hardware probably doesn't have enough smarts to do that. So it may well
require many more hardware unit operations for the same overall concurrently
executed function, and the resulting performance may not be any better.

Speculative ll/sc? ;)

Sidestepping the whole issue, by minimizing shared data in the application
design, avoiding not only software lock calls, and hardware cache
contention, is going to provide the best performance... it isn't the things
you do efficiently that make software fast — it is the things you don'tdo
at all.

Minimizing contention, certainly. Minimizing the shared data itself
is iffier though.

Oct 24 '08 #36

Adam Olsen

On Fri, Oct 24, 2008 at 4:48 PM, Glenn Linderman <v+******@g.nev cal.comwrote:

On approximately 10/24/2008 2:15 PM, came the following characters from the
keyboard of Rhamphoryncus:
>>
On Oct 24, 2:59 pm, Glenn Linderman <gl...@nevcal.c omwrote:

>>>
On approximately 10/24/2008 1:09 PM, came the following characters from
the keyboard of Rhamphoryncus:
PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable. A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability. Most other
shareable objects are immutable. Each thread is run in its own
private monitor, and thus protected from the normal threading memory
module nasties. Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.
Hmm. So I think your PyE is an instance is an attempt to be more
explicit about what I said above in PyC: PyC threads do not share data
between threads except by explicit interfaces. I consider your
definitions of shared data types somewhat orthogonal to the types of
threads, in that both PyA and PyC threads could use these new shared
data items.

Unlike PyC, there's a *lot* shared by default (classes, modules,
function), but it requires only minimal recoding. It's as close to
"have your cake and eat it too" as you're gonna get.

Yes, but I like my cake frosted with performance; Guido's non-acceptance of
granular locks in the blog entry someone referenced was due to the slowdown
acquired with granular locking and shared objects. Your PyE model, with
highly granular sharing, will likely suffer the same fate.

No, my approach includes scalable performance. Typical paths will
involve *no* contention (ie no locking). classes and modules use
shareddict, which is based on a read-write lock built into the
interpreter, so it's uncontended for read-only usage patterns. Pretty
much everything else is immutable.

Of course that doesn't include the cost of garbage collection.
CPython's refcounting can't scale.

The independent threads model, with only slight locking for a few explicitly
shared objects, has a much better chance of getting better performance
overall. With one thread running, it would be the same as today; with
multiple threads, it should scale at the same rate as the system... minus
any locking done at the higher level.

So use processes with a little IPC for these expensive-yet-"shared"
objects. multiprocessing does it already.

>>I think/hope that you meant that "many types are now only allowed to be
non-shareable"? At least, I think that should be the default; they
should be within the context of a single, independent interpreter
instance, so other interpreters don't even know they exist, much less
how to share them. If so, then I understand most of the rest of your
paragraph, and it could be a way of providing shared objects, perhaps.

There aren't multiple interpreters under my model. You only need
one. Instead, you create a monitor, and run a thread on it. A list
is not shareable, so it can only be used within the monitor it's
created within, but the list type object is shareable.

The python interpreter code should be sharable, having been written in C,
and being/becoming reentrant. So in that sense, there is only one
interpreter. Similarly, any other reentrant C extensions would be that way.
On the other hand, each thread of execution requires its own interpreter
context, so that would have to be independent for the threads to be
independent. It is the combination of code+context that I call an
interpreter, and there would be one per thread for PyC threads. Bytecode
for loaded modules could potentially be shared, if it is also immutable.
However, that could be in my mental "phase 2", as it would require an extra
level of complexity in the interpreter as it creates shared bytecode...
there would be a memory savings from avoiding multiple copies of shared
bytecode, likely, and maybe also a compilation performance savings. So it
sounds like a win, but it is a win that can deferred for initial simplicity,
to prove the concept is or is not workable.

A monitor allows a single thread to run at a time; that is the same
situation as the present GIL. I guess I don't fully understand your model.

To use your terminology, each monitor is a context. Each thread
operates in a different monitor. As you say, most C functions are
already thread-safe (reentrant). All I need to do is avoid letting
multiple threads modify a single mutable object (such as a list) at a
time, which I do by containing it within a single monitor (context).
--
Adam Olsen, aka Rhamphoryncus

Oct 25 '08 #37

Adam Olsen

On Fri, Oct 24, 2008 at 5:38 PM, Glenn Linderman <v+******@g.nev cal.comwrote:

On approximately 10/24/2008 2:16 PM, came the following characters from the
keyboard of Rhamphoryncus:
>>
On Oct 24, 3:02 pm, Glenn Linderman <v+pyt...@g.nev cal.comwrote:

>>>
On approximately 10/23/2008 2:24 PM, came the following characters from
the
keyboard of Rhamphoryncus:
On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nev cal.comwrote:
>
On approximately 10/23/2008 12:24 AM, came the following characters
from
the keyboard of Christian Heimes
>
>>
>Andy wrote:
>I'm very - not absolute, but very - sure that Guido and the initial
>designer s of Python would have added the GIL anyway. The GIL makes
>Python faster on single core machines and more stable on multi core
>machines .
>>

Actually, the GIL doesn't make Python faster; it is a design decision
that
reduces the overhead of lock acquisition, while still allowing use of
global
variables.

Using finer-grained locks has higher run-time cost; eliminating the use
of
global variables has a higher programmer-time cost, but would actually
run
faster and more concurrently than using a GIL. Especially on a
multi-core/multi-CPU machine.

Those "globals" include classes, modules, and functions. You can't
have *any* objects shared. Your interpreters are entirely isolated,
much like processes (and we all start wondering why you don't use
processes in the first place.)

Indeed; isolated, independent interpreters are one of the goals. It is,
indeed, much like processes, but in a single address space. It allows the
master process (Python or C for the embedded case) to be coded using memory
references and copies and pointer swaps instead of using semaphores, and
potentially multi-megabyte message transfers.

It is not clear to me that with the use of shared memory between processes,
that the application couldn't use processes, and achieve many of the same
goals. On the other hand, the code to create and manipulate processes and
shared memory blocks is harder to write and has more overhead than the code
to create and manipulate threads, which can, when told, access any memory
block in the process. This allows the shared memory to be resized more
easily, or more blocks of shared memory created more easily. On the other
hand, the creation of shared memory blocks shouldn't be a high-use operation
in a program that has sufficient number crunching to do to be able to
consume multiple cores/CPUs.

>Or use safethread. It imposes safe semantics on shared objects, so
you can keep your global classes, modules, and functions. Still need
garbage collection though, and on CPython that means refcounting and
the GIL.

Sounds like safethread has 35-40% overhead. Sounds like too much, to me.

The specific implementation of safethread, which attempts to remove
the GIL from CPython, has significant overhead and had very limited
success at being scalable.

The monitor design proposed by safethread has no inherent overhead and
is completely scalable.
--
Adam Olsen, aka Rhamphoryncus

Oct 25 '08 #38

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

>A c-level module, on the other hand, can sidestep/release

>the GIL at will, and go on it's merry way and process away.

...Unless part of the C module execution involves the need do CPU-
bound work on another thread through a different python interpreter,
right?

Wrong.

(even if the interpreter is 100% independent, yikes).

Again, wrong.

For
example, have a python C module designed to programmaticall y generate
images (and video frames) in RAM for immediate and subsequent use in
animation. Meanwhile, we'd like to have a pthread with its own
interpreter with an instance of this module and have it dequeue jobs
as they come in (in fact, there'd be one of these threads for each
excess core present on the machine).

I don't understand how this example involves multiple threads. You
mention a single thread (running the module), and you mention designing
a module. Where is the second thread?

Let's assume there is another thread producing jobs, and then
a thread that generates the images. The structure would be this

while 1:
job = queue.get()
processing_modu le.process(job)

and in process:

PyArg_ParseTupl e(args, "s", job_data);
result = PyString_New(bu fsize);
buf = PyString_AsStri ng(result);
Py_BEGIN_ALLOW_ THREADS
compute_frame(j ob_data, buf);
Py_END_ALLOW_TH READS
return PyString_FromSt ring(buf);

All these compute_frames could happily run in parallel.

As far as I can tell, it seems
CPython's current state can't CPU bound parallelization in the same
address space.

That's not true.

Regards,
Martin

Oct 25 '08 #39

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

It seems to me that the very simplest move would be to remove global

static data so the app could provide all thread-related data, which
Andy suggests through references to the QuickTime API. This would
suggest compiling python without thread support so as to leave it up
to the application.

I'm not sure whether you realize that this is not simple at all.
Consider this fragment

if (string == Py_None || index >= state->lastmark ||
!state->mark[index] || !state->mark[index+1]) {
if (empty)
/* want empty string */
i = j = 0;
else {
Py_INCREF(Py_No ne);
return Py_None;

Py_None here is a global variable. How would you replace it?
It's used in thousands of places.

For another example, consider

PyErr_SetString (PyExc_ValueErr or,
"Empty module name");
or

dp = PyObject_New(db mobject, &Dbmtype);

There are tons of different variables denoting exceptions and
other types which all somehow need to be rewritten (likely with
undesirable effects on readability).

So I don't think that this is a simple solution. It's the right
one, but it will take five or ten years to implement.

Regards,
Martin

Oct 25 '08 #40