2.6, 3.0, and truly independent interpreters

Dear Python dev community,

I'm CTO at a small software company that makes music visualization
software (you can check us out at www.soundspectrum.com). About two
years ago we decided to use embedded python in a couple of
our new products, given all the great things about python. We were
close to using lua but for various reasons we decided to go with
python. However, over the last two years, there's been one area of
grief that sometimes makes me think twice about our decision to go
with python...

Some background first... Our software is used for entertainment and
centers around real time, high-performance graphics, so python's
performance, embedded flexibility, and stability are the most
important issues for us. Our software targets a large cross section
of hardware and we currently ship products for Win32, OS X, and the
iPhone and since our customers are end users, our products have to be
robust, have a tidy install footprint, and be foolproof. Basically,
we use embedded python and use it to wrap our high performance C++
class set which wraps OpenGL, DirectX and our own software renderer.
In addition to wrapping our C++ frameworks, we use python to perform
various "worker" tasks on worker threads (e.g. image loading and
processing). However, we require *true* thread/interpreter
independence so python 2 has been frustrating at times, to say the
least. Please don't start with "but really, python supports multiple
interpreters" because I've been there many many times with people.
And, yes, I'm aware of the multiprocessing module added in 2.6, but
that stuff isn't lightweight and isn't suitable at all for many
environments (including ours). The bottom line is that if you want to
perform independent processing (in python) on different threads, using
the machine's multiple cores to the fullest, then you're out of luck
under python 2.
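
A minimal sketch of the worker-thread pattern described above, assuming a job is a (name, pixel list) pair and that averaging the pixels stands in for real image processing. The names `worker` and `run_jobs` are hypothetical and are not taken from our product code. Under CPython's GIL, the loop in `worker` cannot execute Python bytecode at the same moment as another thread in the same interpreter, which is the limitation at issue here.

```python
import queue
import threading


def worker(jobs, results):
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        name, pixels = job
        # Stand-in for CPU-bound image processing written in pure Python.
        results.put((name, sum(pixels) / len(pixels)))


def run_jobs(job_list):
    """Process every job on one background thread and collect the results."""
    jobs, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for job in job_list:
        jobs.put(job)
    jobs.put(None)  # sentinel: tell the worker to exit
    thread.join()
    return dict(results.get() for _ in job_list)
```

The queue keeps the hand-off between threads safe, but it does nothing for multicore throughput: adding more worker threads of this kind would still serialize on the GIL.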

Sadly, the only way we could get truly independent interpreters was to
put python in a dynamic library, have our installer make a *duplicate*
copy of it during the installation process (e.g. python.dll/.bundle ->
python2.dll/.bundle) and load each one explicitly in our app, so we
can get truly independent interpreters. In other words, we load a
fresh dynamic lib for each thread-independent interpreter (you can't
reuse the same dynamic library because the OS will just reference the
already-loaded one).

From what I gather from the python community, the basis for not
offering "real" multi-threaded support is that it'd add too much
internal overhead--and I couldn't agree more. As a high performance C
and C++ guy, I fully agree that thread safety should be at the high
level, not at the low level. BUT, the lack of truly independent
interpreters is what ultimately prevents using python in cool,
powerful ways. This shortcoming alone has caused game developers--
both large and small--to choose other embedded interpreters over
python (e.g. Blizzard chose lua over python). For example, Apple's
QuickTime API is powerful in that high-level instance objects can
leverage performance gains associated with multi-threaded processing.
Meanwhile, the QuickTime API simply lists the responsibilities of the
caller regarding thread safety and that's all it needs to do. In
other words, CPython doesn't need to step in and provide a threadsafe
environment; it just needs to establish the rules and make sure that
its own implementation supports those rules.

More than once, I had actually considered expending company resources
to develop a high performance, truly independent interpreter
implementation of the python core language and modules but in the end
estimated that the size of that project would just be too much, given
our company's current resources. Should such an implementation ever
be developed, it would be very attractive for companies to support,
fund, and/or license. The truth is, we just love python as a
language, but its lack of true interpreter independence (in an
interpreter as well as in a thread sense) remains a *huge* liability.

So, my question becomes: is python 3 ready for true multithreaded
support?? Can we finally abandon our Frankenstein approach of loading
multiple identical dynamic libs to achieve truly independent
interpreters?? I've reviewed all the new python 3 C API module stuff,
and all I have to say is: whew--better late than never!! So, although
that solves modules offering truly independent interpreter support,
the following questions remain:

- In python 3, the C module API now supports true interpreter
independence, but have all the modules in the python codebase been
converted over? Are they all now truly compliant? It will only take
a single static/global state variable in a module to potentially cause
no end of pain in a multiple interpreter environment! Yikes!

- How close is python 3 really to true multithreaded use? The
assumption here is that caller ensures safety (e.g. ensuring that
neither interpreter is in use when serializing data from one to
another).
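
To make the second question concrete, here is a rough sketch of what "the caller ensures safety" could look like when moving data between two contexts. It assumes each interpreter can be modelled as a private namespace with a caller-owned lock; `Context` and `transfer` are hypothetical names, and this is a Python-level analogy, not the CPython embedding API.

```python
import pickle
import threading


class Context:
    """Stand-in for one interpreter: a private namespace plus a caller-owned lock."""

    def __init__(self):
        self.namespace = {}
        self.lock = threading.Lock()


def transfer(src, dst, key):
    """Copy src.namespace[key] into dst while neither context is in use."""
    if src is dst:
        return
    # Always take the two locks in the same order so that opposite-direction
    # transfers cannot deadlock each other.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        payload = pickle.dumps(src.namespace[key])
        dst.namespace[key] = pickle.loads(payload)
```

Because the value crosses as serialized bytes, the two contexts never share a live object; the only synchronization is the one the caller chooses to apply around the hand-off.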

I believe that true python independent thread/interpreter support is
paramount and should become the top priority because this is the key
consideration used by developers when they're deciding which
interpreter to embed in their app. Until there's a hello world that
demonstrates running independent python interpreters on multiple app
threads, lua will remain the clear choice over python. Python 3 needs
true interpreter independence and multi-threaded support!
Thanks,
Andy O'Meara
Oct 22 '08 #1
Andy wrote:
> Dear Python dev community,
>
> [...] Basically,
> we use embedded python and use it to wrap our high performance C++
> class set which wraps OpenGL, DirectX and our own software renderer.
> In addition to wrapping our C++ frameworks, we use python to perform
> various "worker" tasks on worker threads (e.g. image loading and
> processing). However, we require *true* thread/interpreter
> independence so python 2 has been frustrating at times, to say the
> least.
> [...]
>
> Sadly, the only way we could get truly independent interpreters was to
> put python in a dynamic library, have our installer make a *duplicate*
> copy of it during the installation process (e.g. python.dll/.bundle ->
> python2.dll/.bundle) and load each one explicitly in our app, so we
> can get truly independent interpreters. In other words, we load a
> fresh dynamic lib for each thread-independent interpreter (you can't
> reuse the same dynamic library because the OS will just reference the
> already-loaded one).

Interesting questions you ask.

A random note: py2exe also does something similar for executables built
with the 'bundle = 1' option. The python.dll and .pyd extension modules
in this case are not loaded into the process in the 'normal' way (with
some kind of Windows LoadLibrary() call); instead they are loaded by code
in py2exe that /emulates/ LoadLibrary: the code segments are loaded into
memory, fixups are made for imported functions, and the segments are marked executable.

The result is that separate COM objects implemented as Python modules and
converted into separate dlls by py2exe do not share their interpreters even
if they are running in the same process. Of course this only works on windows.
In effect this is similar to using /statically/ linked python interpreters
in separate dlls. Can't you do something like that?

> So, my question becomes: is python 3 ready for true multithreaded
> support?? Can we finally abandon our Frankenstein approach of loading
> multiple identical dynamic libs to achieve truly independent
> interpreters?? I've reviewed all the new python 3 C API module stuff,
> and all I have to say is: whew--better late than never!! So, although
> that solves modules offering truly independent interpreter support,
> the following questions remain:
>
> - In python 3, the C module API now supports true interpreter
> independence, but have all the modules in the python codebase been
> converted over? Are they all now truly compliant? It will only take
> a single static/global state variable in a module to potentially cause
> no end of pain in a multiple interpreter environment! Yikes!

I don't think this is the case (currently). But you could submit patches
to Python so that at least the 'official' modules (builtin and extensions)
would behave correctly in the case of multiple interpreters. At least
this is a much lighter task than writing your own GIL-less interpreter.

My 2 cents,

Thomas
Oct 22 '08 #2
> - In python 3, the C module API now supports true interpreter
> independence, but have all the modules in the python codebase been
> converted over?

No, none of them.

> Are they all now truly compliant? It will only take
> a single static/global state variable in a module to potentially cause
> no end of pain in a multiple interpreter environment! Yikes!

So you will have to suffer pain.

> - How close is python 3 really to true multithreaded use?

Python is as thread-safe as ever (i.e. completely thread-safe).

> I believe that true python independent thread/interpreter support is
> paramount and should become the top priority because this is the key
> consideration used by developers when they're deciding which
> interpreter to embed in their app. Until there's a hello world that
> demonstrates running independent python interpreters on multiple app
> threads, lua will remain the clear choice over python. Python 3 needs
> true interpreter independence and multi-threaded support!

So what patches to achieve that goal have you contributed so far?

In open source, pleas have nearly zero effect; code contributions are
what have effect.

I don't think any of the current committers has a significant interest
in supporting multiple interpreters (and I say that as the one who wrote
and implemented PEP 3121). To make a significant change, you need to
start with a PEP, offer to implement it once accepted, and offer to
maintain the feature for five years.

Regards,
Martin
Oct 22 '08 #3

Hi Thomas -

I appreciate your thoughts and time on this subject.
> The result is that separate COM objects implemented as Python modules and
> converted into separate dlls by py2exe do not share their interpreters even
> if they are running in the same process. Of course this only works on windows.
> In effect this is similar to using /statically/ linked python interpreters
> in separate dlls. Can't you do something like that?

You're definitely correct that homebrew loading and linking would do
the trick. However, because our python stuff makes callbacks into our
C/C++, that complicates the linking process (if I understand you
correctly). Also, then there's the problem of OS X.

> > - In python 3, the C module API now supports true interpreter
> > independence, but have all the modules in the python codebase been
> > converted over? Are they all now truly compliant? It will only take
> > a single static/global state variable in a module to potentially cause
> > no end of pain in a multiple interpreter environment! Yikes!
>
> I don't think this is the case (currently). But you could submit patches
> to Python so that at least the 'official' modules (builtin and extensions)
> would behave correctly in the case of multiple interpreters. At least
> this is a much lighter task than writing your own GIL-less interpreter.

I agree -- and I've been considering that (or rather, having our
company hire/pay part of the python dev community to do the work). To
consider that, the question becomes, how many modules are we talking
about do you think? 10? 100? I confess that I'm not familiar enough
with the full C python suite to have a good idea of how much work
we're talking about here.

Regards,
Andy


Oct 22 '08 #4

> > - In python 3, the C module API now supports true interpreter
> > independence, but have all the modules in the python codebase been
> > converted over?
>
> No, none of them.

:^)

> > - How close is python 3 really to true multithreaded use?
>
> Python is as thread-safe as ever (i.e. completely thread-safe).

If you're referring to the fact that the GIL does that, then you're
certainly correct. But if you've got multiple CPUs/cores and actually
want to use them, that GIL means you might as well forget about them.
So please take my use of "true multithreaded" to mean "turning off"
the GIL and push the responsibility of object safety to the client/API
level (such as in my QuickTime API example).

> > I believe that true python independent thread/interpreter support is
> > paramount and should become the top priority because this is the key
> > consideration used by developers when they're deciding which
> > interpreter to embed in their app. Until there's a hello world that
> > demonstrates running independent python interpreters on multiple app
> > threads, lua will remain the clear choice over python. Python 3 needs
> > true interpreter independence and multi-threaded support!
>
> So what patches to achieve that goal have you contributed so far?
>
> In open source, pleas have nearly zero effect; code contributions are
> what have effect.

This is just my second email, please be a little patient. :^) But
more seriously, I do represent a company ready, able, and willing to
fund the development of features that we're looking for, so please
understand that I'm definitely not coming to the table empty-handed
here.

> I don't think any of the current committers has a significant interest
> in supporting multiple interpreters (and I say that as the one who wrote
> and implemented PEP 3121). To make a significant change, you need to
> start with a PEP, offer to implement it once accepted, and offer to
> maintain the feature for five years.

Nice to meet you! :^) Seriously though, thank you for all your work on
3121 and taking the initiative with it! It's definitely the first
step toward what attracts companies like ours to embedding an
interpreted language, specifically: unrestricted interpreter- and
thread-independent use.

I would *love* for our company to be 10 times larger and be able to
add another zero to what we'd be able to hire/offer the python dev
community for work that we're looking for, but we unfortunately have
limits at the moment. And I would love to see python become the
leading choice when companies look to use an embedded interpreter, and
I offer my comments here to paint a picture of what can make python
more appealing to commercial software developers. Hopefully, the
python dev community doesn't underestimate the dev funding that could
potentially come in from companies if python grew in certain ways!

So, that said, I represent a company willing to fund the development
of features that move python towards thread-independent operation. No
software engineer can deny that we're entering a new era of
multithreaded processing where support frameworks (such as python)
need to be open minded with how they're used in a multi-threaded
environment--that's all I'm saying here.

Anyway, I can definitely tell you and anyone else interested that
we're willing to put our money where our wish-list is. As I mentioned
in my previous post to Thomas, the next step is to get an
understanding of the options available that will satisfy our needs.
We have a budget for this, but it's not astronomical (it's driven by
the cost associated with dropping python and going with lua--or,
making our own pared-down interpreter implementation). Please let me
be clear--I love python (as a language) and I don't want to switch.
BUT, we have to be able to run interpreters in different threads (and
get unhindered/full CPU core performance--ie. no GIL).

Thoughts? Also, please feel free to email me off-list if you prefer.

Oh, while I'm at it, if anyone in the python dev community (or anyone
that has put real work into python) is interested in our software,
email me and I'll hook you up with a complimentary copy of the
products that use python (music visuals for iTunes and WMP).

Regards,
Andy


Oct 22 '08 #5
> I would *love* for our company to be 10 times larger and be able to
> add another zero to what we'd be able to hire/offer the python dev
> community for work that we're looking for, but we unfortunately have
> limits at the moment.

There is another thing about open source that you need to consider:
you don't have to do it all on your own.

It needs somebody to take the lead, start a project, define a plan,
and small steps to approach it. If it's really something that the
community desperately needs, and if you make it clear that you will
just lead, but get nowhere without contributions, then the
contributions will come in.

If there won't be any contributions, then the itch in the
community isn't strong enough to need scratching.

Regards,
Martin
Oct 22 '08 #6
Andy wrote:
> I agree -- and I've been considering that (or rather, having our
> company hire/pay part of the python dev community to do the work). To
> consider that, the question becomes, how many modules are we talking
> about do you think? 10? 100?

In your Python directory, everything in Lib is Python, I believe.
Everything in DLLs is compiled C extensions. I see about 15 in the Windows
build of 3.0. These reflect two separate directories in the source tree. Builtin
classes are part of pythonxx.dll in the main directory. I have no idea
if things such as lists (from listobject.c), for instance, are a
potential problem for you.

You could start with the module of most interest to you, or perhaps a
small one, and see if it needs patching (from your viewpoint) and how
much effort it would take to meet your needs.

Terry Jan Reedy

Oct 22 '08 #7
On Wed, Oct 22, 2008 at 12:32 PM, Andy <an****@gmail.com> wrote:
> And, yes, I'm aware of the multiprocessing module added in 2.6, but
> that stuff isn't lightweight and isn't suitable at all for many
> environments (including ours). The bottom line is that if you want to
> perform independent processing (in python) on different threads, using
> the machine's multiple cores to the fullest, then you're out of luck
> under python 2.

So, as the guy-on-the-hook for multiprocessing, I'd like to know what
you might suggest to make it more apt for your environment, and for
others.

Additionally, have you looked at:
https://launchpad.net/python-safethread
http://code.google.com/p/python-safethread/w/list
(by Adam Olsen)

-jesse
Oct 22 '08 #8
Andy wrote:
> This is just my second email, please be a little patient. :^)

As a 10-year veteran, I welcome new contributors with new viewpoints and
information.

> more appealing to commercial software developers. Hopefully, the
> python dev community doesn't underestimate the dev funding that could
> potentially come in from companies if python grew in certain ways!

This seems to be something of a chicken-and-egg problem.

> So, that said, I represent a company willing to fund the development
> of features that move python towards thread-independent operation.

Perhaps you know of and can persuade other companies to contribute to
such focused effort.

> No
> software engineer can deny that we're entering a new era of
> multithreaded processing where support frameworks (such as python)
> need to be open minded with how they're used in a multi-threaded
> environment--that's all I'm saying here.

The *current* developers seem to be more interested in exploiting
multiple processors with multiprocessing. Note that Google chose that
route for Chrome (as I understood their comic introduction). 2.6 and 3.0
come with a new multiprocessing module that mimics the threading module
api fairly closely. It is now being backported to run with 2.5 and 2.4.

Advances in multithreading will probably require new ideas and
development energy.

Terry Jan Reedy

Oct 22 '08 #9
On Wed, Oct 22, 2008 at 5:34 PM, Terry Reedy <tj*****@udel.edu> wrote:
> The *current* developers seem to be more interested in exploiting multiple
> processors with multiprocessing. Note that Google chose that route for
> Chrome (as I understood their comic introduction). 2.6 and 3.0 come with a
> new multiprocessing module that mimics the threading module api fairly
> closely. It is now being backported to run with 2.5 and 2.4.

That's not exactly correct. Multiprocessing was added to 2.6 and 3.0
as an *additional* method for parallel/concurrent programming that
allows you to use multiple cores - however, as I noted in the PEP:

" In the future, the package might not be as relevant should the
CPython interpreter enable "true" threading, however for some
applications, forking an OS process may sometimes be more
desirable than using lightweight threads, especially on those
platforms where process creation is fast and optimized."

Multiprocessing is not a replacement for a "free threading" future
(ergo my mentioning Adam Olsen's work) - it is a tool in the
"batteries included" box. I don't want my cheerleading and driving of
this to somehow imply that the rest of Python-Dev thinks this is
the "silver bullet" or final answer in concurrency.
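
As a small illustration of how closely the two APIs line up, the sketch below takes the worker class as a parameter. `run_workers` is a hypothetical helper written for this example; it relies only on the `target=`/`args=` constructor arguments and the `start()`/`join()` methods that `threading.Thread` and `multiprocessing.Process` have in common.

```python
import threading


def run_workers(worker_cls, func, args_list):
    """Start one worker per argument tuple, wait for all, return how many ran."""
    workers = [worker_cls(target=func, args=args) for args in args_list]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return len(workers)


def demo_with_threads():
    """Run three squaring jobs on threads that share one results list."""
    results = []
    run_workers(threading.Thread, lambda x: results.append(x * x),
                [(1,), (2,), (3,)])
    return sorted(results)
```

Passing `multiprocessing.Process` instead of `threading.Thread` would use the same calls, but each child process would get its own copy of `results`, so values would have to come back through something like a `multiprocessing.Queue`. That difference in sharing, rather than the API shape, is where the two approaches part ways.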

However, a free-threaded python has a lot of implications, and if we
were to do it, it requires we not only "drop" the GIL - it also
requires we consider the ramifications of enabling true threading ala
Java et al - just having "true threads" lying around is great if
you've spent a ton of time learning locking, avoiding shared data/etc,
stepping through and cursing poor debugger support for multiple
threads, etc.

This is why I've been a fan of Adam's approach - enabling free
threading via GIL removal is actually secondary to the project's
stated goal: Enable Safe Threading.

In any case, I've jumped the rails - let's just say there's room in
python for multiprocessing, threading and possibly a concurrent
package ala java.util.concurrent - but it really does have to be
thought out and done right.

Speaking of which: If you wanted "real" threads, you could use a
combination of JCC (http://pypi.python.org/pypi/JCC/) and Jython. :)

-jesse
Oct 22 '08 #10
On Oct 22, 10:32 am, Andy <and...@gmail.com> wrote:
> I believe that true python independent thread/interpreter support is
> paramount and should become the top priority because this is the key
> consideration used by developers when they're deciding which
> interpreter to embed in their app. Until there's a hello world that
> demonstrates running independent python interpreters on multiple app
> threads, lua will remain the clear choice over python. Python 3 needs
> true interpreter independence and multi-threaded support!

What you describe, truly independent interpreters, is not threading at
all: it is processes, emulated at the application level, with all the
memory cost and none of the OS protections. True threading would
involve sharing most objects.

Your solution depends on what you need:
* Killable "threads" -> OS processes
* multicore usage (GIL removal) -> OS processes or alternative Python
implementations (PyPy/Jython/IronPython)
* Sane shared objects -> safethread
Oct 22 '08 #11
> What you describe, truly independent interpreters, is not threading at
> all: it is processes, emulated at the application level, with all the
> memory cost and none of the OS protections. True threading would
> involve sharing most objects.
>
> Your solution depends on what you need:
> * Killable "threads" -> OS processes
> * multicore usage (GIL removal) -> OS processes or alternative Python
> implementations (PyPy/Jython/IronPython)
> * Sane shared objects -> safethread

I realize what you're saying, but it's better to say there are two issues
at hand:

1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis, but is FAR from
being carried through in modules as he pointed out). As you point
out, this doesn't directly relate to multi-threading BUT it is
intimately tied to the issue because if, in principle, every module
used instance data (rather than static data), then python would be
WELL on its way to "free threading" (as Jesse Noller calls it), or as
I was calling it "true multi-threading".

2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any spec or
guidance put forward about what should and shouldn't be done in
multi-threaded apps (see my QuickTime API example). Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules. Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)
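
The distinction in (1) between static data and instance data can be shown with a pure-Python analogy. This is only a sketch of the idea: PEP 3121 itself concerns per-module state in C extension modules, and `_static_cache`, `remember_static`, and `ModuleState` are hypothetical names invented for the illustration.

```python
# Static, module-level state: everything that calls remember_static()
# shares this one dictionary, whoever the caller is.
_static_cache = {}


def remember_static(key, value):
    """Store a value in the shared cache and report the cache size."""
    _static_cache[key] = value
    return len(_static_cache)


class ModuleState:
    """Instance state: each owner gets its own cache, isolated from the others."""

    def __init__(self):
        self.cache = {}

    def remember(self, key, value):
        """Store a value in this instance's cache and report its size."""
        self.cache[key] = value
        return len(self.cache)
```

Two `ModuleState` objects never see each other's entries, whereas two users of `remember_static` each observe the other's writes. A C module that keeps its data in file-level statics behaves like the first form, which is why a single such variable is enough to couple otherwise separate interpreters.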

Anyway, I've been at this issue for quite a while now (we're
approaching our 3rd release cycle), so I'm pretty comfortable with the
principles at hand. I'd say your comments share a theme with those of
others here, so perhaps consider where end-user software
houses (like us) are coming from. Specifically, developing commercial
software for end users imposes some restrictions that open source
development communities aren't often as sensitive to, namely:

- Performance -- emulation is a no-go (e.g. Jython)
- Maturity and Licensing -- experimental/academic projects are no-go
(PyPy)
- Cross platform support -- love it or hate it, Win32 and OS X are all
that matter when you're talking about selling (and supporting)
software to the masses. I'm just the messenger here (ie. this is NOT
flamebait). We publish for OS X, so IronPython is therefore out.

Basically, our company is at a crossroads where we really need light,
clean "free threading" as Jesse calls it (e.g. on the iPhone, using
our python drawing wrapper to do primary drawing while running python
jobs on another thread doing image decoding and processing). In our
current iPhone app, we achieve this by using two python bundles
(dynamic libs) in the way I described in my initial post. Sure, this
solves our problem, but it's pretty messy, sucks up resources, and has
been a pain to maintain.

Moving forward, please understand my posts here are also intended to
give the CPython dev community a glimpse of the issues that may not be
as visible to you guys (as they are for dev houses like us). For
example, it'd be pretty cool if Blizzard went with python instead of
lua, wouldn't you think? But some of the issues I've raised here no
doubt factor in to why end-user dev houses ultimately may have to pass
up python in favor of another interpreted language.

Bottom line: why give prospective devs any reason to turn down python--
there are just so many great things about python!

Regards,
Andy


Oct 23 '08 #12
Jesse, Terry, Martin -

First off, thanks again for your time and interest in this matter.
It's definitely encouraging to know that time and real effort are being
put into the matter, and I hope my posts on this subject are
an informative data point for everyone here.

Thanks for that link to Adam Olsen's work, Jesse--I'll definitely look
more closely at it. As I mentioned in my previous post, end-user devs
like me are programmed to get nervous around new mods, but at first
glance it definitely seems interesting. My initial reaction,
as interesting as the project is, goes back to my previous post about
putting all the object safety responsibility on the shoulders of the
API client. That way, one gets the best of both worlds: free
threading and no unnecessary object locking/blocking (ie. the API
client will manage the synchronization req'd to move objects
from one interpreter to another). I could have it wrong, but it seems
like safethread inserts some thread-safety features but they come at
the cost of performance. I know I keep mentioning it, but I think the
QuickTime API (and its documentation) is a great model for how any API
should approach threading. Check out their docs to see how they
address it; conceptually speaking, there's not a single line of thread
safety in QuickTime:

http://developer.apple.com/technotes/tn/tn2125.html

In short: multithreading is tricky; it's the responsibility of the
API client to not do hazardous things.

And for the record: the multiprocessing module is a totally great answer
for python-level MP stuff--very nice work, Jesse!

I'd like to post and discuss more, but I'll pick it up tomorrow...
All this stuff is fun and interesting to talk about, but I have to get
to some other things and it unfortunately comes down to cost
analysis. Sadly, I look at it as I can allocate 2-3 man months (~
$40k) to build our own basic python interpreter implementation that
solves our need for free threading and increased performance (we've
built various internal interpreters over the years so we have good
experience in house, our tools are high performance, and we only use a
pretty small subset of python). Or, there's the more attractive
approach to work with the python dev community and put that dev
expenditure into a form everyone can benefit from.
Regards,
Andy


On Oct 22, 5:21 pm, "Jesse Noller" <jnol...@gmail.com> wrote:
On Wed, Oct 22, 2008 at 12:32 PM, Andy <and...@gmail.com> wrote:
And, yes, I'm aware of the multiprocessing module added in 2.6, but
that stuff isn't lightweight and isn't suitable at all for many
environments (including ours). The bottom line is that if you want to
perform independent processing (in python) on different threads, using
the machine's multiple cores to the fullest, then you're out of luck
under python 2.

So, as the guy-on-the-hook for multiprocessing, I'd like to know what
you might suggest for it to make it more apt for your - and other
environments.

Additionally, have you looked at: https://launchpad.net/python-safethr...ethread/w/list
(By Adam Olsen)

-jesse
Oct 23 '08 #13

You seem confused. PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.
Please reread my points--inherently isolated interpreters (ie. the top
level object) are indirectly linked to thread independence. I don't
want to argue, but you seem hell-bent on not hearing what I'm trying
to say here.

Got some real benchmarks to back that up? How about testing it on a
16 core (or more) box and seeing how it scales?
I don't care to argue with you, and you'll have to take it on faith
that I'm not spouting hot air. But just to put this to rest, I'll
make it clear in this Jython case:

You can't sell software to end users and expect them to have a recent,
working java distro. Look around you: no real commercial software
title that sells to soccer moms and gamers uses java. There's method
to commercial software production, so please don't presume that you
know my job, product line, and customers better than me, ok?

Just to put things in perspective, I already have exposed my company
to more support and design liability than I knew I was getting into by
going with python (as a result of all this thread safety and
interpreter independence business). I'd love to go into that one, but
it's frankly just not a good use of my time right now. Please just
accept that when someone says an option is a deal breaker, then it's a
deal breaker. This isn't some dude's masters thesis project here--we
pay our RENT and put our KIDS through school because we sell and ship
software that works and is meant to keep people entertained and happy.

I'd like to see python used more, but fixing these things properly is
not as easy as believed. Those in the user community see only their
immediate problem (threads don't use multicore). People like me see
much bigger problems. We need consensus on the problems, and how to
solve it, and a commitment to invest what's required.
Well, you seem to come down pretty hard on people that are at your
doorstep saying they're WILLING and INTERESTED in supporting python
development. And, you're exactly right: users see only their
immediate problem--but that's the definition of being a user. If
users saw the whole picture from the dev side, then they'd be
developers, not users.

Please consider that you're representing the python dev community
here; I'm your friend here, not your enemy.

Andy


Oct 23 '08 #14
On Oct 22, 10:31 pm, Andy <and...@gmail.com> wrote:
You seem confused. PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.

Please reread my points--inherently isolated interpreters (ie. the top
level object) are indirectly linked to thread independence. I don't
want to argue, but you seem hell-bent on not hearing what I'm trying
to say here.
I think the confusion is a matter of context. Your app, written in C
or some other non-python language, shares data between the threads and
thus treats them as real threads. However, from python's perspective
nothing is shared, and thus it is processes.

Although this contradiction is fine for embedding purposes, python is
a general purpose language, and needs to be capable of directly
sharing objects. Imagine you wanted to rewrite the bulk of your app
in python, with only a relatively small portion left in a C extension
module.

Got some real benchmarks to back that up? How about testing it on a
16 core (or more) box and seeing how it scales?

I don't care to argue with you, and you'll have to take it on faith
that I'm not spouting hot air. But just to put this to rest, I'll
make it clear in this Jython case:

You can't sell software to end users and expect them to have a recent,
working java distro. Look around you: no real commercial software
title that sells to soccer moms and gamers uses java. There's method
to commercial software production, so please don't presume that you
know my job, product line, and customers better than me, ok?

Just to put things in perspective, I already have exposed my company
to more support and design liability than I knew I was getting into by
going with python (as a result of all this thread safety and
interpreter independence business). I'd love to go into that one, but
it's frankly just not a good use of my time right now. Please just
accept that when someone says an option is a deal breaker, then it's a
deal breaker. This isn't some dude's masters thesis project here--we
pay our RENT and put our KIDS through school because we sell and ship
software that works and is meant to keep people entertained and happy.
Consider it accepted. I understand that PyPy/Jython/IronPython don't
fit your needs. Likewise though, CPython cannot fit my needs. What
we both need simply does not exist today.

I'd like to see python used more, but fixing these things properly is
not as easy as believed. Those in the user community see only their
immediate problem (threads don't use multicore). People like me see
much bigger problems. We need consensus on the problems, and how to
solve it, and a commitment to invest what's required.

Well, you seem to come down pretty hard on people that are at your
doorstep saying they're WILLING and INTERESTED in supporting python
development. And, you're exactly right: users see only their
immediate problem--but that's the definition of being a user. If
users saw the whole picture from the dev side, then they'd be
developers, not users.

Please consider that you're representing the python dev community
here; I'm your friend here, not your enemy.
I'm sorry if I came across harshly. My intent was merely to push you
towards supporting long-term solutions, rather than short-term ones.
Oct 23 '08 #15
Andy wrote:
2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps (see my QuickTime API example). Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules. Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)
I'm very - not absolute, but very - sure that Guido and the initial
designers of Python would have added the GIL anyway. The GIL makes
Python faster on single core machines and more stable on multi core
machines. Other language designers think the same way. Ruby recently got
a GIL. The article
http://www.infoq.com/news/2007/05/ru...eading-futures explains the
rationales for a GIL in Ruby. The article also holds a quote from Guido
about threading in general.

Several people inside and outside the Python community think that
threads are dangerous and don't scale. The paper
http://www.eecs.berkeley.edu/Pubs/Te...ECS-2006-1.pdf sums it
up nicely. It explains why modern processors are going to cause more and
more trouble with the Java approach to threads, too.

Python *must* gain means of concurrent execution of CPU bound code
eventually to survive on the market. But it must get the right means or
we are going to suffer the consequences.

Christian
Oct 23 '08 #16
On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
On approximately 10/23/2008 12:24 AM, came the following characters from
the keyboard of Christian Heimes:
Andy wrote:
2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps (see my QuickTime API example). Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules. Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)

I've been following this discussion with interest, as it certainly seems
that multi-core/multi-CPU machines are the coming thing, and many
applications will need to figure out how to use them effectively.
I'm very - not absolute, but very - sure that Guido and the initial
designers of Python would have added the GIL anyway. The GIL makes
Python faster on single core machines and more stable on multi core
machines. Other language designers think the same way. Ruby recently
got a GIL. The article
http://www.infoq.com/news/2007/05/ru...utures explains the
rationales for a GIL in Ruby. The article also holds a quote from
Guido about threading in general.
Several people inside and outside the Python community think that
threads are dangerous and don't scale. The paper
http://www.eecs.berkeley.edu/Pubs/Te...2006-1.pdf sums
it up nicely. It explains why modern processors are going to cause
more and more trouble with the Java approach to threads, too.

Reading this PDF paper is extremely interesting (albeit somewhat
dependent on understanding abstract theories of computation; I have
enough math background to follow it, sort of, and most of the text can
be read even without fully understanding the theoretical abstractions).

I have already heard people talking about "Java applications are
buggy". I don't believe that general sequential programs written in
Java are any buggier than programs written in other languages... so I
had interpreted that to mean (based on some inquiry) that complex,
multi-threaded Java applications are buggy. And while I also don't
believe that complex, multi-threaded programs written in Java are any
buggier than complex, multi-threaded programs written in other
languages, it does seem to be true that Java is one of the currently
popular languages in which to write complex, multi-threaded programs,
because of its language support for threads and concurrency primitives.
These reports were from people that are not programmers, but are field
IT people, that have bought and/or support software and/or hardware with
drivers, that are written in Java, and seem to have non-ideal behavior,
(apparently only) curable by stopping/restarting the application or
driver, or sometimes requiring a reboot.

The paper explains many traps that lead to complex, multi-threaded
programs being buggy, and being hard to test. I have worked with
parallel machines, applications, and databases for 25 years, and can
appreciate the succinct expression of the problems explained within the
paper, and can, from experience, agree with its premises and
conclusions. Parallel applications only have been commercial successes
when the parallelism is tightly constrained to well-controlled patterns
that could be easily understood. Threads, especially in "cooperation"
with languages that use memory pointers, have the potential to get out
of control, in inexplicable ways.
Although the paper is correct in many ways, I find it fails to
distinguish the core of the problem from the chaff surrounding it, and
thus is used to justify poor language designs.

For example, the amount of interaction may be seen as a spectrum: at
one end is C or Java threads, with complicated memory models, and a
tendency to just barely control things using locks. At the other end
would be completely isolated processes with no form of IPC. The former
is considered the worst possible, while the latter is the best
possible (purely sequential).

However, the latter is too weak for many uses. At a minimum we'd like
some pipes to communicate. That helps, but it's still too weak. What if
you have a large amount of data to share, created at startup but
otherwise not modified? So we add some read only types and ways to
define your own read only types. A couple of those types need a
process associated with them, so we make sure process handles are
proper objects too.

What have we got now? It's more on the thread end of the spectrum
than the process end, but it's definitely not a C or Java thread, and
it's definitely not an OS process. What is it? Does it have the
problems in the paper? Only some? Which?

Another peeve I have is his characterization of the observer pattern.
The generalized form of the problem exists in both single-threaded
sequential programs, in the form of unexpected reentrancy, and message
passing, with infinite CPU usage or infinite number of pending
messages.
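
To make the reentrancy point concrete, here is a minimal single-threaded sketch. The `Subject` class and the `clamp_to_ten` listener are hypothetical names invented for illustration; they are not from the paper. A listener that calls back into the subject during notification causes a nested notify, and the later listener never sees the original value:

```python
class Subject:
    """Minimal observer-pattern subject; single-threaded on purpose."""

    def __init__(self):
        self.value = 0
        self.listeners = []
        self.depth = 0       # current nesting of notify() calls
        self.max_depth = 0   # deepest nesting seen so far

    def set_value(self, value):
        self.value = value
        self.notify()

    def notify(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        try:
            # Iterate over a copy so listeners may (un)register safely.
            for listener in list(self.listeners):
                listener(self)
        finally:
            self.depth -= 1


def clamp_to_ten(subject):
    # A well-meaning listener that "fixes" the value by calling back into
    # the subject while it is still in the middle of notifying.
    if subject.value > 10:
        subject.set_value(10)


seen = []
subject = Subject()
subject.listeners.append(clamp_to_ten)
subject.listeners.append(lambda s: seen.append(s.value))
subject.set_value(42)
print(seen, subject.max_depth)  # prints: [10, 10] 2
```

No threads are involved, yet the second listener is notified twice and never observes 42. Locks would not help here; the problem is the call structure.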

Perhaps threading makes it much worse; I've heard many anecdotes that
would support that. Or perhaps it's the lack of automatic deadlock
detection, giving a clear and diagnosable error for you to fix.
Certainly, the mystery and extremeness of a deadlock could explain how
much it scares people. Either way the paper says nothing.

Python *must* gain means of concurrent execution of CPU bound code
eventually to survive on the market. But it must get the right means
or we are going to suffer the consequences.

This statement, after reading the paper, seems somewhat in line with the
author's premise that language acceptability requires that a language be
self-contained/monolithic, and potentially sufficient to implement
itself. That seems to also be one of the reasons that Java is used
today for threaded applications. It does seem to be true, given current
hardware trends, that _some mechanism_ must be provided to obtain the
benefit of multiple cores/CPUs to a single application, and that Python
must either implement or interface to that mechanism to continue to be a
viable language for large scale application development.

Andy seems to want an implementation of independent Python processes
implemented as threads within a single address space, that can be
coordinated by an outer application. This actually corresponds to the
model promulgated in the paper as being most likely to succeed. Of
course, it maps nicely into a model using separate processes,
coordinated by an outer process, also. The differences seem to be:

1) Most applications are historically perceived as corresponding to
single processes. Language features for multi-processing are rare, and
such languages are not in common use.

2) A single address space can be convenient for the coordinating outer
application. It does seem simpler and more efficient to simply "copy"
data from one memory location to another, rather than send it in a
message, especially if the data are large. On the other hand,
coordination of memory access between multiple cores/CPUs effectively
causes memory copies from one cache to the other, and if memory is
accessed from multiple cores/CPUs regularly, the underlying hardware
implements additional synchronization and copying of data, potentially
each time the memory is accessed. Being forced to do message passing of
data between processes can actually be more efficient than access to
shared memory at times. I should note that in my 25 years of parallel
development, all the systems created used a message passing paradigm,
partly because the multiple CPUs often didn't share the same memory
chips, much less the same address space, and that a key feature of all
the successful systems of that nature was an efficient inter-CPU message
passing mechanism. I should also note that Herb Sutter has a recent
series of columns in Dr Dobbs regarding multi-core/multi-CPU parallelism
and a variety of implementation pitfalls, that I found to be very
interesting reading.
Try looking at it on another level: when your CPU wants to read from a
bit of memory controlled by another CPU it sends them a message
requesting they get it for us. They send back a message containing
that memory. They also note we have it, in case they want to modify
it later. We also note where we got it, in case we want to modify it
(and not wait for them to do modifications for us).

Message passing vs shared memory isn't really a yes/no question. It's
about ratios, usage patterns, and tradeoffs. *All* programs will
share data, but in what way? If it's just the code itself you can
move the cache validation into software and simplify the CPU, making
it faster. If the shared data is a lot more than that, and you use it
to coordinate accesses, then it'll be faster to have it in hardware.

It's quite possible they'll come up with something that seems quite
different, but in reality is the same sort of rearrangement. Add
hardware support for transactions, move the caching partly into
software, etc.

I have noted the multiprocessing module that is new to Python 2.6/3.0
being feverishly backported to Python 2.5, 2.4, etc... indicating that
people truly find the model/module useful... seems that this is one way,
in Python rather than outside of it, to implement the model Andy is
looking for, although I haven't delved into the details of that module
yet, myself. I suspect that a non-Python application could load one
embedded Python interpreter, and then indirectly use the multiprocessing
module to control other Python interpreters in other processors. I
don't know that multithreading primitives such as described in the paper
are available in the multiprocessing module, but perhaps they can be
implemented in some manner using the tools that are provided; in any
case, some interprocess communication primitives are provided via this
new Python module.

There could be opportunity to enhance Python with process creation and
process coordination operations, rather than have it depend on
easy-to-implement-incorrectly coordination patterns or
easy-to-use-improperly libraries/modules of multiprocessing primitives
(this is not a slam of the new multiprocessing module, which appears to
be filling a present need in rather conventional ways, but just to point
out that ideas promulgated by the paper, which I suspect 2 years later
are still research topics, may be a better abstraction than the
conventional mechanisms).

One thing Andy hasn't yet explained (or I missed) is why any of his
application is coded in a language other than Python. I can think of a
number of possibilities:

A) (Historical) It existed, then the desire for extensions was seen, and
Python was seen as a good extension language.

B) Python is inappropriate (performance?) for some of the algorithms
(but should they be coded instead as Python extensions, with the core
application being in Python?)

C) Unavailability of Python wrappers for particularly useful 3rd-party
libraries

D) Other?
"It already existed" is definitely the original reason, but now it
includes single-threaded performance and multi-threaded scalability.
Although the idea of "just write an extension that releases the GIL"
is a common suggestion, it needs to be fairly coarse to be effective,
and ensure little of the CPU time is left in python. If the app
spreads its CPU time around, it is likely impossible to use python
effectively.
Oct 23 '08 #17
Andy wrote:
1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis)
Something like that is necessary for independent interpreters,
but not sufficient. There are also all the built-in constants
and type objects to consider. Most of these are statically
allocated at the moment.
2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps
No, it's there because it's necessary for acceptable performance
when multiple threads are running in one interpreter. Independent
interpreters wouldn't mean the absence of a GIL; it would only
mean each interpreter having its own GIL.

--
Greg
Oct 24 '08 #18
You seem confused. PEP 3121 is for isolated interpreters (ie emulated
processes), not threading.
Just a small remark: this wasn't the primary objective of the PEP.
The primary objective was to support module cleanup in a reliable
manner, to eventually allow modules to be garbage-collected properly.
However, I also kept the isolated interpreters feature in mind there.

Regards,
Martin
Oct 24 '08 #19

Instead of "appdomains" (one interpreter per thread), or free
threading, you could use multiple processes. Take a look at the new
multiprocessing module in Python 2.6. It has roughly the same
interface as Python's threading and queue modules, but uses processes
instead of threads. Processes are scheduled independently by the
operating system. The objects in the multiprocessing module also tend
to have much better performance than their threading and queue
counterparts. If you have a problem with threads due to the GIL, the
multiprocessing module will most likely take care of it.
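
As a rough illustration of the "same interface" point, the sketch below runs one worker function first as threads and then as processes. The names `sum_of_squares`, `worker`, and `run_jobs` are made up for this example; only the `threading`, `queue`, and `multiprocessing` calls come from the standard library (module names as in Python 3):

```python
import multiprocessing
import queue
import threading


def sum_of_squares(n):
    """CPU-bound toy job standing in for something like image processing."""
    return sum(i * i for i in range(n))


def worker(n, results):
    # The same function serves as a thread target or a process target.
    results.put((n, sum_of_squares(n)))


def run_jobs(worker_cls, queue_factory, jobs):
    """Start one worker per job and collect {job: result}."""
    results = queue_factory()
    workers = [worker_cls(target=worker, args=(n, results)) for n in jobs]
    for w in workers:
        w.start()
    # Drain the queue before joining; a process that still has queued
    # data to flush may not exit until that data has been read.
    collected = dict(results.get() for _ in jobs)
    for w in workers:
        w.join()
    return collected


if __name__ == "__main__":
    jobs = [10, 1000, 100000]
    # Threads: one interpreter, one GIL.
    print(run_jobs(threading.Thread, queue.Queue, jobs))
    # Processes: one interpreter (and one GIL) per process.
    print(run_jobs(multiprocessing.Process, multiprocessing.Queue, jobs))
```

Only the two factory arguments change between the runs. Whether the process version is actually faster depends on job size, since every result is pickled and sent through a pipe.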

There is a fundamental problem with using homebrew loading of multiple
(but renamed) copies of PythonXX.dll that is easily overlooked. That
is, extension modules (.pyd) are DLLs as well. Even if required by two
interpreters, they will only be loaded into the process image once.
Thus you have to rename all of them as well, or you will get havoc
with refcounts. Not to speak of what will happen if a Windows HANDLE
is closed by one interpreter while still needed by another. It is
almost guaranteed to bite you, sooner or later.

There are other options as well:

- Use IronPython. It does not have a GIL.

- Use Jython. It does not have a GIL.

- Use pywin32 to create isolated outproc COM servers in Python. (I'm
not sure what the effect of inproc servers would be.)

- Use os.fork() if your platform supports it (Linux, Unix, Apple,
Cygwin, Windows Vista SUA). This is the standard posix way of doing
multiprocessing. It is almost unbeatable if you have a fast copy-on-
write implementation of fork (that is, all platforms except Cygwin).
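
For the os.fork() route, a minimal POSIX-only sketch could look like the following. The function name `fork_and_compute` and the 8-byte result format are assumptions made for this example; the child does the CPU-bound work and hands the answer back through a pipe:

```python
import os
import struct


def fork_and_compute(n):
    """Run a CPU-bound job in a forked child and return its result.

    POSIX only: os.fork() is not available on native Windows.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees a copy-on-write snapshot of the parent's memory.
        status = 1
        try:
            os.close(read_fd)
            result = sum(i * i for i in range(n))
            os.write(write_fd, struct.pack("!Q", result))
            status = 0
        finally:
            # Leave immediately, skipping the parent's cleanup handlers.
            os._exit(status)
    # Parent: read the 8-byte result, then reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 8)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return struct.unpack("!Q", data)[0]


if __name__ == "__main__":
    print(fork_and_compute(10))  # prints: 285
```

The child runs with its own copy of the interpreter state, so nothing it does to Python objects is visible to the parent except what is written to the pipe.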




Oct 24 '08 #20
On Oct 24, 9:35 am, sturlamolden <sturlamol...@yahoo.no> wrote:
Instead of "appdomains" (one interpreter per thread), or free
threading, you could use multiple processes. Take a look at the new
multiprocessing module in Python 2.6.
That's mentioned earlier in the thread.

There is a fundamental problem with using homebrew loading of multiple
(but renamed) copies of PythonXX.dll that is easily overlooked. That
is, extension modules (.pyd) are DLLs as well.
Tell me about it--there's all kinds of problems and maintenance
liabilities with our approach. That's why I'm here talking about this
stuff.
There are other options as well:

- Use IronPython. It does not have a GIL.

- Use Jython. It does not have a GIL.

- Use pywin32 to create isolated outproc COM servers in Python. (I'm
not sure what the effect of inproc servers would be.)

- Use os.fork() if your platform supports it (Linux, Unix, Apple,
Cygwin, Windows Vista SUA). This is the standard posix way of doing
multiprocessing. It is almost unbeatable if you have a fast copy-on-
write implementation of fork (that is, all platforms except Cygwin).
This is discussed earlier in the thread--they're unfortunately all
out.

Oct 24 '08 #21
Terry Reedy wrote:
Everything in DLLs is compiled C extensions. I see about 15 for Windows
3.0.
Ah, weren't those wonderful times back in the days of Win3.0, when DLL-hell was
inhabited by only 15 libraries? *sigh*

.... although ... wait, didn't Win3.0 have more than that already? Maybe you
meant Windows 1.0?

SCNR-ly,

Stefan
Oct 24 '08 #22
On Oct 24, 3:58 pm, "Andy O'Meara" <and...@gmail.com> wrote:
This is discussed earlier in the thread--they're unfortunately all
out.
It occurs to me that tcl is doing what you want. Have you ever thought
of not using Python?

That aside, the fundamental problem is what I perceive as a fundamental
design flaw in Python's C API. In Java JNI, each function takes a
JNIEnv* pointer as its first argument. There is nothing that
prevents you from embedding several JVMs in a process. Python can
create embedded subinterpreters, but it works differently. It swaps
subinterpreters like a finite state machine: only one is concurrently
active, and the GIL is shared. The approach is fine, except it kills
free threading of subinterpreters. The argument seems to be that
Apache's mod_python somehow depends on it (for reasons I don't
understand).

Oct 24 '08 #23
On Oct 24, 2:12 am, greg <g...@cosc.canterbury.ac.nz> wrote:
Andy wrote:
1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis)

Something like that is necessary for independent interpreters,
but not sufficient. There are also all the built-in constants
and type objects to consider. Most of these are statically
allocated at the moment.
Agreed--I was just trying to speak generally. Or, put another way,
there's no hope for independent interpreters without the likes of PEP
3121. Also, as Martin pointed out, there's the issue of module
cleanup some guys here may underestimate (and I'm glad Martin pointed
out the importance of it). Without the module cleanup, every time a
dynamic library using python loads and unloads you've got leaks. This
issue is a real problem for us since our software is loaded and
unloaded many many times in a host app (iTunes, WMP, etc). I hadn't
raised it here yet (and I don't want to turn the discussion to this),
but lack of multiple load and unload support has been another painful
issue that we didn't expect to encounter when we went with python.

2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps

No, it's there because it's necessary for acceptable performance
when multiple threads are running in one interpreter. Independent
interpreters wouldn't mean the absence of a GIL; it would only
mean each interpreter having its own GIL.
I see what you're saying, but let's note that what you're talking
about at this point is an interpreter containing protection from the
client level violating (supposed) direction put forth in python
multithreaded guidelines. Glenn Linderman's post really gets at
what's at hand here. It's really important to consider that it's not
a given that python (or any framework) has to be designed against
hazardous use. Again, I refer you to the diagrams and guidelines in
the QuickTime API:

http://developer.apple.com/technotes/tn/tn2125.html

They tell you point-blank what you can and can't do, and it's that
simple. Their engineers can then simply create the implementation
around those specs and not weigh any of the implementation down with
sync mechanisms. I'm in the camp that simplicity and convention wins
the day when it comes to an API. It's safe to say that software
engineers expect and assume that a thread that doesn't have contact
with other threads (except for explicit, controlled message/object
passing) will run unhindered and safely, so I raise an eyebrow at the
GIL (or any internal "helper" sync stuff) holding up a thread's
performance when the app is designed to not need lower-level global
locks.

Anyway, let's talk about solutions. My company is looking to support
a python dev community endeavor that allows the following:

- an app makes N worker threads (using the OS)

- each worker thread makes its own interpreter, pops scripts off a
work queue, and manages exporting (and then importing) result data to
other parts of the app. Generally, we're talking about CPU-bound work
here.

- each interpreter has the essentials (e.g. math support, string
support, re support, and so on -- I realize this is open-ended, but
work with me here).
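
The shape of that design can be sketched in pure Python, with one big caveat: in CPython these threads still share a single GIL, so the sketch shows only the structure (OS threads, a work queue, private state per job, explicit hand-off of results), not the parallelism being asked for. The names `worker_loop` and `run_scripts`, and the convention that a script leaves its answer in a variable called `result`, are assumptions made for this example:

```python
import queue
import threading


def worker_loop(scripts, results):
    """Pop (job_id, source) pairs off the queue until a None sentinel."""
    while True:
        item = scripts.get()
        if item is None:
            break
        job_id, source = item
        namespace = {}  # fresh globals per job: no state shared by scripts
        try:
            exec(source, namespace)
            results.put((job_id, namespace.get("result")))
        except Exception as exc:
            # Export the failure instead of silently killing the worker.
            results.put((job_id, exc))


def run_scripts(sources, num_workers=2):
    """Run each source string on a pool of worker threads."""
    scripts, results = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker_loop, args=(scripts, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for job in enumerate(sources):
        scripts.put(job)
    for _ in workers:
        scripts.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return dict(results.get() for _ in sources)


if __name__ == "__main__":
    print(run_scripts(["result = 2 + 3",
                       "import math\nresult = math.sqrt(16)"]))
```

In the embedded case the app would own the threads and queues in C++, and each worker would hold a real interpreter rather than a dict of globals.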

Let's guesstimate about what kind of work we're talking about here and
if this is even in the realm of possibility. If we find that it *is*
possible, let's figure out what level of work we're talking about.
From there, I can get serious about writing up a PEP/spec, paid
support, and so on.

Regards,
Andy

Oct 24 '08 #24
I'm not finished reading the whole thread yet, but I've got some
things below to respond to this post with.

On Thu, Oct 23, 2008 at 9:30 AM, Glenn Linderman <v+******@g.nevcal.com> wrote:
On approximately 10/23/2008 12:24 AM, came the following characters from the
keyboard of Christian Heimes:
Andy wrote:
2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps (see my QuickTime API example). Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules. Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)


I've been following this discussion with interest, as it certainly seems
that multi-core/multi-CPU machines are the coming thing, and many
applications will need to figure out how to use them effectively.
I'm very - not absolute, but very - sure that Guido and the initial
designers of Python would have added the GIL anyway. The GIL makes Python
faster on single core machines and more stable on multi core machines. Other
language designers think the same way. Ruby recently got a GIL. The article
http://www.infoq.com/news/2007/05/ru...eading-futures explains the
rationales for a GIL in Ruby. The article also holds a quote from Guido
about threading in general.

Several people inside and outside the Python community think that threads
are dangerous and don't scale. The paper
http://www.eecs.berkeley.edu/Pubs/Te...ECS-2006-1.pdf sums it up
nicely. It explains why modern processors are going to cause more and more
trouble with the Java approach to threads, too.

Reading this PDF paper is extremely interesting (albeit somewhat dependent
on understanding abstract theories of computation; I have enough math
background to follow it, sort of, and most of the text can be read even
without fully understanding the theoretical abstractions).

I have already heard people talking about "Java applications are buggy". I
don't believe that general sequential programs written in Java are any
buggier than programs written in other languages... so I had interpreted
that to mean (based on some inquiry) that complex, multi-threaded Java
applications are buggy. And while I also don't believe that complex,
multi-threaded programs written in Java are any buggier than complex,
multi-threaded programs written in other languages, it does seem to be true
that Java is one of the currently popular languages in which to write
complex, multi-threaded programs, because of its language support for
threads and concurrency primitives. These reports were from people that are
not programmers, but are field IT people, that have bought and/or support
software and/or hardware with drivers, that are written in Java, and seem to
have non-ideal behavior, (apparently only) curable by stopping/restarting
the application or driver, or sometimes requiring a reboot.

The paper explains many traps that lead to complex, multi-threaded programs
being buggy, and being hard to test. I have worked with parallel machines,
applications, and databases for 25 years, and can appreciate the succinct
expression of the problems explained within the paper, and can, from
experience, agree with its premises and conclusions. Parallel applications
only have been commercial successes when the parallelism is tightly
constrained to well-controlled patterns that could be easily understood.
Threads, especially in "cooperation" with languages that use memory
pointers, have the potential to get out of control, in inexplicable ways.

>Python *must* gain means of concurrent execution of CPU bound code
eventually to survive on the market. But it must get the right means or we
are going to suffer the consequences.

This statement, after reading the paper, seems somewhat in line with the
author's premise that language acceptability requires that a language be
self-contained/monolithic, and potentially sufficient to implement itself.
That seems to also be one of the reasons that Java is used today for
threaded applications. It does seem to be true, given current hardware
trends, that _some mechanism_ must be provided to obtain the benefit of
multiple cores/CPUs to a single application, and that Python must either
implement or interface to that mechanism to continue to be a viable language
for large scale application development.

Andy seems to want an implementation of independent Python processes
implemented as threads within a single address space, that can be
coordinated by an outer application. This actually corresponds to the model
promulgated in the paper as being most likely to succeed. Of course, it
maps nicely into a model using separate processes, coordinated by an outer
process, also. The differences seem to be:

1) Most applications are historically perceived as corresponding to single
processes. Language features for multi-processing are rare, and such
languages are not in common use.

2) A single address space can be convenient for the coordinating outer
application. It does seem simpler and more efficient to simply "copy" data
from one memory location to another, rather than send it in a message,
especially if the data are large. On the other hand, coordination of memory
access between multiple cores/CPUs effectively causes memory copies from one
cache to the other, and if memory is accessed from multiple cores/CPUs
regularly, the underlying hardware implements additional synchronization and
copying of data, potentially each time the memory is accessed. Being forced
to do message passing of data between processes can actually be more
efficient than access to shared memory at times. I should note that in my
25 years of parallel development, all the systems created used a message
passing paradigm, partly because the multiple CPUs often didn't share the
same memory chips, much less the same address space, and that a key feature
of all the successful systems of that nature was an efficient inter-CPU
message passing mechanism. I should also note that Herb Sutter has a recent
series of columns in Dr Dobbs regarding multi-core/multi-CPU parallelism and
a variety of implementation pitfalls, that I found to be very interesting
reading.

I have noted the multiprocessing module that is new to Python 2.6/3.0 being
feverishly backported to Python 2.5, 2.4, etc... indicating that people
truly find the model/module useful... seems that this is one way, in Python
rather than outside of it, to implement the model Andy is looking for,
although I haven't delved into the details of that module yet, myself. I
suspect that a non-Python application could load one embedded Python
interpreter, and then indirectly use the multiprocessing module to control
other Python interpreters in other processors. I don't know that
multithreading primitives such as described in the paper are available in
the multiprocessing module, but perhaps they can be implemented in some
manner using the tools that are provided; in any case, some interprocess
communication primitives are provided via this new Python module.
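
As a rough illustration of the process-based model, a sketch using multiprocessing.Pool might look like the following. The checksum function, the chunk sizes and the pool size are made up for the example. Each chunk is serialized, sent to a child process and the result is sent back, which is the message-passing cost discussed above.

```python
import multiprocessing


def checksum(chunk):
    """CPU-bound stand-in: sum the bytes of one chunk modulo 2**16."""
    return sum(chunk) % 65536


def main():
    # Eight made-up chunks of 25,600 bytes each.
    chunks = [bytes(range(256)) * 100 for _ in range(8)]
    # Each chunk is pickled, sent to a child process, and the result pickled back.
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(checksum, chunks))
```

Call main() from under an `if __name__ == "__main__":` guard so that child processes can import the module without re-running it.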

There could be opportunity to enhance Python with process creation and
process coordination operations, rather than have it depend on
easy-to-implement-incorrectly coordination patterns or
easy-to-use-improperly libraries/modules of multiprocessing primitives (this
is not a slam of the new multiprocessing module, which appears to be filling
a present need in rather conventional ways, but just to point out that ideas
promulgated by the paper, which I suspect 2 years later are still research
topics, may be a better abstraction than the conventional mechanisms).

One thing Andy hasn't yet explained (or I missed) is why any of his
application is coded in a language other than Python. I can think of a
number of possibilities:

A) (Historical) It existed, then the desire for extensions was seen, and
Python was seen as a good extension language.

B) Python is inappropriate (performance?) for some of the algorithms (but
should they be coded instead as Python extensions, with the core application
being in Python?)

C) Unavailability of Python wrappers for particularly useful 3rd-party
libraries

D) Other?
We develop virtual instrument plugins for music production using
AudioUnit, VST, and RTAS on Windows and OS X. While our dsp engine's
code has to be written in C/C++ for performance reasons, the gui could
have been written in python. But, we didn't because:

1) Our project lead didn't know python, and the project began with
little time for him to learn it.
2) All of our third-party libs (for dsp, plugin-wrappers, etc) are
written in C++, so it would be far easier to write and debug our app if
written in the same language. Could I do it now? yes. Could we do it
then? No.

** Additionally **, we would have run into this problem, which is very
appropriate to this thread:

3) Adding python as an audio scripting language in the audio thread
would have caused concurrency issues if our GUI had been written in
python, since audio threads are not allowed to make blocking calls
(f.ex. acquiring the GIL).
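
The constraint in point 3 can be illustrated with an ordinary lock. The GIL itself is not exposed as a Python-level lock object, so threading.Lock is used here purely as a stand-in, and the names state_lock, pending_events and audio_callback are hypothetical. The point is only that a real-time callback may try a lock but must never wait on one.

```python
import collections
import threading

state_lock = threading.Lock()          # stand-in for a lock the GUI thread may hold
pending_events = collections.deque()   # events deferred because the lock was busy


def audio_callback(event, handle):
    """Real-time path: never wait on the lock; defer the event instead."""
    if state_lock.acquire(blocking=False):   # returns immediately either way
        try:
            while pending_events:            # first catch up on deferred events
                handle(pending_events.popleft())
            handle(event)
            return True
        finally:
            state_lock.release()
    pending_events.append(event)             # lock busy: queue the event and return at once
    return False
```

A blocking acquire in the same place could stall the audio thread for as long as the other thread holds the lock, which is the concurrency issue described above.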

OK, I'll continue reading the thread now :)
>
--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

--
http://mail.python.org/mailman/listinfo/python-list
Oct 24 '08 #25

Glenn, great post and points!
>
Andy seems to want an implementation of independent Python processes
implemented as threads within a single address space, that can be
coordinated by an outer application. This actually corresponds to the
model promulgated in the paper as being most likely to succeed.
Yeah, that's the idea--let the highest levels run and coordinate the
show.
>
It does seem simpler and more efficient to simply "copy"
data from one memory location to another, rather than send it in a
message, especially if the data are large.
That's the rub... In our case, we're doing image and video
manipulation--stuff not good to be messaging from address space to
address space. The same argument holds for numerical processing with
large data sets. The workers handing back huge data sets via
messaging isn't very attractive.
One thing Andy hasn't yet explained (or I missed) is why any of his
application is coded in a language other than Python.
Our software runs in real time (so performance is paramount),
interacts with other static libraries, depends on worker threads to
perform real-time image manipulation, and leverages Windows and Mac OS
API concepts and features. Python's performance hits have generally
been a huge challenge with our animators because they often have to go
back and massage their python code to improve execution performance.
So, in short, there are many reasons why we use python as a part
rather than a whole.

The other area of pain that I mentioned in one of my other posts is
that what we ship, above all, can't be flaky. The lack of module
cleanup (intended to be addressed by PEP 3121), using a duplicate copy
of the python dynamic lib, and namespace black magic to achieve
independent interpreters are all examples that have made using python
for us much more challenging and time-consuming than we ever
anticipated.

Again, if it turns out nothing can be done about our needs (which
appears to be more and more like the case), I think it's important for
everyone here to consider the points raised here in the last week.
Moreover, realize that the python dev community really stands to gain
from making python usable as a tool (rather than a monolith). This
fact alone has caused lua to *rapidly* rise in popularity with
software companies looking to embed a powerful, lightweight
interpreter in their software.

As a python language fan and enthusiast, don't let lua win! (I say
this endearingly of course--I have the utmost respect for both
communities and I only want to see CPython be an attractive pick when
a company is looking to embed a language that won't intrude upon their
app's design).
Andy
Oct 24 '08 #26
We are in the same position as Andy here.

I think that something that would help people like us produce
something in code form is a collection of information outlining the
problem and suggested solutions, appropriate parts of the CPython's
current threading API, and pros and cons of the many various proposed
solutions to the different levels of the problem. The most valuable
information I've found is contained in the many (lengthy!) discussions
like this one, a few related PEP's, and the CPython docs, but has
anyone condensed the state of the problem into a wiki or something
similar? Maybe we should start one?

For example, Guido's post here
http://www.artima.com/weblogs/viewpo...14235 describes some
possible solutions to the problem, like interpreter-specific locks, or
fine-grained object locks, and he also mentions the primary
requirement of not harming the performance of single-threaded
apps. As I understand it, that requirement does not rule out new build
configurations that provide some level of concurrency, as long as you
can still compile python so as to perform as well on single-threaded
apps.

To add to the heap of use cases, the most important thing to us is to
simply have the python language and the sip/PyQt modules available to
us. All we wanted to do was embed the interpreter and language core as
a local scripting engine, so had we patched python to provide
concurrent execution, we wouldn't have cared about all of the other
unsupported extension modules since our scripts are quite
application-specific.

It seems to me that the very simplest move would be to remove global
static data so the app could provide all thread-related data, which
Andy suggests through references to the QuickTime API. This would
suggest compiling python without thread support so as to leave it up
to the application.

Anyway, I'm having fun reading all of these papers and news postings,
but it's true that code talks, and it could be a little easier if the
state of the problems was condensed. This could be an intense and fun
project, but frankly it's a little tough to keep it all in my head. Is
there a wiki or something out there or should we start one, or do I
just need to read more code?

On Fri, Oct 24, 2008 at 6:40 AM, Andy O'Meara <an****@gmail.com> wrote:
On Oct 24, 2:12 am, greg <g...@cosc.canterbury.ac.nz> wrote:
>Andy wrote:
1) Independent interpreters (this is the easier one--and solved, in
principle anyway, by PEP 3121, by Martin v. Löwis)

Something like that is necessary for independent interpreters,
but not sufficient. There are also all the built-in constants
and type objects to consider. Most of these are statically
allocated at the moment.

Agreed--I was just trying to speak generally. Or, put another way,
there's no hope for independent interpreters without the likes of PEP
3121. Also, as Martin pointed out, there's the issue of module
cleanup some guys here may underestimate (and I'm glad Martin pointed
out the importance of it). Without the module cleanup, every time a
dynamic library using python loads and unloads you've got leaks. This
issue is a real problem for us since our software is loaded and
unloaded many many times in a host app (iTunes, WMP, etc). I hadn't
raised it here yet (and I don't want to turn the discussion to this),
but lack of multiple load and unload support has been another painful
issue that we didn't expect to encounter when we went with python.

2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps

No, it's there because it's necessary for acceptable performance
when multiple threads are running in one interpreter. Independent
interpreters wouldn't mean the absence of a GIL; it would only
mean each interpreter having its own GIL.

I see what you're saying, but let's note that what you're talking
about at this point is an interpreter containing protection from the
client level violating (supposed) direction put forth in python
multithreaded guidelines. Glenn Linderman's post really gets at
what's at hand here. It's really important to consider that it's not
a given that python (or any framework) has to be designed against
hazardous use. Again, I refer you to the diagrams and guidelines in
the QuickTime API:

http://developer.apple.com/technotes/tn/tn2125.html

They tell you point-blank what you can and can't do, and it's that
simple. Their engineers can then simply create the implementation
around those specs and not weigh any of the implementation down with
sync mechanisms. I'm in the camp that simplicity and convention wins
the day when it comes to an API. It's safe to say that software
engineers expect and assume that a thread that doesn't have contact
with other threads (except for explicit, controlled message/object
passing) will run unhindered and safely, so I raise an eyebrow at the
GIL (or any internal "helper" sync stuff) holding up a thread's
performance when the app is designed to not need lower-level global
locks.

Anyway, let's talk about solutions. My company is looking to support a
python dev community endeavor that allows the following:

- an app makes N worker threads (using the OS)

- each worker thread makes its own interpreter, pops scripts off a
work queue, and manages exporting (and then importing) result data to
other parts of the app. Generally, we're talking about CPU-bound work
here.

- each interpreter has the essentials (e.g. math support, string
support, re support, and so on -- I realize this is open-ended, but
work with me here).

Let's guesstimate about what kind of work we're talking about here and
if this is even in the realm of possibility. If we find that it *is*
possible, let's figure out what level of work we're talking about.
From there, I can get serious about writing up a PEP/spec, paid
support, and so on.

Regards,
Andy

Oct 24 '08 #27
As a side note to the performance question, we are executing python
code in an audio thread that is used in all of the top-end music
production environments. We have found the language to perform
extremely well when executed at control-rate frequency, meaning we
aren't doing DSP computations, just responding to less-frequent events
like user input and MIDI messages.

So we are sitting on this music platform with unimaginable possibilities
in the music world (of which python does not play a role), but those
little CPU spikes caused by the GIL at low latencies won't let us have
it. AFAIK, there is no music scripting language out there that would
come close, and yet we are sooooo close! This is a big deal.

Oct 24 '08 #28
Stefan Behnel wrote:
Terry Reedy wrote:
>Everything in DLLs is compiled C extensions. I see about 15 for Windows
3.0.

Ah, weren't that wonderful times back in the days of Win3.0, when DLL-hell was
inhabited by only 15 libraries? *sigh*

... although ... wait, didn't Win3.0 have more than that already? Maybe you
meant Windows 1.0?

SCNR-ly,
Is that the equivalent of a smiley? or did you really not understand
what I wrote?

Oct 24 '08 #29

>
The Global Interpreter Lock is fundamentally designed to make the
interpreter easier to maintain and safer: Developers do not need to
worry about other code stepping on their namespace. This makes things
thread-safe, inasmuch as having multiple PThreads within the same
interpreter space modifying global state and variables at once is,
well, bad. A c-level module, on the other hand, can sidestep/release
the GIL at will, and go on its merry way and process away.
....Unless part of the C module execution involves the need to do
CPU-bound work on another thread through a different python interpreter,
right? (even if the interpreter is 100% independent, yikes). For
example, have a python C module designed to programmatically generate
images (and video frames) in RAM for immediate and subsequent use in
animation. Meanwhile, we'd like to have a pthread with its own
interpreter with an instance of this module and have it dequeue jobs
as they come in (in fact, there'd be one of these threads for each
excess core present on the machine). As far as I can tell, it seems
CPython's current state can't do CPU-bound parallelization in the same
address space (basically, it seems that we're talking about the
"embarrassingly parallel" scenario raised in that paper). Why does it
have to be in same address space? Convenience and simplicity--the
same reasons that most APIs let you hang yourself if the app does dumb
things with threads. Also, when the data sets that you need to send
to and from each process are large, using the same address space makes
more and more sense.

So, just to clarify - Andy, do you want one interpreter, $N threads
(e.g. PThreads) or the ability to fork multiple "heavyweight"
processes?
Sorry if I haven't been clear, but we're talking the app starting a
pthread, making a fresh/clean/independent interpreter, and then being
responsible for its safety at the highest level (with the payoff of
each of these threads executing without hindrance). No different
than if you used most APIs out there where step 1 is always to make
and init a context object and the final step is always to destroy/take-
down that context object.

I'm a lousy writer sometimes, but I feel bad if you took the time to
describe threads vs processes. The only reason I raised IPC with my
"messaging isn't very attractive" comment was to respond to Glenn
Linderman's points regarding tradeoffs of shared memory vs no.
Andy

Oct 24 '08 #30
On Fri, Oct 24, 2008 at 3:17 PM, Andy O'Meara <an****@gmail.com> wrote:
I'm a lousy writer sometimes, but I feel bad if you took the time to
describe threads vs processes. The only reason I raised IPC with my
"messaging isn't very attractive" comment was to respond to Glenn
Linderman's points regarding tradeoffs of shared memory vs no.
I actually took the time to bring anyone listening in up to speed, and
to clarify so I could better understand your use case. Don't feel bad,
things in the thread are moving fast and I just wanted to clear it up.

Ideally, we all want to improve the language, and the interpreter.
However trying to push it towards a particular use case is dangerous
given the idea of "general use".

-jesse
Oct 24 '08 #31
On Oct 24, 1:02 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
On approximately 10/24/2008 8:42 AM, came the following characters from
the keyboard of Andy O'Meara:
Glenn, great post and points!

Thanks. I need to admit here that while I've got a fair bit of
professional programming experience, I'm quite new to Python -- I've not
learned its internals, nor even the full extent of its rich library. So
I have some questions that are partly about the goals of the
applications being discussed, partly about how Python is constructed,
and partly about how the library is constructed. I'm hoping to get a
better understanding of all of these; perhaps once a better
understanding is achieved, limitations will be understood, and maybe
solutions be achievable.

Let me define some speculative Python interpreters; I think the first is
today's Python:

PyA: Has a GIL. PyA threads can run within a process; but are
effectively serialized to the places where the GIL is obtained/released.
Needs the GIL because that solves lots of problems with non-reentrant
code (an example of non-reentrant code is code that uses global (C
global, or C static) variables; note that I'm not talking about Python
vars declared global... they are only module global). In this model,
non-reentrant code could include pieces of the interpreter, and/or
extension modules.

PyB: No GIL. PyB threads acquire/release a lock around each reference to
a global variable (like "with" feature). Requires massive recoding of
all code that contains global variables. Reduces performance
significantly by the increased cost of obtaining and releasing locks.

PyC: No locks. Instead, recoding is done to eliminate global variables
(interpreter requires a state structure to be passed in). Extension
modules that use globals are prohibited... this eliminates large
portions of the library, or requires massive recoding. PyC threads do
not share data between threads except by explicit interfaces.

PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate
global variables, and each interpreter instance is provided a state
structure. There is still a GIL, however, because globals are
potentially still used by some modules. Code is added to detect use of
global variables by a module, or some contract is written whereby a
module can be declared to be reentrant and global-free. PyA threads will
obtain the GIL as they would today. PyC threads would be available to be
created. PyC instances refuse to call non-reentrant modules, but also
need not obtain the GIL... PyC threads would have limited module support
initially, but over time, most modules can be migrated to be reentrant
and global-free, so they can be used by PyC instances. Most 3rd-party
libraries today are starting to care about reentrancy anyway, because of
the popularity of threads.
PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable. A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability. Most other
shareable objects are immutable. Each thread is run in its own
private monitor, and thus protected from the normal threading memory
module nasties. Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.
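
The difference between the global-state model (PyA) and the state-structure model (PyC) can be shown at toy scale. This is a sketch by analogy only, written in Python rather than in the interpreter's C code, and every name in it (bump_global, InterpState, bump, run_private) is hypothetical.

```python
import threading

# PyA-style: module-level (global) state. An unguarded `_counter += 1` from
# several threads is a read-modify-write that can lose updates, so a lock
# has to protect it.
_counter = 0
_counter_lock = threading.Lock()


def bump_global():
    global _counter
    with _counter_lock:
        _counter += 1
        return _counter


# PyC-style: no globals; the caller passes in a state structure it owns.
class InterpState:
    def __init__(self):
        self.counter = 0


def bump(state):
    state.counter += 1        # no lock: this state belongs to one thread only
    return state.counter


def run_private(n_threads=4, n_calls=1000):
    """Give each thread its own state object and let them run unsynchronized."""
    states = [InterpState() for _ in range(n_threads)]

    def work(state):
        for _ in range(n_calls):
            bump(state)

    threads = [threading.Thread(target=work, args=(s,)) for s in states]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [s.counter for s in states]
```

In the second form correctness comes from ownership rather than locking: nothing is shared, so nothing needs guarding.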

Our software runs in real time (so performance is paramount),
interacts with other static libraries, depends on worker threads to
perform real-time image manipulation, and leverages Windows and Mac OS
API concepts and features. Python's performance hits have generally
been a huge challenge with our animators because they often have to go
back and massage their python code to improve execution performance.
So, in short, there are many reasons why we use python as a part
rather than a whole.
[...]
As a python language fan and enthusiast, don't let lua win! (I say
this endearingly of course--I have the utmost respect for both
communities and I only want to see CPython be an attractive pick when
a company is looking to embed a language that won't intrude upon their
app's design).
I agree with the problem, and desire to make python fill all niches,
but let's just say I'm more ambitious with my solution. ;)
Oct 24 '08 #32

Another great post, Glenn!! Very well laid-out and posed!! Thanks for
taking the time to lay all that out.
>
Questions for Andy: is the type of work you want to do in independent
threads mostly pure Python? Or with libraries that you can control to
some extent? Are those libraries reentrant? Could they be made
reentrant? How much of the Python standard library would need to be
available in reentrant mode to provide useful functionality for those
threads? I think you want PyC
I think you've defined everything perfectly, and you're of
course correct about my love for the PyC model. :^)

Like any software that's meant to be used without restrictions, our
code and frameworks always use a context object pattern so that
there's never any non-const global/shared data. I would go as far as to
say that this is the case with more performance-oriented software than
you may think since it's usually a given for us to have to be parallel
friendly in as many ways as possible. Perhaps Patrick can back me up
there.

As to what modules are "essential"... As you point out, once
reentrant module implementations caught on in PyC or hybrid world, I
think we'd start to see real effort to whip them into compliance--
there's just so much to be gained imho. But to answer the question,
there's the obvious ones (operator, math, etc), string/buffer
processing (string, re), C bridge stuff (struct, array), and OS basics
(time, file system, etc). Nice-to-haves would be buffer and image
decompression (zlib, libpng, etc), crypto modules, and xml. As far as
I can imagine, I have to believe all of these modules already contain
little, if any, global data, so I have to believe they'd be super easy
to make "PyC happy". Patrick, what would you see you guys using?

That's the rub... In our case, we're doing image and video
manipulation--stuff not good to be messaging from address space to
address space. The same argument holds for numerical processing with
large data sets. The workers handing back huge data sets via
messaging isn't very attractive.

In the module multiprocessing environment could you not use shared
memory, then, for the large shared data items?
As I understand things, the multiprocessing module puts stuff in a child
process (i.e. a separate address space), so the only way to get stuff to/
from it is via IPC, which can include a shared/mapped memory region.
Unfortunately, a shared address region doesn't work when you have
large and opaque objects (e.g. a rendered CoreVideo movie in the
QuickTime API or 300 megs of audio data that just went through a
DSP). Then you've got the hit of serialization if you've got
intricate data structures (that would normally need to be
serialized, such as a hashtable or something). Also, if I may speak
for commercial developers out there who are just looking to get the
job done without new code, it's usually preferable to just use a
single high level sync object (for when the job is complete) than to
start a child process and use IPC. The former is just WAY less
code, plain and simple.
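
The "single high level sync object" approach can be sketched with a thread and a threading.Event. The function names and the fill pattern are invented for the example. The buffer is written in place and never serialized or copied, though under CPython's GIL a pure-Python worker like this one gains no parallel speed-up.

```python
import threading


def render_job(buffer, done):
    """Worker: fill the caller's buffer in place, then signal completion."""
    for i in range(len(buffer)):
        buffer[i] = i % 256      # stand-in for real image or audio processing
    done.set()                   # the one high-level sync object


def render(size):
    buffer = bytearray(size)     # lives in the shared address space; never copied
    done = threading.Event()
    threading.Thread(target=render_job, args=(buffer, done)).start()
    done.wait()                  # block until the worker reports completion
    return buffer
```

The caller needs no pipes, no pickling and no shared-memory setup, just the Event, which is the "way less code" argument made above.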
Andy
Oct 24 '08 #33
On Fri, Oct 24, 2008 at 4:51 PM, Andy O'Meara <an****@gmail.com> wrote:
>In the module multiprocessing environment could you not use shared
memory, then, for the large shared data items?

As I understand things, the multiprocessing module puts stuff in a child
process (i.e. a separate address space), so the only way to get stuff to/
from it is via IPC, which can include a shared/mapped memory region.
Unfortunately, a shared address region doesn't work when you have
large and opaque objects (e.g. a rendered CoreVideo movie in the
QuickTime API or 300 megs of audio data that just went through a
DSP). Then you've got the hit of serialization if you're got
intricate data structures (that would normally would need to be
serialized, such as a hashtable or something). Also, if I may speak
for commercial developers out there who are just looking to get the
job done without new code, it's usually always preferable to just a
single high level sync object (for when the job is complete) than to
start a child processes and use IPC. The former is just WAY less
code, plain and simple.
Are you familiar with the API at all? Multiprocessing was designed to
mimic threading in about every way possible; the only restriction on
shared data is that it must be serializable, but even then you can
override or customize the behavior.

Also, inter process communication is done via pipes. It can also be
done with messages if you want to tweak the manager(s).

-jesse
Oct 24 '08 #34
On Oct 24, 2:59 pm, Glenn Linderman <gl...@nevcal.com> wrote:
On approximately 10/24/2008 1:09 PM, came the following characters from
the keyboard of Rhamphoryncus:
PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable. A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability. Most other
shareable objects are immutable. Each thread is run in its own
private monitor, and thus protected from the normal threading memory
module nasties. Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.

Hmm. So I think your PyE is an attempt to be more
explicit about what I said above in PyC: PyC threads do not share data
between threads except by explicit interfaces. I consider your
definitions of shared data types somewhat orthogonal to the types of
threads, in that both PyA and PyC threads could use these new shared
data items.
Unlike PyC, there's a *lot* shared by default (classes, modules,
function), but it requires only minimal recoding. It's as close to
"have your cake and eat it too" as you're gonna get.

I think/hope that you meant that "many types are now only allowed to be
non-shareable"? At least, I think that should be the default; they
should be within the context of a single, independent interpreter
instance, so other interpreters don't even know they exist, much less
how to share them. If so, then I understand most of the rest of your
paragraph, and it could be a way of providing shared objects, perhaps.
There aren't multiple interpreters under my model. You only need
one. Instead, you create a monitor, and run a thread on it. A list
is not shareable, so it can only be used within the monitor it's
created within, but the list type object is shareable.

I've no interest in *requiring* a C/C++ extension to communicate
between isolated interpreters. Without that they're really no better
than processes.
Oct 24 '08 #35
On Oct 24, 3:02 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
On approximately 10/23/2008 2:24 PM, came the following characters from the
keyboard of Rhamphoryncus:
>>
On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
>>>
On approximately 10/23/2008 12:24 AM, came the following characters from
the keyboard of Christian Heimes

Andy wrote:
I'm very - not absolute, but very - sure that Guido and the initial
designers of Python would have added the GIL anyway. The GIL makes
Python faster on single core machines and more stable on multi core
machines.

Actually, the GIL doesn't make Python faster; it is a design decision that
reduces the overhead of lock acquisition, while still allowing use of global
variables.

Using finer-grained locks has higher run-time cost; eliminating the use of
global variables has a higher programmer-time cost, but would actually run
faster and more concurrently than using a GIL. Especially on a
multi-core/multi-CPU machine.
Those "globals" include classes, modules, and functions. You can't
have *any* objects shared. Your interpreters are entirely isolated,
much like processes (and we all start wondering why you don't use
processes in the first place.)

Or use safethread. It imposes safe semantics on shared objects, so
you can keep your global classes, modules, and functions. Still need
garbage collection though, and on CPython that means refcounting and
the GIL.

>Another peeve I have is his characterization of the observer pattern.
The generalized form of the problem exists in both single-threaded
sequential programs, in the form of unexpected reentrancy, and message
passing, with infinite CPU usage or infinite number of pending
messages.

So how do you get reentrancy in a single-threaded sequential program? I
think only via recursion? Which isn't a serious issue for the observer
pattern. If you add interrupts, then your program is no longer sequential..
Sorry, I meant recursion. Why isn't it a serious issue for
single-threaded programs? Just the fact that it's much easier to
handle when it does happen?

>Try looking at it on another level: when your CPU wants to read from a
bit of memory controlled by another CPU it sends them a message
requesting they get it for us. They send back a message containing
that memory. They also note we have it, in case they want to modify
it later. We also note where we got it, in case we want to modify it
(and not wait for them to do modifications for us).

I understand that level... one of my degrees is in EE, and I started college
wanting to design computers (at about the time the first microprocessor chip
came along, and they, of course, have now taken over). But I was side-lined
by the malleability of software, and have mostly practiced software during
my career.

Anyway, that is the level that Herb Sutter was describing in the Dr Dobbs
articles I mentioned. And the overhead of doing that at the level of a cache
line is high, if there is lots of contention for particular memory locations
between threads running on different cores/CPUs. So to achieve concurrency,
you must not only limit explicit software locks, but must also avoid memory
layouts where data needed by different cores/CPUs are in the same cache
line.
I suspect they'll end up redesigning the caching to use a size and
alignment of 64 bits (or smaller). Same cache line size, but with
masking.

You still need to minimize contention of course, but that should at
least be more predictable. Having two unrelated mallocs contend could
suck.

>Message passing vs shared memory isn't really a yes/no question. It's
about ratios, usage patterns, and tradeoffs. *All* programs will
share data, but in what way? If it's just the code itself you can
move the cache validation into software and simplify the CPU, making
it faster. If the shared data is a lot more than that, and you use it
to coordinate accesses, then it'll be faster to have it in hardware.

I agree there are tradeoffs... unfortunately, the hardware architectures
vary, and the languages don't generally understand the hardware. So then it
becomes an OS API, which adds the overhead of an OS API call to the cost of
the synchronization... It could instead be (and in clever applications is) a
non-portable assembly level function that wraps an OS locking or waiting
API.
In practice I highly doubt we'll see anything that doesn't extend
traditional threading (posix threads, whatever MS has, etc).

Nonetheless, while putting the shared data accesses in hardware might be
more efficient per unit operation, there are still tradeoffs: A software
solution can group multiple accesses under a single lock acquisition; the
hardware probably doesn't have enough smarts to do that. So it may well
require many more hardware unit operations for the same overall concurrently
executed function, and the resulting performance may not be any better.
Speculative ll/sc? ;)

Sidestepping the whole issue, by minimizing shared data in the application
design, avoiding both software lock calls and hardware cache
contention, is going to provide the best performance... it isn't the things
you do efficiently that make software fast, it is the things you don't do
at all.
Minimizing contention, certainly. Minimizing the shared data itself
is iffier though.
Oct 24 '08 #36
On Fri, Oct 24, 2008 at 4:48 PM, Glenn Linderman <v+******@g.nevcal.com> wrote:
On approximately 10/24/2008 2:15 PM, came the following characters from the
keyboard of Rhamphoryncus:
>>
On Oct 24, 2:59 pm, Glenn Linderman <gl...@nevcal.com> wrote:
>>>
On approximately 10/24/2008 1:09 PM, came the following characters from
the keyboard of Rhamphoryncus:
PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable. A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability. Most other
shareable objects are immutable. Each thread is run in its own
private monitor, and thus protected from the normal threading memory
module nasties. Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.
Hmm. So I think your PyE is an attempt to be more
explicit about what I said above in PyC: PyC threads do not share data
between threads except by explicit interfaces. I consider your
definitions of shared data types somewhat orthogonal to the types of
threads, in that both PyA and PyC threads could use these new shared
data items.

Unlike PyC, there's a *lot* shared by default (classes, modules,
function), but it requires only minimal recoding. It's as close to
"have your cake and eat it too" as you're gonna get.

Yes, but I like my cake frosted with performance; Guido's non-acceptance of
granular locks in the blog entry someone referenced was due to the slowdown
acquired with granular locking and shared objects. Your PyE model, with
highly granular sharing, will likely suffer the same fate.
No, my approach includes scalable performance. Typical paths will
involve *no* contention (ie no locking). classes and modules use
shareddict, which is based on a read-write lock built into the
interpreter, so it's uncontended for read-only usage patterns. Pretty
much everything else is immutable.

Of course that doesn't include the cost of garbage collection.
CPython's refcounting can't scale.

The independent threads model, with only slight locking for a few explicitly
shared objects, has a much better chance of getting better performance
overall. With one thread running, it would be the same as today; with
multiple threads, it should scale at the same rate as the system... minus
any locking done at the higher level.
So use processes with a little IPC for these expensive-yet-"shared"
objects. multiprocessing does it already.

>>I think/hope that you meant that "many types are now only allowed to be
non-shareable"? At least, I think that should be the default; they
should be within the context of a single, independent interpreter
instance, so other interpreters don't even know they exist, much less
how to share them. If so, then I understand most of the rest of your
paragraph, and it could be a way of providing shared objects, perhaps.

There aren't multiple interpreters under my model. You only need
one. Instead, you create a monitor, and run a thread on it. A list
is not shareable, so it can only be used within the monitor it's
created within, but the list type object is shareable.

The python interpreter code should be sharable, having been written in C,
and being/becoming reentrant. So in that sense, there is only one
interpreter. Similarly, any other reentrant C extensions would be that way.
On the other hand, each thread of execution requires its own interpreter
context, so that would have to be independent for the threads to be
independent. It is the combination of code+context that I call an
interpreter, and there would be one per thread for PyC threads. Bytecode
for loaded modules could potentially be shared, if it is also immutable.
However, that could be in my mental "phase 2", as it would require an extra
level of complexity in the interpreter as it creates shared bytecode...
there would be a memory savings from avoiding multiple copies of shared
bytecode, likely, and maybe also a compilation performance savings. So it
sounds like a win, but it is a win that can be deferred for initial simplicity,
to prove the concept is or is not workable.

A monitor allows a single thread to run at a time; that is the same
situation as the present GIL. I guess I don't fully understand your model.
To use your terminology, each monitor is a context. Each thread
operates in a different monitor. As you say, most C functions are
already thread-safe (reentrant). All I need to do is avoid letting
multiple threads modify a single mutable object (such as a list) at a
time, which I do by containing it within a single monitor (context).
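
For readers unfamiliar with the term, the containment idea can be sketched in ordinary Python 3. This is only a loose illustration of "one mutable object, one monitor", not safethread's implementation; the `Monitor` class and its `run` method are hypothetical names invented for the sketch, and a plain lock stands in for whatever the real design uses.

```python
import threading


class Monitor:
    """Owns one mutable object; callers reach it only through run()."""

    def __init__(self, obj):
        self._obj = obj
        self._lock = threading.Lock()

    def run(self, func, *args):
        # Only one thread at a time may execute inside the monitor.
        with self._lock:
            return func(self._obj, *args)


def append_many(monitor, start, count):
    for i in range(start, start + count):
        monitor.run(list.append, i)


def demo():
    mon = Monitor([])  # the list never escapes the monitor
    threads = [
        threading.Thread(target=append_many, args=(mon, n * 100, 100))
        for n in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mon.run(sorted)  # returns a new sorted list, not the contained one
```

Because the list is only ever touched inside `run`, no caller needs to know about locking at all.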
--
Adam Olsen, aka Rhamphoryncus
Oct 25 '08 #37
On Fri, Oct 24, 2008 at 5:38 PM, Glenn Linderman <v+******@g.nevcal.com> wrote:
On approximately 10/24/2008 2:16 PM, came the following characters from the
keyboard of Rhamphoryncus:
>>
On Oct 24, 3:02 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
>>>
On approximately 10/23/2008 2:24 PM, came the following characters from
the
keyboard of Rhamphoryncus:
On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
>
On approximately 10/23/2008 12:24 AM, came the following characters
from
the keyboard of Christian Heimes
>
>>
>Andy wrote:
>I'm very - not absolute, but very - sure that Guido and the initial
>designers of Python would have added the GIL anyway. The GIL makes
>Python faster on single core machines and more stable on multi core
>machines.
>>

Actually, the GIL doesn't make Python faster; it is a design decision
that
reduces the overhead of lock acquisition, while still allowing use of
global
variables.

Using finer-grained locks has higher run-time cost; eliminating the use
of
global variables has a higher programmer-time cost, but would actually
run
faster and more concurrently than using a GIL. Especially on a
multi-core/multi-CPU machine.

Those "globals" include classes, modules, and functions. You can't
have *any* objects shared. Your interpreters are entirely isolated,
much like processes (and we all start wondering why you don't use
processes in the first place.)

Indeed; isolated, independent interpreters are one of the goals. It is,
indeed, much like processes, but in a single address space. It allows the
master process (Python or C for the embedded case) to be coded using memory
references and copies and pointer swaps instead of using semaphores, and
potentially multi-megabyte message transfers.

It is not clear to me that with the use of shared memory between processes,
that the application couldn't use processes, and achieve many of the same
goals. On the other hand, the code to create and manipulate processes and
shared memory blocks is harder to write and has more overhead than the code
to create and manipulate threads, which can, when told, access any memory
block in the process. This allows the shared memory to be resized more
easily, or more blocks of shared memory created more easily. On the other
hand, the creation of shared memory blocks shouldn't be a high-use operation
in a program that has sufficient number crunching to do to be able to
consume multiple cores/CPUs.
>Or use safethread. It imposes safe semantics on shared objects, so
you can keep your global classes, modules, and functions. Still need
garbage collection though, and on CPython that means refcounting and
the GIL.

Sounds like safethread has 35-40% overhead. Sounds like too much, to me.
The specific implementation of safethread, which attempts to remove
the GIL from CPython, has significant overhead and had very limited
success at being scalable.

The monitor design proposed by safethread has no inherent overhead and
is completely scalable.
--
Adam Olsen, aka Rhamphoryncus
Oct 25 '08 #38
>A c-level module, on the other hand, can sidestep/release
>the GIL at will, and go on its merry way and process away.

...Unless part of the C module execution involves the need to do CPU-bound
work on another thread through a different python interpreter,
right?
Wrong.
(even if the interpreter is 100% independent, yikes).
Again, wrong.
For
example, have a python C module designed to programmatically generate
images (and video frames) in RAM for immediate and subsequent use in
animation. Meanwhile, we'd like to have a pthread with its own
interpreter with an instance of this module and have it dequeue jobs
as they come in (in fact, there'd be one of these threads for each
excess core present on the machine).
I don't understand how this example involves multiple threads. You
mention a single thread (running the module), and you mention designing
a module. Where is the second thread?

Let's assume there is another thread producing jobs, and then
a thread that generates the images. The structure would be this

    while 1:
        job = queue.get()
        processing_module.process(job)

and in process:

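    if (!PyArg_ParseTuple(args, "s", &job_data))
        return NULL;
    /* allocate an uninitialized string of bufsize bytes to hold the frame */
    result = PyString_FromStringAndSize(NULL, bufsize);
    if (result == NULL)
        return NULL;
    buf = PyString_AsString(result);
    Py_BEGIN_ALLOW_THREADS
    compute_frame(job_data, buf);   /* runs without holding the GIL */
    Py_END_ALLOW_THREADS
    return result;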

All these compute_frames could happily run in parallel.
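
The Python side of that structure can be tried without writing a C module. The sketch below (Python 3) uses `zlib.compress` as a stand-in for `compute_frame`, on the assumption that CPython's zlib module releases the GIL around the compression call, which is my understanding of the current implementation. The names `worker` and `compress_all` are invented for the example.

```python
import queue
import threading
import zlib


def worker(jobs, results):
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this thread
            break
        index, data = job
        # C-level work; assumed to run with the GIL released in CPython.
        results[index] = zlib.compress(data)


def compress_all(chunks, num_workers=2):
    jobs = queue.Queue()
    results = [None] * len(chunks)
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for item in enumerate(chunks):
        jobs.put(item)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Each worker writes to its own slot in `results`, so the only shared synchronization object is the job queue.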
As far as I can tell, it seems
CPython's current state can't do CPU-bound parallelization in the same
address space.
That's not true.

Regards,
Martin
Oct 25 '08 #39
It seems to me that the very simplest move would be to remove global
static data so the app could provide all thread-related data, which
Andy suggests through references to the QuickTime API. This would
suggest compiling python without thread support so as to leave it up
to the application.
I'm not sure whether you realize that this is not simple at all.
Consider this fragment

    if (string == Py_None || index >= state->lastmark ||
        !state->mark[index] || !state->mark[index+1]) {
        if (empty)
            /* want empty string */
            i = j = 0;
        else {
            Py_INCREF(Py_None);
            return Py_None;

Py_None here is a global variable. How would you replace it?
It's used in thousands of places.

For another example, consider

PyErr_SetString(PyExc_ValueError,
"Empty module name");
or

dp = PyObject_New(dbmobject, &Dbmtype);

There are tons of different variables denoting exceptions and
other types which all somehow need to be rewritten (likely with
undesirable effects on readability).

So I don't think that this is a simple solution. It's the right
one, but it will take five or ten years to implement.

Regards,
Martin
Oct 25 '08 #40
Glenn Linderman wrote:
For example, Python presently has a rather stupid algorithm for string
concatenation.
Python the language has syntax and semantics. Python implementations
have algorithms that fulfill the defined semantics.
It allocates only the exactly necessary space for the
concatenated string. This is a brilliant move, when you realize that
strings are immutable, and once allocated can never change, but the
operation

    for line in mylistofstrings:
        string = string + line

is basically O(N-squared) as a result. The better algorithm would
double the size of memory allocated for string each time there is not
enough room to add the next line, and that reduces the cost of the
algorithm to O(N).
If there is more than one reference to a guaranteed immutable object,
such as a string, the 'stupid' algorithm seems necessary to me. In-place
modification of a shared immutable would violate semantics.

However, if you do

    string = ''
    for line in strings:
        string += line

so that there is only one reference and you tell the interpreter that
you don't mind the old value being updated, then I believe in 2.6, if
not before, CPython does overallocation and in-place extension. (I am
not sure about s=s+l.) But this is just ref-counted CPython.
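
For comparison, a small sketch (Python 3) of the two ways to build the string. The `+=` loop relies on the CPython-specific in-place extension described above; `''.join` works out the total size first and copies each piece once, so it does not depend on reference counts or on which implementation is running. The function names are made up for the example.

```python
def concat_loop(lines):
    result = ''
    for line in lines:
        # CPython may extend result in place when it holds the only reference.
        result += line
    return result


def concat_join(lines):
    # One allocation for the final string, one copy per piece.
    return ''.join(lines)
```

Both return the same string; only the cost model differs.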

Terry Jan Reedy

Oct 25 '08 #41
Andy O'Meara wrote:
I would definitely agree if there was a context (i.e. environment)
object passed around then perhaps we'd have the best of all worlds.
Moreover, I think this is probably the *only* way that
totally independent interpreters could be realized.

Converting the whole C API to use this strategy would be
a very big project. Also, on the face of it, it seems like
it would render all existing C extension code obsolete,
although it might be possible to do something clever with
macros to create a compatibility layer.

Another thing to consider is that passing all these extra
pointers around everywhere is bound to have some effect
on performance. The idea mightn't go down too well if it
slows things significantly in the case where you're only
using one interpreter.

--
Greg
Oct 25 '08 #42
Andy O'Meara wrote:
- each worker thread makes its own interpreter, pops scripts off a
work queue, and manages exporting (and then importing) result data to
other parts of the app.
I hope you realize that starting up one of these interpreters
is going to be fairly expensive. It will have to create its
own versions of all the builtin constants and type objects,
and import its own copy of all the modules it uses.

One wonders if it wouldn't be cheaper just to fork the
process. Shared memory can be used to transfer large lumps
of data if needed.

--
Greg
Oct 25 '08 #43
Glenn Linderman wrote:
If Py_None corresponds to None in Python syntax ... then
it is a fixed constant and could be left global, probably.
No, it couldn't, because it's a reference-counted object
like any other Python object, and therefore needs to be
protected against simultaneous refcount manipulation by
different threads. So each interpreter would need its own
instance of Py_None.

The same goes for all the other built-in constants and
type objects -- there are dozens of these.
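
The refcount traffic in question can be observed from Python with `sys.getrefcount`, which is CPython-specific. The sketch below (Python 3) uses an ordinary object rather than None, because newer CPython releases treat None specially; `refcount_delta` is a name invented for the example.

```python
import sys


def refcount_delta():
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj]  # storing obj in a list adds one reference
    after = sys.getrefcount(obj)
    del holder
    return after - before
```

Every such store or load is a write to the object's header, which is why an object shared between truly independent interpreters would need protection.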
The cost is one more push on every function call,
Which sounds like it could be a rather high cost! If
(just a wild guess) each function has an average of 2
parameters, then this is increasing the amount of
argument pushing going on by 50%...
On many platforms, there is the concept of TLS, or thread-local storage.
That's another possibility, although doing it that
way would require you to have a separate thread for
each interpreter, which you mightn't always want.

--
Greg
Oct 25 '08 #44
Andy O'Meara wrote:
In our case, we're doing image and video
manipulation--stuff not good to be messaging from address space to
address space.
Have you considered using shared memory?

Using mmap or equivalent, you can arrange for a block of
memory to be shared between processes. Then you can dump
the big lump of data to be transferred in there, and send
a short message through a pipe to the other process to
let it know it's there.
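
A minimal sketch of that pattern, with two swaps named plainly: it uses `multiprocessing.shared_memory` (added in Python 3.8, long after this thread) instead of raw mmap, and both ends run in one process so the example stays self-contained. In real use `fetch` would run in the other process and the `(name, size)` tuple would travel through the pipe. `publish`, `fetch` and `demo` are invented names.

```python
from multiprocessing import shared_memory


def publish(data):
    """Copy data into a new shared block; return the block and a short message."""
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    shm.buf[:len(data)] = data
    return shm, (shm.name, len(data))  # the tuple is the "short message"


def fetch(message):
    """Attach to the block named in the message and read the payload."""
    name, size = message
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:size])
    finally:
        shm.close()


def demo():
    payload = b"frame-data" * 1000
    shm, message = publish(payload)
    try:
        return fetch(message) == payload
    finally:
        shm.close()
        shm.unlink()  # the creator is responsible for freeing the block
```

Only the name and size cross the pipe; the payload itself is written once and read in place.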

--
Greg
Oct 25 '08 #45
Rhamphoryncus wrote:
A list
is not shareable, so it can only be used within the monitor it's
created within, but the list type object is shareable.
Type objects contain dicts, which allow arbitrary values
to be stored in them. What happens if one thread puts
a private object in there? It becomes visible to other
threads using the same type object. If it's not safe
for sharing, bad things happen.
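
The leak is easy to reproduce with today's threads. In this Python 3 sketch (names invented for the example), a worker thread parks a list it created on a class, and the main thread can then reach it:

```python
import threading


class Shared:
    """Stands in for any type object visible to several threads."""


def worker():
    private = ['created inside the worker thread']
    Shared.leaked = private  # stored in the type's dict


def demo():
    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return Shared.leaked  # the worker's "private" list, seen from another thread
```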

Python's data model is not conducive to making a clear
distinction between "private" and "shared" objects,
except at the level of an entire interpreter.

--
Greg
Oct 25 '08 #46
If Py_None corresponds to None in Python syntax (sorry I'm not familiar
with Python internals yet; glad you are commenting, since you are), then
it is a fixed constant and could be left global, probably.
If None remains global, then type(None) also remains global, and
type(None).__bases__[0]. Then type(None).__bases__[0].__subclasses__()
will yield "interesting" results. This is essentially the status quo.
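
To spell out what "interesting" means here, a short Python 3 sketch: starting from None alone, code can reach `object` and, through it, classes defined by unrelated code in the same process. `Marker` is a made-up class standing in for "someone else's" type.

```python
class Marker(object):
    """A class defined by some other piece of code."""


root = type(None).__bases__[0]   # the base of NoneType, i.e. object
visible = root.__subclasses__()  # direct subclasses of object, process-wide

print(root is object)
print(Marker in visible)
```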
But if we
want a separate None for each interpreter, or if we just use Py_None as
an example global variable to use to answer the question then here goes
There are a number of problems with that approach. The biggest one is
that it is theoretical. Of course I'm aware of thread-local variables,
and the abstract possibility of collecting all global variables in
a single data structure (in fact, there is already an interpreter
structure and per-interpreter state in Python). I wasn't claiming that
it was impossible to solve that problem - just that it is not simple.
If you want to find out what all the problems are, please try
implementing it for real.

Regards,
Martin
Oct 25 '08 #47
Hi Andy,
Andy wrote:
However, we require true thread/interpreter
independence so python 2 has been frustrating at time, to say the
least. Please don't start with "but really, python supports multiple
interpreters" because I've been there many many times with people.
And, yes, I'm aware of the multiprocessing module added in 2.6, but
that stuff isn't lightweight and isn't suitable at all for many
environments (including ours).
This is a very conflicting set of statements. Whilst you appear to be
extremely clear on what you want here, and on why multiprocessing and
associated techniques are not appropriate, it does sound
contradictory. I'm guessing I'm not the only person who finds this a
little odd.

Based on the size of the thread, having read it all, I'm guessing also
that you're not going to have an immediate solution but a work around.
However, also based on reading it, I think it's a usecase that would be
generally useful in embedding python.

So, I'll give it a stab as to what I think you're after.

The scenario as I understand it is this:
* You have an application written in C,C++ or similar.
* You've been providing users the ability to script it or customise it
in some fashion using scripts.

Based on the conversation:
* This worked well, and you really liked the results, but...
* You only had one interpreter embedded in the system
* You were allowing users to use multiple scripts

Suddenly you go from: Single script, single memory space.
To multiple scripts, unconstrained shared memory space.

That then causes pain for you and your users. So as a result, you decided to
look for this scenario:
* A mechanism that allows each script to think it's the only script
running on the python interpreter.
* But to still have only one embedded instance of the interpreter.
* With the primary motivation to eliminate the unconstrained shared
memory causing breakage to your software.

So, whilst the multiprocessing module gives you this:
* With the primary motivation to eliminate the unconstrained shared
memory causing breakage to your software.

It's (for whatever reason) too heavyweight for you, due to the multiprocess
usage. At a guess the reason for this is because you allow the user to run
lots of these little scripts.

Essentially what this means is that you want "green processes".

One workaround of achieving that may be to find a way to force threads in
python to ONLY be allowed access to (and only update) thread local values,
rather than default to shared values.

The reason I say that, is because the closest you get to green processes in
python at the moment is /inside/ a python generator. It's nowhere near the
level you want, but it's what made me think of the idea of green processes.

Specifically if you have the canonical example of a python generator:

    def fib():
        a, b = 1, 1
        while 1:
            a, b = b, a + b
            yield a

Then no matter how many times I run that, the values are local, and can't
impact each other. Now clearly this isn't what you want, but on some level
it's *similar*.
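
A quick way to see that isolation, sketched in Python 3 with the generator yielding its running value `a`: two instances are advanced by different amounts and neither disturbs the other.

```python
from itertools import islice


def fib():
    a, b = 1, 1
    while True:
        a, b = b, a + b
        yield a


first = fib()
second = fib()
head = list(islice(first, 5))  # advance only the first generator
```

After this, `first` carries on from where it stopped while `second` still starts from the beginning, because `a` and `b` live in each generator's own frame.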

You want to be able to do:
run(this_script)

and then when (this_script) is running only use a local environment.

Now, if you could change the threading API, such that there was a means of
forcing all value lookups to look in thread local store before looking
outside the thread local store [1], then this would give you a much greater
level of safety.

[1] I don't know if there is or isn't; I've not been sufficiently interested
to look...

I suspect that this would also be a very nice easy win for many
multi-threaded applications as well, reducing accidental data sharing.

Indeed, reversing things such that rather than doing this:
    myLocal = threading.local()
    myLocal.X = 5

Allowing a thread to force the default to be the other way round:
    systemGlobals = threading.globals()
    systemGlobals.X = 5

Would make a big difference. Furthermore, it would also mean that the
following:
import MyModule
from MyOtherModule import whizzy thing

I don't know if such a change would be sufficient to stop the python
interpreter going bang for extension modules though :-)
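
For reference, this is how the existing `threading.local()` half of that idea behaves today (Python 3 sketch; `worker` and `demo` are invented names). The reversed, opt-in-to-globals default is purely hypothetical and is not shown.

```python
import threading

my_local = threading.local()


def worker(seen):
    # The attribute set by the main thread is not visible in this thread.
    seen['has_X'] = hasattr(my_local, 'X')
    my_local.X = 99  # private to this worker thread
    seen['worker_X'] = my_local.X


def demo():
    my_local.X = 5  # set in the calling thread only
    seen = {}
    t = threading.Thread(target=worker, args=(seen,))
    t.start()
    t.join()
    seen['main_X'] = my_local.X  # unchanged by the worker's assignment
    return seen
```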

I suspect also that this change, whilst potentially fraught with
difficulties, would be incredibly useful in python implementations
that are GIL-free (such as Jython or IronPython)

Now, this for me is entirely theoretical because I don't know much about
python's threading implementation (because I've never needed to), but it
does seem to me to be the easier win than looking for truly independent
interpreters...

It would also be more generally useful, since it would make accidental
sharing of data (which is where threads really hurt people most) much
harder.

Since it was raised in the thread, I'd like to say "use Kamaelia", but your
usecase is slightly different as I understand it. You want to take existing
stuff that won't be written in any particular way, to encourage it to be
safely reusable in a shared environment. We do do that to an extent, but I'm
guessing not quite as unconstrained as you. (We specifically require usage
of things in a lightly constrained manner)

I suspect though that this hypothetical ability to switch a thread to search
thread locals (or only have thread locals) first would itself be incredibly
useful as time goes on.

Kamaelia implements the kind of model that this paper referenced in the
thread advocates:
http://www.eecs.berkeley.edu/Pubs/Te...ECS-2006-1.pdf

As you'll see from this recent Pycon UK presentation:
http://tinyurl.com/KamaeliaPyconUK

It goes a stage further though by actively providing metaphors based around
components built using inboxes/outboxes designed *specifically* to encourage
safe concurrency. (heritage wise, kamaelia owes more to occam & CSP than
anything else)

After all, we've found that concurrency using generators is good
most of the time - it's probably the most fundamental unit of
concurrency you can get, followed by true coroutines (greenlets). Next
up is threads (you can put generators into threads, but not vice versa).
Next up is processes (you can put threads in processes, but not vice
versa).
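
As a rough idea of what generator-level concurrency looks like, here is a minimal round-robin scheduler in Python 3. It is a generic sketch, not Kamaelia's API; `counter`, `run_round_robin` and `demo` are made-up names.

```python
from collections import deque


def counter(name, limit, log):
    for i in range(limit):
        log.append((name, i))
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)


def demo():
    log = []
    run_round_robin([counter('a', 2, log), counter('b', 3, log)])
    return log
```

Each task runs until its next `yield`, so the interleaving is fully deterministic and needs no locks.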

Finishing on a random note:

The interesting thing from my perspective is you essentially want something
half way between threads and processes, which I called green processes for
want of a decent phrase. Now that's akin to sandboxing, but I suspect leaky
sandboxing might be sufficient for you. (ie a sandbox where you have to try
hard to break out of the box as opposed to it being trivial). I'd be pretty
certain that something like green processes, or "thread local only" would
be useful in the future.

After all, that along with decent sandboxing would be the sort of thing
necessary to allow python to be embedded in a browser. (If flash used
multiple processes, it'd kill most people's systems after all, and if they
don't have something like green processes, flash would make web pages even
worse...)

Indeed, thread local only and globals accessed via STM [1] would be
incredibly handy. (I say that because generator globals and globals accessed
via a CAT (which is kamaelia specific thing, but similar conceptually),
works extremely well)

[1] even something as lightweight as http://www.kamaelia.org/STM

If a "search thread local" approach or "thread local only" approach
sounds reasonable, then it may be a "leaky sandbox" approach is perhaps
worth investigating. After all, a leaky sandbox may be doable.

Tuppence-worthy-ly-yours,
Michael.
--
http://www.kamaelia.org/GetKamaelia

Oct 25 '08 #48
Andy O'Meara wrote:
Yeah, that's the idea--let the highest levels run and coordinate the
show.

Yes, this works really well in python and it's lots of fun. We've found so
far you need at minimum the following parts in a little co-ordination
language:

Pipeline
Graphline
Carousel
Seq
OneShot
PureTransformer
TPipe
Filter
Backplane
PublishTo
SubscribeTo

The interesting thing to me about this is in most systems these would be
patterns of behaviour in activities, whereas in python/kamaelia these are
concrete things you can drop things into. As you'd expect this all becomes
highly declarative.
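
As a sketch of what "concrete things you can drop things into" can look like, here is a toy pipeline built from generator stages in plain Python 3. The `pipeline`, `pure_transformer` and `filter_stage` helpers are hypothetical stand-ins loosely named after items in the list above; they are not Kamaelia's classes or signatures.

```python
def pipeline(*stages):
    # Chain stages so that each one consumes the previous stage's output.
    def run(source):
        stream = iter(source)
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


def pure_transformer(fn):
    # Stage that applies a side-effect-free function to every item.
    def stage(stream):
        for item in stream:
            yield fn(item)
    return stage


def filter_stage(pred):
    # Stage that passes on only the items the predicate accepts.
    def stage(stream):
        for item in stream:
            if pred(item):
                yield item
    return stage


process = pipeline(
    pure_transformer(str.strip),
    filter_stage(bool),
    pure_transformer(str.upper),
)
output = list(process(["  red ", "", " green"]))
```

The declaration of `process` reads as a description of the data flow, which is the declarative feel being described.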

In practice the world is slightly messier than a theoretical document would
like to suggest, primarily because if you consider things like pygame,
sometimes you can only have a resource instantiated once in a single
process. So you do need a mechanism for advertising services inside a
process and looking those up. (The Backplane idea though helps with
wrapping those up a lot I admit, for certain sorts of service :)
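
A minimal sketch of advertising a service inside a process and looking it up by name, in plain Python 3. The `Backplane` class and the `publish_to` / `subscribe_to` functions below are hypothetical simplifications named after the list above, not Kamaelia's real components.

```python
from collections import deque


class Backplane:
    """Hypothetical named meeting point, registered once per process."""

    _registry = {}

    def __init__(self, name):
        self.name = name
        self.subscribers = []
        Backplane._registry[name] = self

    @classmethod
    def lookup(cls, name):
        return cls._registry[name]


def subscribe_to(name):
    # Each subscriber gets its own inbox, so nothing is shared between them.
    inbox = deque()
    Backplane.lookup(name).subscribers.append(inbox)
    return inbox


def publish_to(name, message):
    # Deliver a copy of the reference to every subscriber's inbox.
    for inbox in Backplane.lookup(name).subscribers:
        inbox.append(message)


Backplane("DISPLAY_EVENTS")
first_listener = subscribe_to("DISPLAY_EVENTS")
second_listener = subscribe_to("DISPLAY_EVENTS")
publish_to("DISPLAY_EVENTS", "resize")
```

Publishers and subscribers only need to agree on the name, so the single-instance resource can sit behind the backplane without everyone holding a reference to it.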

And sometimes you do need to just share data, and when you do that's when
STM is useful.

But concurrent python systems are fun to build :-)
Michael.
--
http://www.kamaelia.org/GetKamaelia

Oct 25 '08 #49
Glenn Linderman wrote:
In the module multiprocessing environment could you not use shared
memory, then, for the large shared data items?

If the poshmodule had a bit of TLC, it would be extremely useful for this.
It does (surprisingly) still work with python 2.5, but it needs some work
to make it usable.

http://poshmodule.sourceforge.net/
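
For comparison, a minimal sketch of the shared-memory idea using `multiprocessing.shared_memory`, which only arrived in the standard library much later (Python 3.8) and so was not an option for the Python versions discussed in this thread. The `create_block` and `read_block` helpers are hypothetical names; for simplicity the reader is called in the same process here, but a worker process would attach by name in the same way.

```python
from multiprocessing import shared_memory


def create_block(payload: bytes):
    # Owner side: allocate a named block and copy the data in once.
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def read_block(name: str, length: int) -> bytes:
    # Reader side: attach to the existing block by name, then detach.
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:length])
    finally:
        shm.close()


pixels = bytes(range(16))
owner = create_block(pixels)
try:
    seen = read_block(owner.name, len(pixels))
finally:
    owner.close()
    owner.unlink()  # the owner is responsible for freeing the block
```

Only the block's name and length need to be passed between processes; the large data item itself is never pickled or sent through a pipe.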
Michael
--
http://www.kamaelia.org/GetKamaelia
Oct 25 '08 #50
