threading support in python

Hi all,

Is there any PEP to introduce true threading features into Python's
next version, as in Java? I mean without having the GIL.
Compared to other languages, Python is fun to code in, but I feel it
is lagging behind in threading.

regards,
KM

Sep 4 '06 #1

bayerj

Hi,

GIL won't go. You might want to read
http://blog.ianbicking.org/gil-of-doom.html .

Regards,
-Justin

Sep 4 '06 #2

Hi all,
Are there any alternative ways of attaining true threading in Python?
If the GIL doesn't go, does that mean Python is useless for
computation-intensive scientific applications that need
parallelization in a threading context?

regards,
KM
---------------------------------------------------------------------------
On 4 Sep 2006 07:58:00 -0700, bayerj <ba****@in.tum.de> wrote:

Hi,

GIL won't go. You might want to read
http://blog.ianbicking.org/gil-of-doom.html .

Regards,
-Justin

--
http://mail.python.org/mailman/listinfo/python-list

Sep 4 '06 #3

bayerj

Hi,

You might want to split your calculation onto different
worker-processes.

Then you can use POSH [1] to share data and objects.
You might even want to go a step further and share the data via
Sockets/XML-RPC or something like that. That makes it easy to throw
additional boxes at a specific calculation, because it can be set up in
about no time.
You can even use Twisted Spread [2] and its perspective broker to do
this on a higher level.
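
A minimal sketch of the worker-process approach, assuming a modern Python 3 on a Unix-like system and the standard multiprocessing module (which postdates this thread). The function names and the sum-of-squares workload are illustrative only. Each worker is a separate process with its own interpreter, so the workers are not serialized by a single GIL.

```python
import multiprocessing as mp


def partial_sum(bounds, results):
    """Sum the squares in [start, stop) and report the result to the parent."""
    start, stop = bounds
    results.put(sum(i * i for i in range(start, stop)))


def parallel_sum_of_squares(n, workers=4):
    """Split range(n) into chunks and hand each chunk to its own process."""
    # Assumption: the "fork" start method (Unix-only) is available.
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    step = max(1, -(-n // workers))  # ceiling division, at least 1
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    procs = [ctx.Process(target=partial_sum, args=(c, results)) for c in chunks]
    for p in procs:
        p.start()
    total = sum(results.get() for _ in procs)  # drain the queue before joining
    for p in procs:
        p.join()
    return total
```

Calling parallel_sum_of_squares(1000) should give the same answer as the single-process sum; whether it is faster depends on how large each chunk is relative to the cost of starting a process.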

If that's not what you want, you are left with Java I guess.

Regards,
-Justin

[1] http://poshmodule.sourceforge.net/
[2] http://twistedmatrix.com/projects/co.../howto/pb.html

Sep 4 '06 #4

Richard Brodie

"km" <sr*************@gmail.com> wrote in message
news:ma****************************************@python.org...

if GIL doesnt go then does it mean that python is useless for
computation intensive scientific applications which are in need of
parallelization in threading context ?

No.

Sep 4 '06 #5

Sybren Stuvel

km enlightened us with:

Is there any PEP to introduce true threading features into python's
next version as in java? i mean without having GIL.

What is GIL? Except for the Dutch word for SCREAM that is...

when compared to other languages, python is fun to code but i feel
its is lacking behind in threading

What's wrong with the current threading? AFAIK it's directly linked to
the threading of the underlying platform.

Sybren
--
Sybren Stüvel
Stüvel IT - http://www.stuvel.eu/

Sep 4 '06 #6

Diez B. Roggisch

Sybren Stuvel wrote:

km enlightened us with:
Is there any PEP to introduce true threading features into python's
next version as in java? i mean without having GIL.

What is GIL? Except for the Dutch word for SCREAM that is...

The global interpreter lock, which prevents Python threads from
concurrently modifying internal structures and causing segfaults.

when compared to other languages, python is fun to code but i feel
its is lacking behind in threading

What's wrong with the current threading? AFAIK it's directly linked to
the threading of the underlying platform.

There exist rare cases (see the link from bayerj) where the GIL is an
annoyance, and with the dawn of MP-cores all over the place, removing it
might be considered a good idea - maybe. But I doubt that is something
to be considered for py2.x.

Diez

Sep 4 '06 #7

Sandra-24

The trouble is there are some environments where you are forced to use
threads. Apache and mod_python are an example. You can't make use of
multiple CPUs unless you're on *nix and run with multiple processes AND
your application doesn't store large amounts of data in memory (which
mine does), so you'd have to physically double the computer's memory for
a dual-core, or quadruple it for a quad-core. And forget about running a
Windows server; Apache will not even run with multiple processes there.

In years to come this will be more of an issue, because single-core CPUs
will be harder to come by and you'll be throwing away half of every CPU
you buy.

-Sandra

Sep 4 '06 #8

Daniel Dittmar

km wrote:

Is there any PEP to introduce true threading features into python's
next version as in java? i mean without having GIL.
when compared to other languages, python is fun to code but i feel its
is lacking behind in threading

Some of the technical problems:

- probably breaks compatibility of extensions at the source level in a
big way, although this might be handled by SWIG, boost and other code
generators
- reference counting will have to be synchronized, which means that
Python will become slower
- removing reference counting and relying on garbage collection alone
will break many Python applications (because they rely on files being
closed at end of scope etc.)
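
To illustrate the last point: code that relies on reference counting to close files can be made independent of the collector by closing explicitly. A minimal sketch, assuming the with statement (PEP 343, Python 2.5 and later); write_report and demo are hypothetical names.

```python
import os
import tempfile


def write_report(path, text):
    """Write text and close the file deterministically, refcounting or not."""
    with open(path, "w") as f:
        f.write(text)
    # The with block has already closed f here, on any implementation.
    return f.closed


def demo():
    """Write to a temporary file and read the contents back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        closed = write_report(path, "hello")
        with open(path) as f:
            return closed, f.read()
    finally:
        os.remove(path)
```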

Daniel

Sep 4 '06 #9

Rob Williscroft

Daniel Dittmar wrote in news:ed**********@news.sap-ag.de in
comp.lang.python:

- removing reference counting and relying on garbage collection alone
will break many Python applications (because they rely on files being
closed at end of scope etc.)

They are already broken on at least 2 python implementations, so
why worry about another one.

Rob.
--
http://www.victim-prime.dsl.pipex.com/

Sep 4 '06 #10

sjdevnull

Sandra-24 wrote:

The trouble is there are some environments where you are forced to use
threads. Apache and mod_python are an example. You can't make use of
multiple CPUs unless you're on *nix and run with multiple processes AND
your application doesn't store large amounts of data in memory (which
mine does) so you'd have to physically double the computer's memory for
a dual-core, or quadruple it for a quadcore.

You seem to be confused about the nature of multiple-process
programming.

If you're on a modern Unix/Linux platform and you have static read-only
data, you can just read it in before forking and it'll be shared
between the processes.
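
A minimal sketch of the load-then-fork pattern described above, assuming a Unix-like system where os.fork is available. The helper name and the toy workload are illustrative. One caveat: in CPython, merely reading an object still updates its reference count, so pages holding Python objects tend to be copied over time rather than staying shared.

```python
import os


def sum_in_children(table, workers=2):
    """Fork workers after the data is loaded; each child reads the parent's copy."""
    children = []
    for w in range(workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: sees `table` without any explicit copy
            os.close(read_fd)
            part = sum(table[w::workers])
            os.write(write_fd, str(part).encode())
            os._exit(0)  # leave immediately, skipping the parent's cleanup
        os.close(write_fd)
        children.append((pid, read_fd))
    total = 0
    for pid, read_fd in children:
        total += int(os.read(read_fd, 64))
        os.close(read_fd)
        os.waitpid(pid, 0)
    return total
```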

If it's read/write data or you're not on a Unix platform, you can use
shared memory to share it between many processes.

Threads are way overused in modern multiexecution programming. The
decision on whether to use processes or threads should come down to
whether you want to share everything, or whether you have specific
pieces of data you want to share. With processes + shm, you can gain
the security of protected memory for the majority of your code + data,
only sacrificing it where you need to share the data.

The entire Windows programming world tends to be so biased toward
multithreading that they often don't even acknowledge the existence of
generally superior alternatives. I think that's in large part because
historically on Windows 3.1/95/98 there was no good way to create
processes without running a new binary, and so a culture of threading
grew up. Even today many Windows programmers are unfamiliar with using
CreateProcessEx with SectionHandle=NULL for efficient copy-on-write
process creation.

And forget about running a
windows server, apache will not even run with multiple processes.

It used to run on windows with multiple processes. If it really won't
now, use an older version or contribute a fix.

Now, the GIL is independent of this; if you really need threading in
your situation (you share almost everything and have hugely complex
data structures that are difficult to maintain in shm) then you're
still going to run into GIL serialization. If you're doing a lot of
work in native code extensions this may not actually be a big
performance hit; if not, it can be pretty bad.

Sep 4 '06 #11

Paul Rubin

"sj*******@yahoo.com" <sj*******@yahoo.com> writes:

If it's read/write data or you're not on a Unix platform, you can use
shared memory to share it between many processes.

Threads are way overused in modern multiexecution programming. The
decision on whether to use processes or threads should come down to
whether you want to share everything, or whether you have specific
pieces of data you want to share.

Shared memory means there's a byte vector (the shared memory region)
accessible to multiple processes. The processes don't use the same
machine addresses to reference the vector. Any data structures
(e.g. those containing pointers) shared between the processes have to
be marshalled in and out of the byte vector instead of being accessed
normally. Any live objects such as open sockets have to be shared
some other way. It's not a matter of sharing "everything"; shared
memory is a pain in the neck even to share a single object. These
things really can be easier with threads.
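
The marshalling step can be sketched as follows, assuming Python 3.8 or later for multiprocessing.shared_memory (not available when this was written). The record layout is a made-up example, and a second handle in the same process stands in for another process attaching by name.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical record layout: little-endian int32 id followed by a float64 value.
RECORD = struct.Struct("<id")


def roundtrip(record_id, value):
    """Pack a record into a shared block, then read it back via a second handle."""
    shm = shared_memory.SharedMemory(create=True, size=RECORD.size)
    try:
        RECORD.pack_into(shm.buf, 0, record_id, value)  # marshal in
        # Another process would attach by name; a second handle stands in here.
        peer = shared_memory.SharedMemory(name=shm.name)
        try:
            return RECORD.unpack_from(peer.buf, 0)  # marshal out
        finally:
            peer.close()
    finally:
        shm.close()
        shm.unlink()
```

Only flat bytes cross the boundary; anything with pointers (lists, dicts, objects) has to be packed and unpacked like this, which is the inconvenience the post describes.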

Sep 4 '06 #12

Daniel Dittmar

Rob Williscroft wrote:

Daniel Dittmar wrote in news:ed**********@news.sap-ag.de in
comp.lang.python:

- removing reference counting and relying on garbage collection alone
will break many Python applications (because they rely on files being
closed at end of scope etc.)

They are already broken on at least 2 python implementations, so
why worry about another one.

I guess few applications or libraries are being ported from CPython to
Jython or IronPython as each is targeting a different standard library,
so this isn't that much of a problem yet.

Daniel

Sep 4 '06 #13

Sandra-24

You seem to be confused about the nature of multiple-process
programming.

If you're on a modern Unix/Linux platform and you have static read-only
data, you can just read it in before forking and it'll be shared
between the processes.

Not familiar with *nix programming, but I'll take your word on it.

If it's read/write data or you're not on a Unix platform, you can use
shared memory to share it between many processes.

I know how shared memory works, it's the last resort in my opinion.

Threads are way overused in modern multiexecution programming. The

<snip>

It used to run on windows with multiple processes. If it really won't
now, use an older version or contribute a fix.

First of all, I'm not in control of spawning processes or threads.
Apache does that, and Apache has no MPM for Windows that uses more than
1 process. Secondly, "superior" is definitely a matter of opinion. Let's
see how you would define superior.

1) Port (a nicer word for rewrite) the worker MPM from *nix to Windows.
2) Alternately switch to running Linux servers (which have their
plusses) but about which I know nothing. I've been using Windows since
I was 10 years old, I'm confident in my ability to build, secure, and
maintain a Windows server. I don't think anyone would recommend me to
run Linux servers with very little in the way of Linux experience.
3) Rewrite my codebase to use some form of shared memory. This would be
a terrible nightmare that would take at least a month of development
time and a lot of heavy rewriting. It would be very difficult, but I'll
grant that it may work if done properly with only small performance
losses. Sounds like a deal.

I would have an easier time, I think, porting mod_python to .NET and
leaving that GIL behind forever. Thankfully, I'm not considering such
drastic measures - yet.

Why on earth would I want to do all of that work? Just because you want
to keep this evil thing called a GIL? My suggestion is in python 3
ditch the ref counting, use a real garbage collector, and make that GIL
walk the plank. I have my doubts that it would happen, but that's fine,
the future of python is in things like IronPython and PyPy. CPython's
days are numbered. If there was a mod_dotnet I wouldn't be using
CPython anymore.

Now, the GIL is independent of this; if you really need threading in
your situation (you share almost everything and have hugely complex
data structures that are difficult to maintain in shm) then you're
still going to run into GIL serialization. If you're doing a lot of
work in native code extensions this may not actually be a big
performance hit, if not it can be pretty bad.

Actually, I'm not sure I understand you correctly. You're saying that
in an environment like apache (with 250 threads or so) and my hugely
complex shared data structures, that the GIL is going to cause a huge
performance hit? So even if I do manage to find my way around in the
Linux world, and I upgrade my memory, I'm still going to be paying for
that darned GIL?

Will the madness never end?
-Sandra

Sep 5 '06 #14

Steve Holden

Sandra-24 wrote:
[Sandra understands shared memory]

I would find an easier time, I think, porting mod_python to .net and
leaving that GIL behind forever. Thankfully, I'm not considering such
drastic measures - yet.

Quite right too. You haven't even sacrificed a chicken yet ...

Why on earth would I want to do all of that work? Just because you want
to keep this evil thing called a GIL? My suggestion is in python 3
ditch the ref counting, use a real garbage collector, and make that GIL
walk the plank. I have my doubts that it would happen, but that's fine,
the future of python is in things like IronPython and PyPy. CPython's
days are numbered. If there was a mod_dotnet I wouldn't be using
CPython anymore.

You write as though the GIL was invented to get in the programmer's way,
which is quite wrong. It's there to avoid deep problems with thread
interaction. Languages that haven't bitten that bullet can bite you in
quite nasty ways when you write threaded applications.

Contrary to your apparent opinion, the GIL has nothing to do with
reference-counting.

Now, the GIL is independent of this; if you really need threading in
your situation (you share almost everything and have hugely complex
data structures that are difficult to maintain in shm) then you're
still going to run into GIL serialization. If you're doing a lot of
work in native code extensions this may not actually be a big
performance hit, if not it can be pretty bad.

Actually, I'm not sure I understand you correctly. You're saying that
in an environment like apache (with 250 threads or so) and my hugely
complex shared data structures, that the GIL is going to cause a huge
performance hit? So even if I do manage to find my way around in the
Linux world, and I upgrade my memory, I'm still going to be paying for
that darned GIL?

I think the suggestion was rather that abandoning Python because of the
GIL might be premature optimisation. But since you appear to be sticking
with it, that might have been unnecessary advice.

Will the madness never end?

This reveals an opinion of the development team that's altogether too
low. I believe the GIL was introduced for good reasons.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 5 '06 #15

Paul Rubin

Steve Holden <st***@holdenweb.com> writes:

You write as though the GIL was invented to get in the programmer's
way, which is quite wrong. It's there to avoid deep problems with
thread interaction. Languages that haven't bitten that bullet can bite
you in quite nasty ways when you write threaded applications.

And yet, Java programmers manage to write threaded applications all
day long without getting bitten (once they're used to the issues),
despite usually being less skilled than Python programmers ;-).

Contrary to your apparent opinion, the GIL has nothing to do with
reference-counting.

I think it does, i.e. one of the GIL's motivations was to protect the
management of reference counts in CPython, which otherwise wasn't
thread-safe. The obvious implementation of Py_INCREF has a race
condition, for example. The GIL documentation at

http://docs.python.org/api/threads.html

describes this in its very first paragraph.
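
The race can be pictured with a deterministic replay. This is an illustrative simulation in Python, not CPython source: it treats an unsynchronized increment as a separate load and store, and interleaves two of them by hand.

```python
def interleaved_incref(refcnt):
    """Replay two unsynchronized increments as separate load and store steps."""
    a = refcnt      # thread A loads the count
    b = refcnt      # thread B loads the same value before A stores
    refcnt = a + 1  # thread A stores its result
    refcnt = b + 1  # thread B stores, overwriting A's update
    return refcnt


def serialized_incref(refcnt):
    """The same two increments when a lock such as the GIL serializes them."""
    for _ in range(2):
        refcnt = refcnt + 1
    return refcnt
```

Starting from a count of 1, the interleaved version ends at 2 and the serialized version at 3. A lost increment like this means an object can later be freed while a reference to it is still in use.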

Will the madness never end?

This reveals an opinion of the development team that's altogether too
low. I believe the GIL was introduced for good reasons.

The GIL was an acceptable tradeoff when it was first created in the
previous century. First of all, it gave a way to add threads to the
existing, non-threadsafe CPython implementation without having to
rework the old code too much. Second, Python was at that time
considered a "scripting language" and there was less concern about
writing complex apps in it, especially multiprocessing apps. Third,
multiprocessor computers were themselves exotic, so people who wanted
to program them probably had exotic problems that they were willing to
jump through hoops to solve.

These days, even semi-entry-level consumer laptop computers have
dual-core CPUs, and quad Opteron boxes (8-way multiprocessing using X2
processors) are quite affordable for midrange servers or engineering
workstations, and there's endless desire to write fancy server apps
completely in Python. There is no point paying for all that
multiprocessor hardware if your programming language won't let you use
it. So, Python must punt the GIL if it doesn't want to keep
presenting undue obstacles to writing serious apps on modern hardware.

Sep 5 '06 #16

Felipe Almeida Lessa

4 Sep 2006 19:19:24 -0700, Sandra-24 <sa***********@yahoo.com>:

If there was a mod_dotnet I wouldn't be using
CPython anymore.

I guess you won't be using then: http://www.mono-project.com/Mod_mono

--
Felipe.

Sep 5 '06 #17

Sandra-24

Steve Holden wrote:

Quite right too. You haven't even sacrificed a chicken yet ...

Hopefully we don't get to that point.

You write as though the GIL was invented to get in the programmer's way,
which is quite wrong. It's there to avoid deep problems with thread
interaction. Languages that haven't bitten that bullet can bite you in
quite nasty ways when you write threaded applications.

I know it was put there because it is meant to be a good thing.
However, it gets in my way. I would be perfectly happy if it were gone.
I've never written code that assumes there's a GIL. I always write my
code with all shared writable objects protected by locks. It's far more
portable, and a good habit to get into. You realize that because of the
GIL, they were discussing (and may have already implemented) Java style
synchronized dictionaries and lists for IronPython simply because
python programmers just assume they are thread safe thanks to the GIL.
I always hated that about Java. If you want to give me thread safe
collections, fine, they'll be nice for sharing between threads, but
don't make me use synchronized collections for single-threaded code.
You'll notice the newer Java collections are not synchronized, it would
seem I'm not alone in that opinion.
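
A minimal sketch of that habit, assuming Python 3's threading module; the Counter class is a hypothetical example of a shared writable object. The lock makes the code correct whether or not the interpreter has a GIL.

```python
import threading


class Counter:
    """A shared writable object whose state is guarded by its own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:  # explicit protection, no reliance on the GIL
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def run(threads=4, per_thread=10_000):
    """Increment one shared counter from several threads and return the total."""
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value()
```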

Contrary to your apparent opinion, the GIL has nothing to do with
reference-counting.

Actually it does. Without the GIL reference counting is not thread
safe. You have to synchronize all reference count accesses, increments,
and decrements because you have no way of knowing which objects get
shared across threads. I think with Python's current memory management,
the GIL is the lesser evil.

I'm mostly writing this to provide a different point of view, many
people seem to think (previously linked blog) that there is no downside
to the GIL, and that's just not true. However, I don't expect that the
GIL can be safely removed from CPython. I also think that it doesn't
matter because projects like IronPython and PyPy are very likely the
way of the future for Python anyway. Once you move away from C there
are so many more things you can do.

I think the suggestion was rather that abandoning Python because of the
GIL might be premature optimisation. But since you appear to be sticking
with it, that might have been unnecessary advice.

I would never abandon Python, and I hold the development team in very
high esteem. That doesn't mean there aren't a few things (like the GIL, or
super) that I don't like. But overall they've done an excellent job on
the 99% of things they've got right. I guess we don't say that enough.

I might switch from CPython sometime to another implementation, but it
won't be because of the GIL. I'm very fond of the .net framework as a
library, and I'd also rather write performance critical code in C# than
C (who wouldn't?) I'm also watching PyPy with interest.

-Sandra

Sep 5 '06 #18

Bryan Olson

bayerj wrote:

Then you can use POSH [1] to share data and objects.

Do you use POSH? How well does it work with current Python?
Any major gotchas?

I think POSH looks like a great thing to have, but the latest
version is an alpha from over three years ago. Also, it only
runs on *nix systems.
--
--Bryan

Sep 5 '06 #19

Sandra-24

Felipe Almeida Lessa wrote:

4 Sep 2006 19:19:24 -0700, Sandra-24 <sa***********@yahoo.com>:
If there was a mod_dotnet I wouldn't be using
CPython anymore.

I guess you won't be using then: http://www.mono-project.com/Mod_mono

Oh I'm aware of that, but it's not what I'm looking for. Mod_mono just
lets you run ASP.NET on Apache. I'd much rather use Python :) Now if
there was a way to run IronPython on Apache I'd be interested.

-Sandra

Sep 5 '06 #20

skip

Sandra> However, I don't expect that the GIL can be safely removed from
Sandra> CPython.

It was removed at one point in the dim, dark past (circa Python 1.4) on an
experimental basis. Aside from the huge amount of work, it resulted in
significantly lower performance for single-threaded apps (that is, the
common case). Maybe more effort should have been put in at that time to
improve performance, but that didn't happen. Much more water has gone under
the bridge at this point, so extracting the GIL from the core would be
correspondingly more difficult.

Skip

Sep 5 '06 #21

Hi all,

And yet, Java programmers manage to write threaded applications all
day long without getting bitten (once they're used to the issues),
despite usually being less skilled than Python programmers ;-).

These days, even semi-entry-level consumer laptop computers have dual
core CPU's, and quad Opteron boxes (8-way multiprocessing using X2
processors) are quite affordable for midrange servers or engineering
workstations, and there's endless desire to write fancy server apps
completely in Python. There is no point paying for all that
multiprocessor hardware if your programming language won't let you use
it. So, Python must punt the GIL if it doesn't want to keep
presenting undue obstacles to writing serious apps on modern hardware.

True.
The GIL must have had good reasons behind it when it was designed,
but as a language evolves it is essential to widen its scope so that
it fits many usage areas (e.g. scientific applications using
multiprocessors).

In the modern scientific age, where a
__multiprocessor_execution_environment__ is quite common, I feel there
is a need to rethink the introduction of true parallelization
capabilities in Python.
I know many friends who did not choose Python, for the obvious reason
that thread execution in the presence of the GIL wastes sophisticated
hardware resources.
##########################################
# pseudocode, not meant to run
if __name__ == '__multiprocessor_execution_environment__':
    for python_version in range(python2.4.x, python3.x, x):
        if python_version.GIL:
            print 'unusable for computation intensive multiprocessor architecture'
        else:
            print cmp(python, java)
############################################

regards,
KM

Sep 5 '06 #22

Bryan Olson

Paul Rubin wrote:

"sj*******@yahoo.com" <sj*******@yahoo.com> writes:
If it's read/write data or you're not on a Unix platform, you can use
shared memory to share it between many processes.

Threads are way overused in modern multiexecution programming. The
decision on whether to use processes or threads should come down to
whether you want to share everything, or whether you have specific
pieces of data you want to share.

Shared memory means there's a byte vector (the shared memory region)
accessible to multiple processes. The processes don't use the same
machine addresses to reference the vector. Any data structures
(e.g. those containing pointers) shared between the processes have to
be marshalled in and out of the byte vector instead of being accessed
normally.

I think it's even worse. The standard Python library offers
shared memory, but not cross-process locks. Sharing read-write
memory looks like an automatic race condition. I guess one could
implement one of the primitive spin-lock based mutual exclusion
algorithms, but I think even that would depend on non-portable
assumptions about cache consistency.
--
--Bryan

Sep 5 '06 #23

Richard Brodie

"km" <sr*************@gmail.com> wrote in message
news:ma************************************@python.org...

I know many of my friends who did not choose python for obvious reasons
of the nature of thread execution in the presence of GIL which means
that one is wasting sophisticated hardware resources.

It would probably be easier to find smarter friends than to remove the
GIL from Python.

Sep 5 '06 #24

True, since smartness is a comparison, my friends who have chosen java
over python for considerations of a true threading support in a
language are smarter, which makes me a dumbo ! :-)

KM
On 9/5/06, Richard Brodie <R.******@rl.ac.uk> wrote:

"km" <sr*************@gmail.com> wrote in message
news:ma************************************@python.org...

I know many of my friends who did not choose python for obvious reasons
of the nature of thread execution in the presence of GIL which means
that one is wasting sophisticated hardware resources.

It would probably be easier to find smarter friends than to remove the
GIL from Python.

Sep 5 '06 #25

Richard Brodie

"km" <sr*************@gmail.com> wrote in message
news:ma************************************@python.org...

True, since smartness is a comparison, my friends who have chosen java
over python for considerations of a true threading support in a
language are smarter, which makes me a dumbo ! :-)

No, but I think you are making unwise assumptions about performance.
You have to ask yourself: is Amdahl's law really hurting me?

In some situations Python could no doubt benefit from fine grained
locking. However, it's likely that scientific programming is not typically
one of them, because most of the heavy lifting is done in C or C++
extensions which can run in parallel if they release the GIL. Or you
are going to use a compute farm, and fork as many worker processes
as you have cores.
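
For reference, Amdahl's law bounds the speedup on n cores by 1 / ((1 - p) + p / n), where p is the fraction of run time that can run in parallel. A small sketch (the function name is illustrative):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the run time can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

For example, a job that is 90% parallelizable gains at most about 3.1x on 4 cores, so the serial remainder quickly dominates as the core count grows.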

You might find these slides from SciPy 2004 interesting:
http://datamining.anu.edu.au/~ole/pypar/py4cfd.pdf

Sep 5 '06 #26

Steve Holden

sk**@pobox.com wrote:

Sandra> However, I don't expect that the GIL can be safely removed from
Sandra> CPython.

It was removed at one point in the dim, dark past (circa Python 1.4) on an
experimental basis. Aside from the huge amount of work, it resulted in
significantly lower performance for single-threaded apps (that is, the
common case). Maybe more effort should have been put in at that time to
improve performance, but that didn't happen. Much more water has gone under
the bridge at this point, so extracting the GIL from the core would be
correspondingly more difficult.

Given the effort that GIL-removal would take, I'm beginning to wonder if
PyPy doesn't offer a better way forward than CPython, in terms of
execution speed improvements returned per developer-hour.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 5 '06 #27

skip

Steve> Given the effort that GIL-removal would take, I'm beginning to
Steve> wonder if PyPy doesn't offer a better way forward than CPython,
Steve> in terms of execution speed improvements returned per
Steve> developer-hour.

How about execution speed improvements per hour of discussion about removing
the GIL? ;-)
Skip

Sep 5 '06 #28

skip

Richard> It would probably be easier to find smarter friends than to
Richard> remove the GIL from Python.

And if the friends you find are smart enough, they can remove the GIL for
you!

Skip

Sep 5 '06 #29

sjdevnull

Bryan Olson wrote:

I think it's even worse. The standard Python library offers
shared memory, but not cross-process locks.

File locks are supported by the standard library (at least on Unix,
I've not tried on Windows). They work cross-process and are a normal
method of interprocess locking even in C code.
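
A minimal sketch of interprocess locking with a file lock, assuming a Unix-like system where the standard fcntl module is available (it is not on Windows). locked_append and demo are hypothetical helpers; flock locks are advisory, so every cooperating process has to take the lock.

```python
import fcntl
import os
import tempfile


def locked_append(path, line):
    """Append one line while holding an exclusive advisory lock on the file."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks while another process holds it
        try:
            f.write(line + "\n")
            f.flush()  # make the write visible before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def demo():
    """Append twice to a temporary file and return its lines."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        locked_append(path, "first")
        locked_append(path, "second")
        with open(path) as f:
            return f.read().splitlines()
    finally:
        os.remove(path)
```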

Sep 5 '06 #30

Lawrence Oluyede

Sandra-24 <sa***********@yahoo.com> wrote:

Oh I'm aware of that, but it's not what I'm looking for. Mod_mono just
lets you run ASP.NET on Apache. I'd much rather use Python :) Now if
there was a way to run IronPython on Apache I'd be interested.

Take a look here:
http://lists.ironpython.com/pipermai.../2006-March/002049.html
and this thread:
http://www.mail-archive.com/us***@li.../msg01826.html

--
Lawrence - http://www.oluyede.org/blog
"Nothing is more dangerous than an idea
if it's the only one you have" - E. A. Chartier

Sep 5 '06 #31

Lawrence Oluyede

Lawrence Oluyede <rh****@myself.com> wrote:

Take a look here:
http://lists.ironpython.com/pipermai.../2006-March/002049.html
and this thread:
http://www.mail-archive.com/us***@li.../msg01826.html

Also this: http://www.codeproject.com/useritems/ipaspnet.asp

Google is your friend! :-)

--
Lawrence - http://www.oluyede.org/blog
"Nothing is more dangerous than an idea
if it's the only one you have" - E. A. Chartier

Sep 5 '06 #32

skip

Andre> This seems to be an important issue and fit for discussion in the
Andre> context of Py3k. What is Guido's opinion?

Dunno. I've never tried channeling Guido before. You'd have to ask him.
Well, maybe Tim Peters will know. He channels Guido on a fairly regular
basis.

Skip

Sep 5 '06 #33

Sandra-24

sj*******@yahoo.com wrote:

You can do the same on Windows if you use CreateProcessEx to create the
new processes and pass a NULL SectionHandle. I don't think this helps
in your case, but I was correcting your impression that "you'd have to
physically double the computer's memory for a dual core, or quadruple
it for a quadcore". That's just not even near true.

Sorry, my bad. What I meant to say is that for my application I would
have to increase the memory linearly with the number of cores. I have
about 100mb of memory that could be shared between processes, but
everything else would really need to be duplicated.

As I said, Apache used to run on Windows with multiple processes; using
a version that supports that is one option. There are good reasons not
to do that, though, so you could be stuck with threads.

I'm not sure it has done that since the 1.3 releases. mod_python will
work for that, but it involves going way back in its release history as
well. I really don't feel comfortable with that, and I don't doubt I'd
give up a lot of things I'd miss.

Having memory protection is superior to not having it--OS designers
spent years implementing it, why would you toss out a fair chunk of it?
Being explicit about what you're sharing is generally better than not.

Actually, I agree. If shared memory will prove easier, then why not use
it, if the application lends itself to that.

But as I said, threads are a better solution if you're sharing the vast
majority of your memory and have complex data structures to share.
When you're starting a new project, really think about whether they're
worth the considerable tradeoffs, though, and consider the merits of a
multiprocess solution.

There are merits, the GIL being one of those. I believe I can fairly
easily rework things into a multi-process environment by duplicating
memory. Over time I can make the memory usage more efficient by sharing
some data structures out, but that may not even be necessary. The
biggest problem is learning my way around Linux servers. I don't think
I'll choose that option initially, but I may work on it as a project in
the future. It's about time I got more familiar with Linux anyway.

It's almost certainly not worth rewriting a large established
codebase.

Lazy me is in perfect agreement.

I disagree with this, though. The benefits of deterministic GC are
huge and I'd like to see ref-counting semantics as part of the language
definition. That's a debate I just had in another thread, though, and
don't want to repeat.

I just took it for granted that a GC like Java and .NET use is better.
I'll dig up that thread and have a look at it.

I didn't say that. It can be a big hit or it can be unnoticeable. It
depends on your application. You have to benchmark to know for sure.

But if you're trying to make a guess: if you're doing a lot of heavy
lifting in native modules then the GIL may be released during those
calls, and you might get good multithreading performance. If you're
doing lots of I/O requests the GIL is generally released during those
and things will be fine. If you're doing lots of heavy crunching in
Python, the GIL is probably held and can be a big performance issue.

I don't do a lot of work in native modules, other than the standard
library things I use, which don't count as heavy lifting. However I
make a fair number of database calls, and either the GIL is released by
MySQLdb, or I'll contribute a patch so that it is. At any rate, I will
measure, and I suspect the GIL will not be an issue.
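
A rough way to take that kind of measurement is sketched below. It is not from the thread: `run_in_threads`, `io_like` and `cpu_bound` are hypothetical names, and `time.sleep` stands in for a blocking database call on the assumption that the driver releases the GIL while it waits.

```python
import threading
import time


def run_in_threads(func, n_threads):
    """Run func in n_threads threads; return elapsed wall-clock seconds."""
    threads = [threading.Thread(target=func) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def io_like():
    # Stand-in for a blocking I/O call; time.sleep releases the GIL.
    time.sleep(0.2)


def cpu_bound():
    # Pure-Python arithmetic; the GIL is held while this bytecode runs.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    for name, func in [("io_like", io_like), ("cpu_bound", cpu_bound)]:
        one = run_in_threads(func, 1)
        four = run_in_threads(func, 4)
        print(f"{name}: 1 thread {one:.2f}s, 4 threads {four:.2f}s")
```

On a standard CPython build the sleep-based workload should take about as long with four threads as with one, while the pure-Python loop should take roughly four times as long. The exact numbers depend on the machine, so treat this as a starting point for benchmarking the real application.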

-Sandra

Sep 5 '06 #34

Paul Rubin

sk**@pobox.com writes:

It was removed at one point in the dim, dark past (circa Python 1.4) on an
experimental basis. Aside from the huge amount of work, it resulted in
significantly lower performance for single-threaded apps (that is, the
common case).

That's probably because they had to put locking and unlocking around
every access to a reference count. A real GC might have fixed that.

Sep 5 '06 #35

Paul Rubin

"sj*******@yahoo.com" <sj*******@yahoo.com> writes:

I think it's even worse. The standard Python library offers
shared memory, but not cross-process locks.

File locks are supported by the standard library (at least on Unix,
I've not tried on Windows). They work cross-process and are a normal
method of interprocess locking even in C code.

I may be missing your point but I didn't realize you could use file
locks to synchronize shared memory in any useful way. File locks are
usually made and released when the file is opened and closed, or at
best through flock or fcntl calls. Shared memory locks should
generally be done with mechanisms like futex, that in the no-wait case
should not involve any system calls.

Sep 6 '06 #36

sjdevnull

Paul Rubin wrote:

"sj*******@yahoo.com" <sj*******@yahoo.com> writes:

I think it's even worse. The standard Python library offers
shared memory, but not cross-process locks.
File locks are supported by the standard library (at least on Unix,
I've not tried on Windows). They work cross-process and are a normal
method of interprocess locking even in C code.

I may be missing your point but I didn't realize you could use file
locks to synchronize shared memory in any useful way.

You can, absolutely. If you're sharing memory through mmap it's
usually the preferred solution; fcntl locks ranges of an open file, so
you lock exactly the portions of the mmap that you're using at a given
time.

It's not an unusual use at all. Unix programs have used file locks in
this manner for upwards of a decade--things like the Apache Portable
Runtime use fcntl or flock for interprocess mutexes, and they're quite
efficient. (The futexes you mentioned are a very recent Linux
innovation).
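
The range-locking approach described in this post can be sketched in Python, assuming a Unix-like system. The names `SLOT`, `increment_slot`, `worker` and `demo` are hypothetical and chosen for illustration only. Each process locks just the byte range of the counter it is about to update in the mmap'd file. The sketch shows the locking pattern and says nothing about its cost relative to a futex, since every `fcntl.lockf` call is a system call.

```python
import fcntl
import mmap
import os
import struct
import tempfile

SLOT = struct.Struct("<q")  # one 8-byte signed counter per slot


def increment_slot(fd, mem, slot):
    """Add 1 to counter number `slot` while holding a lock on its bytes."""
    offset = slot * SLOT.size
    # Lock only this slot's byte range; blocks while another process holds it.
    fcntl.lockf(fd, fcntl.LOCK_EX, SLOT.size, offset, os.SEEK_SET)
    try:
        (value,) = SLOT.unpack_from(mem, offset)
        SLOT.pack_into(mem, offset, value + 1)
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, SLOT.size, offset, os.SEEK_SET)


def worker(path, slot, count):
    """Open the shared file independently, map it, and bump one counter."""
    fd = os.open(path, os.O_RDWR)
    mem = mmap.mmap(fd, 0)  # shared mapping by default on Unix
    for _ in range(count):
        increment_slot(fd, mem, slot)
    mem.close()
    os.close(fd)


def demo(n_procs=4, count=1000):
    """Fork n_procs workers that all increment slot 0; return its final value."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * SLOT.size)
        f.flush()
        pids = []
        for _ in range(n_procs):
            pid = os.fork()
            if pid == 0:  # child process
                try:
                    worker(f.name, 0, count)
                finally:
                    os._exit(0)
            pids.append(pid)
        for pid in pids:
            os.waitpid(pid, 0)
        with mmap.mmap(f.fileno(), 0) as mem:
            return SLOT.unpack_from(mem, 0)[0]


if __name__ == "__main__":
    print(demo())
```

If the locking works as intended, the final counter equals `n_procs * count`. Removing the two `lockf` calls would be expected to lose updates on a multi-core machine.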

Sep 6 '06 #37

Paul Rubin

"sj*******@yahoo.com" <sj*******@yahoo.com> writes:

You can, absolutely. If you're sharing memory through mmap it's
usually the preferred solution; fcntl locks ranges of an open file, so
you lock exactly the portions of the mmap that you're using at a given
time.

How can it do that without having to touch the PTE for every single
page in the range, which might be gigabytes? For that matter, how can
it do that on regions smaller than a page? And how does another
process query whether a region is locked, without taking a kernel trap
if it's locked? This sounds absolutely horrendous compared to a
futex, which should usually be just one or two user-mode instructions
and no context switches.

It's not an unusual use at all. Unix programs have used file locks in
this manner for upwards of a decade--things like the Apache Portable
Runtime use fcntl or flock for interprocess mutexes, and they're quite
efficient. (The futexes you mentioned are a very recent Linux
innovation).

Apache doesn't use shared memory in the same way that something like a
database does, so maybe it can more easily tolerate the overhead of
fcntl. Futex is just a somewhat standardized way to do what
programmers have done less portably since the dawn of multiprocessors.

Sep 6 '06 #38

mystilleef

You can use multiple processes to simulate threads via an IPC
mechanism. I use D-Bus to achieve this.

http://www.freedesktop.org/wiki/Software/dbus
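
The general idea of worker processes exchanging data over an IPC channel can be sketched as follows. This does not use D-Bus: it relies on the standard library's `multiprocessing` module, which postdates this thread and uses pipes and pickling as the IPC mechanism. `count_primes` and `parallel_count` are hypothetical names, and the `"fork"` start method assumes a Unix-like system.

```python
import multiprocessing as mp


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi) by trial division."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def parallel_count(limit, n_procs=4):
    """Split [0, limit) into n_procs chunks and count primes in parallel."""
    step = limit // n_procs
    chunks = [
        (i * step, limit if i == n_procs - 1 else (i + 1) * step)
        for i in range(n_procs)
    ]
    ctx = mp.get_context("fork")  # assumption: Unix-like system
    with ctx.Pool(n_procs) as pool:
        # Each chunk is sent to a worker process; results come back over a pipe.
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    print(parallel_count(20000))
```

Because each worker is a separate interpreter with its own GIL, CPU-bound chunks can run on separate cores. The cost is that arguments and results are copied between processes rather than shared.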

km wrote:

Hi all,
Are there any alternate ways of attaining true threading in python ?
if GIL doesnt go then does it mean that python is useless for
computation intensive scientific applications which are in need of
parallelization in threading context ?

regards,
KM
---------------------------------------------------------------------------
On 4 Sep 2006 07:58:00 -0700, bayerj <ba****@in.tum.de> wrote:
Hi,

GIL won't go. You might want to read
http://blog.ianbicking.org/gil-of-doom.html .

Regards,
-Justin

--
http://mail.python.org/mailman/listinfo/python-list

Sep 6 '06 #39

Bryan Olson

sj*******@yahoo.com wrote:

Bryan Olson wrote:
I think it's even worse. The standard Python library offers
shared memory, but not cross-process locks.

File locks are supported by the standard library (at least on Unix,
I've not tried on Windows). They work cross-process and are a normal
method of interprocess locking even in C code.

Ah, O.K. Like Paul, I was unaware how Unix file worked with
mmap.
--
--Bryan

Sep 6 '06 #40

Bryan Olson

I wrote:

Ah, O.K. Like Paul, I was unaware how Unix file worked with
mmap.

Insert "locking" after "file".
--
--Bryan

Sep 6 '06 #41

lcaamano

Here's a relevant post

http://mail.python.org/pipermail/pyt...il/001051.html

or

http://tinyurl.com/fod9u

sk**@pobox.com wrote:

Andre> This seems to be an important issue and fit for discussion in the
Andre> context of Py3k. What is Guido's opinion?

Dunno. I've never tried channeling Guido before. You'd have to ask him.
Well, maybe Tim Peters will know. He channels Guido on a fairly regular
basis.

Skip

--
lpc

Sep 6 '06 #42
