
More than one interpreter per process?

Python has a GIL that impairs scalability on computers with more than
one processor. The problem seems to be that there is only one GIL per
process. Attempts to remove the GIL have always foundered on the need
for 'fine-grained locking' on reference counts. I believe there is a
second way, which has been overlooked: having one GIL per interpreter
instead of one GIL per process.

Currently, the Python C API - as I understand it - only allows for a
single interpreter per process. Here is how Python would be embedded
in a multi-threaded C program today, with the GIL shared among the C
threads:

#include <windows.h>
#include <process.h>
#include <Python.h>

static PyInterpreterState *maininterp = NULL;

unsigned __stdcall threadproc(void *data)
{
    PyThreadState *threadstate;

    /* create a thread state for this thread in the shared interpreter */
    PyEval_AcquireLock();
    threadstate = PyThreadState_New(maininterp);
    PyEval_ReleaseLock();

    /* swap this thread in, do whatever we need */
    PyEval_AcquireLock();
    PyThreadState_Swap(threadstate);
    PyRun_SimpleString("print 'Hello World1'\n");
    PyThreadState_Swap(NULL);
    PyEval_ReleaseLock();

    /* clear and delete the thread state for this thread */
    PyEval_AcquireLock();
    PyThreadState_Clear(threadstate);
    PyThreadState_Delete(threadstate);
    PyEval_ReleaseLock();

    /* tell Windows this thread is done */
    return 0;
}

int main(int argc, char *argv[])
{
    HANDLE t[3];
    PyThreadState *mainstate;

    Py_Initialize();
    PyEval_InitThreads();            /* creates and acquires the GIL */
    maininterp = PyThreadState_Get()->interp;
    mainstate = PyEval_SaveThread(); /* release the GIL for the workers */

    /* _beginthreadex, not _beginthread: its handle stays valid for waiting */
    t[0] = (HANDLE) _beginthreadex(NULL, 0, threadproc, NULL, 0, NULL);
    t[1] = (HANDLE) _beginthreadex(NULL, 0, threadproc, NULL, 0, NULL);
    t[2] = (HANDLE) _beginthreadex(NULL, 0, threadproc, NULL, 0, NULL);
    WaitForMultipleObjects(3, t, TRUE, INFINITE);

    PyEval_RestoreThread(mainstate); /* take the GIL back for shutdown */
    Py_Finalize();
    return 0;
}

In the Java Native Interface (JNI), every function takes an
environment pointer for the VM. The same thing could be done for
Python, with the VM, GIL included, encapsulated in a single object:

#include <windows.h>
#include <process.h>
#include <Python.h>

unsigned __stdcall threadproc(void *data)
{
    PyVM *vm = Py_Initialize(); /* create a new interpreter */
    PyRun_SimpleString(vm, "print 'Hello World1'\n");
    Py_Finalize(vm);
    return 0;
}

int main(int argc, char *argv[])
{
    HANDLE t[3];
    t[0] = (HANDLE) _beginthreadex(NULL, 0, threadproc, NULL, 0, NULL);
    t[1] = (HANDLE) _beginthreadex(NULL, 0, threadproc, NULL, 0, NULL);
    t[2] = (HANDLE) _beginthreadex(NULL, 0, threadproc, NULL, 0, NULL);
    WaitForMultipleObjects(3, t, TRUE, INFINITE);
    return 0;
}

Doesn't that look a lot nicer?

If one can have more than one interpreter in a single process, it is
possible to create a pool of them and implement concurrent programming
paradigms such as 'forkjoin' (to appear in Java 7, already in C# 3.0).
It would be possible to emulate a fork on platforms not supporting a
native fork(), such as Windows. Perl does this in 'perlfork'. This
would deal with the GIL issue on computers with more than one CPU.

One could actually use ctypes to embed a pool of Python interpreters
in a process already running Python.
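Roughly like this (a purely hypothetical sketch: the PyVM-style entry
points below are invented names for the proposed API and exist in no
Python build today):

# Hypothetical: assumes a Python DLL built with the proposed
# per-interpreter API. Py_InitializeVM, PyRun_SimpleStringVM and
# Py_FinalizeVM are invented names for illustration only.
import ctypes

pydll = ctypes.CDLL('python25-vm.dll')  # hypothetical multi-VM build
pydll.Py_InitializeVM.restype = ctypes.c_void_p
pydll.PyRun_SimpleStringVM.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
pydll.Py_FinalizeVM.argtypes = [ctypes.c_void_p]

vms = [pydll.Py_InitializeVM() for i in range(4)]  # a pool of interpreters
for vm in vms:
    pydll.PyRun_SimpleStringVM(vm, "print 'Hello World1'\n")
for vm in vms:
    pydll.Py_FinalizeVM(vm)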

Most of the conversion of the current Python C API could be automated.
Python would also need to be linked against a multi-threaded version
of the C library.

Dec 18 '07 #1
sturlamolden wrote:
Python has a GIL that impairs scalability on computers with more than
one processor. The problem seems to be that there is only one GIL per
process. Attempts to remove the GIL have always foundered on the need
for 'fine-grained locking' on reference counts. I believe there is a
second way, which has been overlooked: having one GIL per interpreter
instead of one GIL per process.
How would this handle python resources that a programmer would want to
share among the threads? What facilities for IPC between the
interpreters would be used?

Dec 18 '07 #2
sturlamolden wrote:
If one can have more than one interpreter in a single process,
You can. Have a look at mod_python and mod_wsgi which do exactly
this. But extension modules that use the simplified GIL API don't work
well with them (if at all).
Most of the conversion of the current Python C API could be automated.
The biggest stumbling block is what to do when the external environment
makes a new thread and then eventually calls back into Python. It is
hard to know which interpreter that callback should go to.

You are also asking for every extension module to have to be changed.
The vast majority are not part of the Python source tree and would also
have to support the versions before a change like this.

You would have more luck getting this sort of change into Python 3 since
that requires most extension modules to be modified a bit (eg to deal
with string and unicode issues).

But before doing that, why not show how much better your scheme would
make things. The costs of doing it are understood, but what are the
benefits in terms of cpu consumption, memory consumption, OS
responsiveness, cache utilisation, multi-core utilisation etc. If the
answer is 1% then that is noise.

Roger
Dec 18 '07 #3
On 18 Dec, 05:46, Michael L Torrie <torr...@chem.byu.edu> wrote:
How would this handle python resources that a programmer would want to
share among the threads? What facilities for IPC between the
interpreters would be used?
There would be no IPC as they would live in the same process. A thread-
safe queue would suffice. Python objects would have to be serialized
before being placed in the queue. One could also allow NumPy-like arrays
in two separate interpreters to share the same memory buffer.
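For illustration, a minimal sketch of that queue discipline, written
with today's Queue and pickle modules (ordinary threads stand in for
the proposed per-interpreter workers; send/recv are just illustrative
helper names):

import Queue
import pickle

channel = Queue.Queue()  # thread-safe channel between workers

def send(obj):
    # serialize before the object crosses the interpreter boundary
    channel.put(pickle.dumps(obj))

def recv():
    # deserialize into the receiving interpreter's own objects
    return pickle.loads(channel.get())

send({'task': 'hello', 'args': (1, 2, 3)})
print recv()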

With an emulated fork() the whole interpreter would be cloned,
possibly deferred in a 'copy on write' scheme.

Multiple processes and IPC are what we have today with e.g. mpi4py.


Dec 18 '07 #4
On 18 Dec, 10:24, Roger Binns <rog...@rogerbinns.com> wrote:
The biggest stumbling block is what to do when the external environment
makes a new thread and then eventually calls back into Python. It is
hard to know which interpretter that callback should go to.
Not if you explicitly have to pass a pointer to the interpreter in
every API call, which is what I suggested.

You are also asking for every extension module to have to be changed.
The vast majority are not part of the Python source tree and would also
have to support the versions before a change like this.
It would break a lot of stuff.

But porting could be automated by a simple Python script. It just
involves changing PySomething(...) to PySomething(env, ...), with env
being a pointer to the interpreter. Since an extension only needs to
know about a single interpreter, it could possibly be done by
preprocessor macros:

#define PySomething(var) PySomething(env, var)
You would have more luck getting this sort of change into Python 3 since
that requires most extension modules to be modified a bit (eg to deal
with string and unicode issues).
PEPs are closed for Python 3.

Dec 18 '07 #5
On 18 Dec, 10:24, Roger Binns <rog...@rogerbinns.com> wrote:
You can. Have a look at mod_python and mod_wsgi which does exactly
this. But extension modules that use the simplified GIL api don't work
with them (well, if at all).
mod_python uses Py_NewInterpreter() to create sub-interpreters. They
all share the same GIL. The GIL is declared static in ceval.c, and
shared for the whole process. But OK, if PyEval_AcquireLock() took a
pointer to a 'sub-GIL', sub-interpreters could run concurrently on
SMPs. But it would require a separate thread scheduler for each
sub-interpreter.



Dec 18 '07 #6
sturlamolden <st**********@yahoo.no> wrote:
On 18 Dec, 10:24, Roger Binns <rog...@rogerbinns.com> wrote:
You would have more luck getting this sort of change into Python 3 since
that requires most extension modules to be modified a bit (eg to deal
with string and unicode issues).

PEPs are closed for Python 3.
That's true for core interpreter changes and only applies to Python 3.0.
Overall, what Roger said is true: if there is any hope for your proposal,
you must ensure that you can make it happen in Python 3.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith
Dec 18 '07 #7
On Dec 19, 2:37 am, sturlamolden <sturlamol...@yahoo.no> wrote:
On 18 Dec, 10:24, Roger Binns <rog...@rogerbinns.com> wrote:
You can. Have a look at mod_python and mod_wsgi which do exactly
this. But extension modules that use the simplified GIL API don't work
well with them (if at all).

mod_python uses Py_NewInterpreter() to create sub-interpreters. They
all share the same GIL. The GIL is declared static in ceval.c, and
shared for the whole process. But OK, if PyEval_AcquireLock() took a
pointer to a 'sub-GIL', sub-interpreters could run concurrently on
SMPs. But it would require a separate thread scheduler for each
sub-interpreter.
In current versions of Python it is possible for multiple
sub-interpreters to access the same instance of a Python object which
is notionally independent of any particular interpreter. In other
words, sharing of objects exists between sub-interpreters. If you
remove the global GIL and make it per sub-interpreter then you would
lose this ability. This may have an impact on some third-party C
extension modules, or on embedded systems, which are able to cache
simple Python data objects for use in multiple sub-interpreters so
that memory usage is reduced.

Graham
Dec 19 '07 #8
On Dec 18, 8:24 pm, Roger Binns <rog...@rogerbinns.com> wrote:
sturlamolden wrote:
If one can have more than one interpreter in a single process,

You can. Have a look at mod_python and mod_wsgi which do exactly
this. But extension modules that use the simplified GIL API don't work
well with them (if at all).
When using mod_wsgi there is no problem with C extension modules that
use the simplified GIL API, provided that one configures mod_wsgi to
delegate that specific application to run in the context of the first
interpreter instance created by Python.

In theory the same should be the case for mod_python, but there is
currently a bug in the way that mod_python works such that some C
extension modules using the simplified GIL API still don't work even
when made to run in the first interpreter.

Graham
Dec 19 '07 #9
sturlamolden wrote:
On 18 Dec, 10:24, Roger Binns <ro****@rogerbinns.com> wrote:
The biggest stumbling block is what to do when the external environment
makes a new thread and then eventually calls back into Python. It is
hard to know which interpreter that callback should go to.

Not if you explicitly have to pass a pointer to the interpreter in
every API call, which is what I suggested.
You missed my point. What if the code calling back into Python doesn't
know which interpreter it belongs to? Think of a web server with Python
callbacks registered for handling various things. Currently that
situation works just fine as the simplified GIL APIs just pick the main
interpreter.

You have now imposed a requirement on all extension modules that they
need to keep track of interpreters in such a way that callbacks from new
threads not started by Python know which interpreter they belong to.
This is usually possible because you can give callback data in the
external environment apis, but your mechanism would prevent any that
don't have that ability from working at all. We wouldn't find those
"broken" implementations until changing to your mechanism.
But porting could be automated by a simple Python script.
Have you actually tried it? See if you can do it for the sqlite module
which is a standard part of the Python library.
PEPs are closed for Python 3.
You glossed over my "prove the benefit outweighs the costs" bit :-)
This project will let you transparently use multiple processes:

http://cheeseshop.python.org/pypi/processing
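For instance, a minimal sketch assuming the processing package from
that page is installed (its API later became the standard library's
multiprocessing module):

# Each worker is a real OS process with its own GIL, so CPU-bound
# work can use several cores at once.
from processing import Process, Queue

def worker(q, n):
    q.put(sum(i * i for i in xrange(n)))  # toy CPU-bound task

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=worker, args=(q, 10 ** 6)) for i in range(3)]
    for p in procs:
        p.start()
    results = [q.get() for p in procs]  # drain before joining
    for p in procs:
        p.join()
    print results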

There are other techniques for parallelization using multiple processes
and even the network. For example:

http://www.artima.com/forums/flat.js...&thread=214303
http://www.artima.com/weblogs/viewpo...?thread=214235

Roger
Dec 19 '07 #10
Roger Binns wrote:
sturlamolden wrote:
If one can have more than one interpreter in a single process,

You can. Have a look at mod_python and mod_wsgi which do exactly
this. But extension modules that use the simplified GIL API don't work
well with them (if at all).
No, you can't. Sub-interpreters share a single GIL and other state. Why
don't you run multiple processes? It's one of the oldest and best-working
ways to use the full potential of your system. Lots of Unix servers like
postfix, qmail, apache (with some workers) et al. use processes.

Christian

Dec 19 '07 #11
On 19 Dec, 08:02, Christian Heimes <li...@cheimes.de> wrote:
No, you can't. Sub-interpreters share a single GIL and other state. Why
don't you run multiple processes? It's one of the oldest and best-working
ways to use the full potential of your system. Lots of Unix servers like
postfix, qmail, apache (with some workers) et al. use processes.
Because there is a broken prominent OS that doesn't support fork()?

MPI works with multiple processes, though, and can be used from Python
(mpi4py) even under Windows.
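For example, a minimal mpi4py sketch (using the pickling lowercase
send/recv as spelled in current mpi4py releases; run it under an MPI
launcher):

# Run with e.g.: mpiexec -n 2 python hello_mpi.py
# Each rank is a separate process with its own interpreter and GIL.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({'msg': 'Hello World1'}, dest=1)  # pickles the object
elif rank == 1:
    print comm.recv(source=0)                   # unpickles it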


Dec 20 '07 #12
