Correct way to handle independent interpreters when embedding in a single-threaded C++ app

Craig Ringer

Hi folks

I'm a bit of a newbie here, though I've tried to appropriately research
this issue before posting. I've found a lot of questions, a few answers
that don't really answer quite what I'm looking for, but nothing that
really solves or explains all this. I'll admit to being stumped, hence
my question here.

I'm also trying to make this post as clear and detailed as possible.
Unfortunately, that means it's come out like a book. I hope a few kind
souls will be game to read it, on the theory that I'm a user who's
putting in the time to actually provide enough information for once.

I have a Python interpreter embedded in a C++/Qt application (Scribus -
http://www.scribus.net). Scribus, while using multi-threading enabled
libraries, runs in a single 'main' thread. The Python interpreter is
implemented as a plug-in that's used to run user scripts. Overall it's
working very well.

I've run into two problems that are proving very difficult to solve,
however, and I thought I'd ask here for some words of wisdom. I'm only
tackling the first one right now. First I'll provide some background on
how I'm doing things, and what I'm trying to achieve. If anything below
comes out as a request for Python functionality it's not intended to be
- it's just a description of what /I'm/ trying to do.

The Scribus Python plugin is pretty standard - it both embeds the Python
interpreter and provides an extension module to expose
application-specific functionality. It is used to permit users to
execute Python scripts to automate tasks within the application. I also
hope to make it possible to extend the application using Python, but
that's not the issue right now. I need to isolate individual script
executions as much as possible, so that to the greatest extent we can
manage each script runs in a new interpreter. In other words, I need to
minimise the chances of scripts treading on each other's toes or leaking
too much with each execution.

Specifically, as much as possible I need to:

- Ensure that memory allocated by Python during a script run is all
freed, including any objects created and modules loaded. An
exception can be made for C extension modules, so long as they
don't leak every time a script is run.
- Ensure that no global state (eg loaded modules, globals namespace,
etc) persists across script executions.

I have no need to be able to run Python scripts in parallel with the
application, nor with each other. If a script goes into an endless loop,
that's a bug with the script, and not the application's problem. I'd
like to reduce the chances of scripts conflicting or messing up the app
state, but don't intend to even try to make it possible to safely run
untrusted scripts or to completely isolate scripts. If the odd C
extension module doesn't like it, I can deal with that too.

Also, some of the extension module functions make Qt gui calls (for
example, create and display a file chooser dialog) or access internal
application state in QObject derived classes. According to the Qt
documentation, this should only be done from the main thread. This is
another reason why I'm making no attempt to make it possible to run
normal Python scripts without blocking the application, or run scripts
in parallel. It also means that all my Python sub-interpreters need to
share the main (and in fact only) application thread.

I've hit two issues with this. The first is that executing a script
crashes the application with SIGABRT if a Python debug build is being
used. Python crashes the app with the error "Invalid thread state for
this thread". I'm working with Python 2.3.4. The crash is triggered by
a check at pystate.c line 276, in the PyThreadState_Swap() function.

The code in question is:

    /* It should not be possible for more than one thread state
       to be used for a thread.  Check this the best we can in debug
       builds.
    */
    #if defined(Py_DEBUG) && defined(WITH_THREAD)
        if (new) {
            PyThreadState *check = PyGILState_GetThisThreadState();
            if (check && check != new)
                /* Py_FatalError("Invalid thread state for this thread"); */
                printf("We would've died here\n");
        }
    #endif

A trimmed down and simplified version (eg no error checking, etc) of the
code I'm using in the plugin that hits this check is:

    PyThreadState *stateo = PyEval_SaveThread();
    PyThreadState *state = Py_NewInterpreter();
    initscribus(Carrier); // init the extension module
    PySys_SetArgv(1, scriptfilename);
    PyObject* m = PyImport_AddModule("__main__");
    PyObject* globals = PyModule_GetDict(m);
    char* script_string = ... // build script that calls execfile()
    PyObject* result = PyRun_String(script_string, Py_file_input,
                                    globals, globals);
    ... // handle possible failure and capture exception
    Py_EndInterpreter(state);
    PyEval_RestoreThread(stateo);

(The full version can be found in
scribus/plugins/scriptplugin/scriptplugin.cpp lines 225-279 of Scribus
CVS, http://www.scribus.net/)

The script text isn't really important. It just execfile()s the user's
script within a try/except block to ignore SystemExit and to catch and
capture any other fatal exceptions.
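
For illustration, a wrapper of that kind might look roughly like the sketch below. This is not the actual Scribus wrapper: the function name run_user_script and the temp-file demo are hypothetical, and it uses Python 3's exec(compile(...)) in place of the Python 2 execfile() the post refers to.

```python
import os
import tempfile
import traceback


def run_user_script(path):
    """Run the script at `path` in a fresh namespace (hypothetical helper).

    Returns None if the script finished or exited via SystemExit,
    otherwise the formatted traceback of the exception it raised.
    """
    namespace = {"__name__": "__main__", "__file__": path}
    try:
        with open(path, "rb") as handle:
            code = compile(handle.read(), path, "exec")
        exec(code, namespace)
    except SystemExit:
        return None  # a script calling sys.exit() is not treated as an error
    except Exception:
        return traceback.format_exc()  # capture the error for later display
    return None


def _write_temp(source):
    """Write `source` to a temporary .py file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as handle:
        handle.write(source)
    return path


# Demo: one script that exits deliberately, one that fails.
exit_path = _write_temp("import sys\nsys.exit(3)\n")
fail_path = _write_temp("1 / 0\n")
exit_result = run_user_script(exit_path)
error_report = run_user_script(fail_path)
os.remove(exit_path)
os.remove(fail_path)
```

Returning the traceback as a string, rather than printing it, leaves the host application free to show the error in its own dialog.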

The crash occurs at Py_NewInterpreter, when it calls PyThreadState_Swap.
It's pretty clear _what_ is happening - Python is aborting on a sanity
check because I'm trying to use multiple thread states in one thread -
what I'm looking for help with is _why_. When run with a non-debug
build, scripts run just fine. It also runs fine when I use a debug build
of Python without thread support (as is obvious from the code snippet
above). I'm sure there are cases where things can / do go wrong, but for
general use it appears to be just peachy.

So ... my question is, what are the issues behind this check? Does it
indicate that there will be a problem with this condition in all cases?
My understanding is that it's to do with the way Python doesn't use the
full capabilities of platform threading libraries, and has some shared
globals that could cause issues. Correct? If so, is there a way around
this?

All I'm looking to do is to create a clean sub-interpreter state, run a
script in it (in the main thread, with nothing else running) then
dispose of the interpreter at script exit. It's desirable to keep the
main interpreter usable as well, but there will never be more than one
sub-interpreter, and there will never be Python code running in the main
and sub interpreters at the same time. Does the existence of this check
mean that what I'm trying to do is incorrect or unsafe? If not, might it
be possible to provide apps with a way to disable this check (think an
"I know what I'm doing" flag)? Is there another, saner way to do what I
want?

This post describes a similar issue to mine, though their goals are
different, and I don't think the solution will work for me:
http://groups.google.com.au/groups?h....net.au&rnum=7

This message describes the issue I'm seeing:
http://groups.google.com.au/groups?q...4ax.com&rnum=5

Another related message:
http://groups.google.com.au/groups?q...hon.org&rnum=1

Someone says it's just broken:
http://groups.google.com.au/groups?q...lin.de&rnum=17

I've tried one other approach that doesn't involve
Py_NewInterpreter/Py_EndInterpreter, but didn't have much success. What
I tried to do was run each script with a new global dict, so that they
at least had separate global namespaces (though they'd still be able to
influence the next script's interpreter state / module state). If I
recall correctly I ended up with code like this:

    execfile(filename, {'__builtins__':__builtins__,
                        '__name__':'__main__',
                        '__file__':filename})

being called from PyRun_String.

This appeared to work fine, but turned out to leak memory like a sieve.
Objects in the script's global namespace weren't being disposed of when
the script terminated. Consequently, if I had a script with one line:

    x = ' '*200000000

then each time I ran the script the app would gobble a large chunk more
memory and not release it. If I wrote a script that very carefully
deleted everything it put in the top-level namespace before it exited,
such as all variables, imports, classes, and functions, I still leaked a
little memory and a few references, but nothing much. Unfortunately,
doing that is also rather painful at best and seems _really_ clumsy.

It looked to me after some testing with a debug build like the global
dictionaries that were being created for each execfile() call were not
being disposed of after the call terminated, even though no code I was
aware of continued to hold references to them. Circular references? Do I
have to manually invoke the cyclic reference cleanup code in Python when
embedding?
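
One way a globals dict can outlive the call, at least in CPython, is a reference cycle: any function defined by the script refers back to the dict through its `__globals__` attribute, so reference counting alone never frees it. The sketch below demonstrates that effect and that gc.collect() reclaims it. It is only an assumption that something similar was involved here (a one-line script that just assigns a string creates no such cycle), and the names Marker, SCRIPT and run_script are made up for the demonstration. It is written for Python 3, so it uses exec() rather than execfile().

```python
import gc
import weakref


class Marker:
    """Weak-referenceable object used to observe when the namespace dies."""


# The script defines a function, so namespace -> helper -> namespace
# (via helper.__globals__) forms a reference cycle.
SCRIPT = """
def helper():
    return marker
"""


def run_script(source):
    """Run `source` in a fresh globals dict; return a weakref to its marker."""
    namespace = {"__name__": "__main__", "marker": Marker()}
    exec(compile(source, "<script>", "exec"), namespace)
    return weakref.ref(namespace["marker"])


gc.disable()  # keep automatic collection from interfering with the demo
try:
    ref = run_script(SCRIPT)
    alive_before = ref() is not None  # dict is kept alive by the cycle
    gc.collect()                      # explicit cyclic garbage collection
    alive_after = ref() is not None   # cycle broken, marker released
finally:
    gc.enable()
```

From an embedding application, the same collection can be requested after each script run by calling gc.collect() through the interpreter, for example from the wrapper script itself.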

I'm sorry for the lack of detail provided in the discussion of this
approach. It was a while ago. If folks here think it's viable I can go
back and get some more hard data.

With the 'new globals dict' approach, it was also possible for people to
mangle modules and for the next script to see the changes. If there's a
way to re-init modules between runs (at least the built-in ones like
sys, __builtins__, etc, plus the app's extension module and any modules
written in Python), that'd be fantastic.
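
A partial workaround sometimes used for modules written in Python is to snapshot sys.modules before a run and drop anything new afterwards, so the next script re-imports it from scratch. The sketch below is a minimal illustration under that assumption: run_with_module_cleanup and scratch_module are hypothetical names, and the approach does not undo changes made to modules that were already loaded (sys, builtins, the application's extension module), nor does it reinitialise C extension modules.

```python
import sys


def run_with_module_cleanup(source, filename="<script>"):
    """Run `source`, then forget any modules it added to sys.modules."""
    before = set(sys.modules)
    namespace = {"__name__": "__main__"}
    try:
        exec(compile(source, filename, "exec"), namespace)
    finally:
        # Remove only the entries that appeared during this run.
        for name in set(sys.modules) - before:
            del sys.modules[name]


# Demo script: registers a throwaway module, standing in for an import.
SCRIPT = """
import sys, types
sys.modules["scratch_module"] = types.ModuleType("scratch_module")
"""

run_with_module_cleanup(SCRIPT)
leaked = "scratch_module" in sys.modules
```

Because the cleanup runs in a finally block, modules imported by a script that fails part-way through are dropped as well.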

If there's some way to achieve what I want to do - get scripts to
execute in private or mostly-private environments in the main thread of
an application - I'd be overjoyed to hear it. I'm very sorry for the
mammoth message, and hope I've made some sense and provided enough
information without boring you all to tears. It's clear that there's
been quite a bit of interest in this topic from my digging through the
list archives, but I just wasn't able to find a clear, definitive
answer.

Phew. To anybody who got this far, thank you very much for your time and
patience.

--
Craig Ringer

Jul 18 '05 #1

Mustafa Demirhan

If you are always running the Python scripts within the main thread of
the application, then why are you creating a new thread state and
running the script in that state? Why not just this:

    Py_Initialize();
    PyRun_SimpleString(...);
    Py_Finalize();

(Instead of PyRun_SimpleString, do whatever you want to do there)

Since you are not running any Python scripts or calling any
Python-related stuff from other threads, this is the best approach in my
opinion. This will also ensure that execution of one script won't
affect the execution of another, because you call Py_Finalize after
the script and thus shut down the interpreter.

Mustafa Demirhan

Jul 18 '05 #2
