Bytes | Software Development & Data Engineering Community

Correct way to handle independent interpreters when embedding in a single-threaded C++ app

Hi folks

I'm a bit of a newbie here, though I've tried to appropriately research
this issue before posting. I've found a lot of questions, a few answers
that don't really answer quite what I'm looking for, but nothing that
really solves or explains all this. I'll admit to being stumped, hence
my question here.

I'm also trying to make this post as clear and detailed as possible.
Unfortunately, that means it's come out like a book. I hope a few kind
souls will be game to read it, on the theory that I'm a user who's
putting in the time to actually provide enough information for once.

I have a Python interpreter embedded in a C++/Qt application (Scribus -
http://www.scribus.net). Scribus, while using multi-threading enabled
libraries, runs in a single 'main' thread. The Python interpreter is
implemented as a plug-in that's used to run user scripts. Overall it's
working very well.

I've run into two problems that are proving very difficult to solve,
however, and I thought I'd ask here for some words of wisdom. I'm only
tackling the first one right now. First I'll provide some background on
how I'm doing things, and what I'm trying to achieve. If anything below
comes out as a request for Python functionality it's not intended to be
- it's just a description of what /I'm/ trying to do.

The Scribus Python plugin is pretty standard - it both embeds the Python
interpreter and provides an extension module to expose
application-specific functionality. It is used to permit users to
execute Python scripts to automate tasks within the application. I also
hope to make it possible to extend the application using Python, but
that's not the issue right now. I need to isolate individual script
executions as much as possible, so that, to the greatest extent we can
manage, each script runs in a new interpreter. In other words, I need to
minimise the chances of scripts treading on each other's toes or leaking
too much with each execution.

Specifically, as much as possible I need to:

- Ensure that memory allocated by Python during a script run is all
freed, including any objects created and modules loaded. An
exception can be made for C extension modules, so long as they
don't leak every time a script is run.
- Ensure that no global state (eg loaded modules, globals namespace,
etc) persists across script executions.

I have no need to be able to run Python scripts in parallel with the
application, nor with each other. If a script goes into an endless loop,
that's a bug with the script, and not the application's problem. I'd
like to reduce the chances of scripts conflicting or messing up the app
state, but don't intend to even try to make it possible to safely run
untrusted scripts or to completely isolate scripts. If the odd C
extension module doesn't like it, I can deal with that too.

Also, some of the extension module functions make Qt GUI calls (for
example, create and display a file chooser dialog) or access internal
application state in QObject derived classes. According to the Qt
documentation, this should only be done from the main thread. This is
another reason why I'm making no attempt to make it possible to run
normal Python scripts without blocking the application, or run scripts
in parallel. It also means that all my Python sub-interpreters need to
share the main (and in fact only) application thread.

I've hit two issues with this. The first is that executing a script
crashes the application with SIGABRT if a Python debug build is being
used. Python crashes the app with the error "Invalid thread state for
this thread". I'm working with Python 2.3.4. The crash is triggered by
a check in pystate.c, line 276, in the PyThreadState_Swap() function.
The code in question is:

/* It should not be possible for more than one thread state
   to be used for a thread.  Check this the best we can in debug
   builds.
*/
#if defined(Py_DEBUG) && defined(WITH_THREAD)
	if (new) {
		PyThreadState *check = PyGILState_GetThisThreadState();
		if (check && check != new)
			/* Py_FatalError("Invalid thread state for this thread"); */
			printf("We would've died here\n");
	}
#endif

A trimmed down and simplified version (eg no error checking, etc) of the
code I'm using in the plugin that hits this check is:

PyThreadState *stateo = PyEval_SaveThread();
PyThreadState *state = Py_NewInterpreter();
initscribus(Carrier);               // init the extension module
PySys_SetArgv(1, scriptfilename);
PyObject* m = PyImport_AddModule("__main__");
PyObject* globals = PyModule_GetDict(m);
char* script_string = ...;          // build script that calls execfile()
PyObject* result = PyRun_String(script_string, Py_file_input,
                                globals, globals);
...                                 // handle possible failure and capture exception
Py_EndInterpreter(state);
PyEval_RestoreThread(stateo);

(The full version can be found in
scribus/plugins/scriptplugin/scriptplugin.cpp line 225-279 of Scribus
CVS, http://www.scribus.net/)

The script text isn't really important. It just execfile()s the user's
script within a try/except block to ignore SystemExit and to catch and
capture any other fatal exceptions.
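At the Python level, that wrapper can be sketched like this. The exact
text Scribus builds isn't shown above, so the details here - using exec
on a string in place of execfile() on the user's file - are illustrative
assumptions, not the real wrapper:

```python
import traceback

# Stand-in for the user's script file; the real wrapper would
# execfile() the file named on sys.argv instead.
user_code = "raise ValueError('boom')"

captured = None
try:
    exec(user_code, {'__name__': '__main__'})
except SystemExit:
    pass                               # sys.exit() in a script is not an error
except Exception:
    captured = traceback.format_exc()  # handed back to the C++ side

print('ValueError' in (captured or ''))
```

Note that SystemExit must be caught before the generic Exception clause,
since in this era of Python SystemExit is itself a subclass of Exception.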

The crash occurs at Py_NewInterpreter, when it calls PyThreadState_Swap.
It's pretty clear _what_ is happening - Python is aborting on a sanity
check because I'm trying to use multiple thread states in one thread -
what I'm looking for help with is _why_. When run with a non-debug
build, scripts run just fine. It also runs fine when I use a debug build
of Python without thread support (as is obvious from the code snippet
above). I'm sure there are cases where things can / do go wrong, but for
general use it appears to be just peachy.

So ... my question is, what are the issues behind this check? Does it
indicate that there will be a problem with this condition in all cases?
My understanding is that it's to do with the way Python doesn't use the
full capabilities of platform threading libraries, and has some shared
globals that could cause issues. Correct? If so, is there a way around
this?

All I'm looking to do is to create a clean sub-interpreter state, run a
script in it (in the main thread, with nothing else running) then
dispose of the interpreter at script exit. It's desirable to keep the
main interpreter usable as well, but there will never be more than one
sub-interpreter, and there will never be Python code running in the main
and sub interpreters at the same time. Does the existence of this check
mean that what I'm trying to do is incorrect or unsafe? If not, might it
be possible to provide apps with a way to disable this check (think an
"I know what I'm doing" flag)? Is there another, saner way to do what I
want?

This post describes a similar issue to mine, though their goals are
different, and I don't think the solution will work for me:
http://groups.google.com.au/groups?h....net.au&rnum=7

This message describes the issue I'm seeing:
http://groups.google.com.au/groups?q...4ax.com&rnum=5

Another related message:
http://groups.google.com.au/groups?q...hon.org&rnum=1

Someone says it's just broken:
http://groups.google.com.au/groups?q...lin.de&rnum=17

I've tried one other approach that doesn't involve
Py_NewInterpreter/Py_EndInterpreter, but didn't have much success. What
I tried to do was run each script with a new global dict, so that they
at least had separate global namespaces (though they'd still be able to
influence the next script's interpreter state / module state). If I
recall correctly I ended up with code like this:

execfile(filename, {'__builtins__': __builtins__,
                    '__name__': '__main__',
                    '__file__': filename})

being called from PyRun_String.
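The effect of this approach can be sketched in pure Python as follows
(using exec on a code string instead of execfile() on a file, purely for
illustration):

```python
# Sketch of the 'fresh globals dict per run' approach. exec on a string
# stands in for execfile() on the user's script file.
script = "greeting = 'hello from the script'"

env = {'__builtins__': __builtins__,
       '__name__': '__main__',
       '__file__': '<script>'}
exec(script, env)

print(env['greeting'])           # names the script creates land in env
print('greeting' in globals())   # ...and not in the host's own globals
```

Each run gets a brand-new env, so scripts don't see each other's
top-level names; shared module state (sys and friends) still persists
between runs, as noted above.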

This appeared to work fine, but turned out to leak memory like a sieve.
Objects in the script's global namespace weren't being disposed of when
the script terminated. Consequently, if I had a script with one line:

x = ' ' * 200000000

then each time I ran the script the app would gobble a large chunk more
memory and not release it. If I wrote a script that very carefully
deleted everything it put in the top-level namespace before it exited,
such as all variables, imports, classes, and functions, I still leaked a
little memory and a few references, but nothing much. Unfortunately,
doing that is also rather painful at best and seems _really_ clumsy.

It looked to me after some testing with a debug build like the global
dictionaries that were being created for each execfile() call were not
being disposed of after the call terminated, even though no code I was
aware of continued to hold references to them. Circular references? Do I
have to manually invoke the cyclic reference cleanup code in Python when
embedding?
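Quite possibly yes: any function or class defined at the script's top
level holds a reference back to its globals dict (via func_globals),
while the dict holds a reference to the function, so each run's
namespace ends up in a reference cycle that plain reference counting
can't reclaim. A minimal sketch of the effect, and of the cyclic
collector reclaiming it (gc.collect() from Python; the C API's
PyGC_Collect() should do the same from the embedding side):

```python
import gc
import weakref

class Probe(object):
    pass

# A fresh 'globals' dict for one script run, as in the execfile() approach.
env = {'__builtins__': __builtins__, 'Probe': Probe}

# The 'script' defines a function: f references env via func_globals,
# and env references f - a reference cycle involving the whole namespace.
exec("def f(): pass\np = Probe()", env)

probe_ref = weakref.ref(env['p'])
del env                        # drop the only outside reference

print(probe_ref() is None)     # False: the cycle keeps the namespace alive
gc.collect()                   # cyclic collector breaks the env <-> f cycle
print(probe_ref() is None)     # True: the namespace (and p) are now freed
```

Alternatively, explicitly clearing the dict after the run (env.clear()
here, or PyDict_Clear() on the C side) breaks the cycle immediately,
without waiting for the collector and without the painful
delete-everything-by-hand dance inside the script.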

I'm sorry for the lack of detail provided in the discussion of this
approach. It was a while ago. If folks here think it's viable I can go
back and get some more hard data.

With the 'new globals dict' approach, it was also possible for people to
mangle modules and for the next script to see the changes. If there's a
way to re-init modules between runs (at least the built-in ones like
sys, __builtins__, etc, plus the app's extension module and any modules
written in Python), that'd be fantastic.

If there's some way to achieve what I want to do - get scripts to
execute in private or mostly-private environments in the main thread of
an application - I'd be overjoyed to hear it. I'm very sorry for the
mammoth message, and hope I've made some sense and provided enough
information without boring you all to tears. It's clear that there's
been quite a bit of interest in this topic from my digging through the
list archives, but I just wasn't able to find a clear, definitive
answer.

Phew. To anybody who got this far, thank you very much for your time and
patience.

--
Craig Ringer

Jul 18 '05 #1
1 3150
If you are always running the Python scripts within the main thread of
the application, then why are you creating a new thread state and
running the script in that state? Why not just this:

Py_Initialize();
PyRun_SimpleString(...);
Py_Finalize();

(Instead of PyRun_SimpleString, do whatever you want to do there)

Since you are not running any Python scripts or calling any
Python-related stuff from other threads, this is the best approach in my
opinion. This will also ensure that the execution of one script won't
affect the execution of another, because you call Py_Finalize after the
script and thus shut down the interpreter.

Mustafa Demirhan

Jul 18 '05 #2
