Using python for writing models: How to run models in restricted python mode?

vinjvinj

I have an application which allows multiple users to write models.
These models get distributed on a grid of compute engines. Users submit
their models through a web interface. I want to:

1. Restrict the user from doing any file I/O, exec, import, eval, etc. I
was thinking of writing a plugin for pylint to do all the checks. Is
this a good approach, given that there is no restricted Python? What are
the things I should search for in Python code?

2. Restrict the amount of memory a module uses as well. For instance,
how can I stop a user from doing a = range(10000000000) or similar
tasks, so that my whole compute farm does not come down?

Thanks for your help

Nov 7 '05 #1


Mike Meyer

"vinjvinj" <vi******@gmail.com> writes:

1. restrict the user from doing any file io, exec, import, eval, etc. I
was thinking of writing a plugin for pylint to do all the checks? Is
this is a good way given that there is no restricted python. What are
the things I should serach for in python code

Um - I've got a restricted python module: rexec.py. Of course, it
doesn't work correctly, in that it isn't really secure. Python is very
powerful, and creating a secure sandbox is difficult - so much so that
the task has never been accomplished. If you want something that will
keep the obvious things from working, rexec.py might be for you - but
don't kid yourself that it's secure. If you need real security, I'd
consider switching to Jython, which at least has a VM which was
designed with building such sandboxes as a possibility.

2. restrict the amount of memory a module uses as well. For instance
how can I restrict a user from doing a = range(10000000000) or similar
tasks so that my whole compute farm does not come down.

This is equivalent to trying to limit the amount of CPU time the
module uses, which is better known as the halting problem. There's no
algorithmic solution to that. If you want to verify, before executing
it, that some module will only use so much memory, the best you can do
is verify that it doesn't do anything obvious. If you want to restrict
them while they are running, you can probably get the OS to
help. Exactly how will depend on your requirements, and the OS
involved.

<Mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Nov 7 '05 #2

vinjvinj

While I understand that 2 is very hard (if not impossible) to do in a
single Unix process, I'm not sure why 1 would be hard to do, since I
have complete control over what code I allow or disallow on my grid. Can
I not just search for certain strings and reject the model if it
fails certain conditions? It might not be 100% secure, but will it not
get me to 90%?

Nov 7 '05 #3

Mike Meyer

"vinjvinj" <vi******@gmail.com> writes:

While I understand 2 is very hard (if not impossible) to do in single
unix process. I'm not sure why 1 would be hard to do. Since I have
complete control to what code I can allow or not allow on my grid. Can
i not just search for certain strings and disallow the model if it
fails certain conditions. It might not be 100% secure but will it not
get me at 90%...

Sure you can search for certain strings. Python lets you build strings
dynamically, so you'd have to search for every possible way to create
those strings. Further, Python provides lots of tools for
introspection, meaning there are lots of ways to find these
"forbidden" objects other than mentioning their name.

You can get to *every* builtin function through any Python module. For
instance, are you going to prevent them from using regular
expressions? If not, consider:

>>> getattr(re, ''.join([chr(x + 1) for x in [94, 94, 97, 116, 104, 107, 115, 104, 109, 114, 94, 94]]))['fi' + 'le'] is open
True

String searches only prevent the most obvious abuses, and may well
miss things that are merely not quite so obvious. If you think of your
"security" as a notice to the end user that they are doing something
wrong, as opposed to a tool that will prevent them from doing it, then
you'll have the right idea. In which case, I'd still recommend looking
into the rexec module.
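
To make the point concrete, here is a minimal sketch (Python 3 syntax, with a hypothetical `naive_scan` helper and an arbitrary word list, neither of which comes from this thread) showing that a plain substring blacklist finds nothing suspicious in the expression above, even though the character codes spell out `__builtins__`:

```python
# Hypothetical blacklist; a real one would be longer and still incomplete.
FORBIDDEN = ("open", "file", "exec", "eval", "import", "__builtins__")


def naive_scan(source):
    """Return the forbidden words that literally appear in the source text."""
    return [word for word in FORBIDDEN if word in source]


codes = [94, 94, 97, 116, 104, 107, 115, 104, 109, 114, 94, 94]

# The attribute name is only assembled at run time, one character at a time.
hidden_name = "".join(chr(x + 1) for x in codes)

# The text a user would submit: none of the forbidden words occur in it.
obfuscated = "getattr(re, ''.join([chr(x + 1) for x in %r]))" % codes

print(naive_scan(obfuscated))
print(hidden_name)
```

The scan only sees the text of the program, while the forbidden name exists only once the program runs, which is why such a check works as a warning to honest users and not as a barrier.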

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Nov 7 '05 #4

Steven D'Aprano

vinjvinj wrote:

While I understand 2 is very hard (if not impossible) to do in single
unix process. I'm not sure why 1 would be hard to do. Since I have
complete control to what code I can allow or not allow on my grid. Can
i not just search for certain strings and disallow the model if it
fails certain conditions. It might not be 100% secure but will it not
get me at 90%...

You might be able to think of and disallow the most
obvious security holes, but how confident are you that
you will think of the bad code that your users will
think of?

Are you concerned about malicious users, or just
incompetent users?

I suspect your best bet might be to write a
mini-language using Python, and get your users to use
that. You will take a small performance hit, but
security will be very much improved.
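
As a rough illustration of what such a mini-language could look like, here is a sketch that borrows Python's own parser (the standard `ast` module in Python 3) but evaluates only an explicit whitelist of node types. The `evaluate` function, the operator tables, and the choice of allowed constructs are all assumptions made for this example, not something proposed in the thread:

```python
import ast
import operator

# Whitelisted operators; anything absent (calls, attributes, **, ...) is rejected.
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
CMP_OPS = {ast.Lt: operator.lt, ast.Gt: operator.gt, ast.Eq: operator.eq}


def evaluate(expr, variables):
    """Evaluate an arithmetic/comparison expression over the given variables."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in CMP_OPS):
            return CMP_OPS[type(node.ops[0])](walk(node.left),
                                              walk(node.comparators[0]))
        raise ValueError("construct not allowed: %s" % type(node).__name__)

    return walk(ast.parse(expr, mode="eval"))


print(evaluate("price * 2 > limit", {"price": 3, "limit": 5}))
```

Because the evaluator only understands what it explicitly lists, a new Python feature or an introspection trick is rejected by default instead of slipping through, which is the opposite of the blacklist situation.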

What do others think?
--
Steven.

Nov 8 '05 #5

Paul Rubin

Steven D'Aprano <st***@REMOVEMEcyber.com.au> writes:

I suspect your best bet might be to write a mini-language using
Python, and get your users to use that. You will take a small
performance hit, but security will be very much improved.

What do others think?

That is the only approach that makes any sense. Even with restricted
execution there's no way to stop memory exhaustion with restricted
Python statements. Consider

xxx = 'x'*10000000000

Nov 8 '05 #6

vinjvinj

I'm more worried about incompetent users than malicious users. I'm
going to take the following steps:

1. My users will be paying a decent amount of money to run models on
the compute grid. If they are intentionally writing malicious code then
their account will be disabled.

2. Since their models will be fairly basic, I can restrict them:
- No imports in the code.
- No special characters allowed.
- No access to special builtins.

The users write functions which get called many times with
different variables. I'm not sure how this would work with the rexec
module, especially since I'll be passing values to the functions and the
functions will be returning either None, yes, or False.

3. Pylint has a pretty cool way to write your own custom plugins. You
can write custom handlers for each sort of available node at:
http://www.python.org/doc/current/li...piler.ast.html
This will allow me to compile a module and give users feedback on what
is wrong and what is not allowed.

4. I'll set up a test sandbox where the models will be run with a
smaller dataset before they can be pushed into production. If the
models pass the sandbox test then they will be run in production.
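
The per-node handlers in step 3 could be sketched roughly as follows. This uses the `ast` module from current Python 3 in place of the old `compiler.ast` package linked above, and it is not a pylint plugin; `ModelChecker`, `check_model`, and the list of forbidden names are hypothetical. As discussed earlier in the thread, a check like this gives users feedback but does not make the code safe:

```python
import ast

# Hypothetical list of names a model may not mention directly.
FORBIDDEN_NAMES = {"exec", "eval", "open", "file", "compile",
                   "__import__", "getattr", "globals", "locals"}


class ModelChecker(ast.NodeVisitor):
    """Collect (line number, message) pairs for disallowed constructs."""

    def __init__(self):
        self.problems = []

    def visit_Import(self, node):
        self.problems.append((node.lineno, "imports are not allowed"))

    # "from x import y" gets the same treatment as "import x".
    visit_ImportFrom = visit_Import

    def visit_Name(self, node):
        if node.id in FORBIDDEN_NAMES:
            self.problems.append((node.lineno, "%r is not allowed" % node.id))

    def visit_Attribute(self, node):
        # Underscore attributes are the usual route into introspection.
        if node.attr.startswith("_"):
            self.problems.append(
                (node.lineno, "attribute %r is not allowed" % node.attr))
        self.generic_visit(node)


def check_model(source):
    checker = ModelChecker()
    checker.visit(ast.parse(source))
    return checker.problems


for line, message in check_model("import os\nx = eval('1')\ny = ().__class__\n"):
    print("line %d: %s" % (line, message))
```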

I'm going to have to write some custom performance monitoring functions to
get notified when some models are running forever and be able to
terminate them.

vinjvinj

Nov 8 '05 #7

vinjvinj

I have so many things to do to get this to production and writing a
mini language would be a full project in itself. :-<.

Is there an easy way to do this? If not, I'll go with the steps
outlined in my other post.

vinjvinj

Nov 8 '05 #8

Jeffrey Schwab

vinjvinj wrote:

I have so many things to do to get this to production and writing a
mini language would be a full project in itself. :-<.

Is there an easy way to do this? If not, I'll go with the steps
outlined in my other post.

Do you really think it will be faster to start parsing Python code,
looking for potentially dangerous constructs?

Nov 8 '05 #9

vinjvinj

No. I was hoping to leverage the work done for restricted pythonscript
by zope at:

http://www.zope.org/Control_Panel/Pr...ythonScript.py

which is similar to what I want to do as well.

vinjvinj

Nov 8 '05 #10

Paul Rubin

"vinjvinj" <vi******@gmail.com> writes:

No. I was hoping to leverage the work done for restricted pythonscript
by zope at:

http://www.zope.org/Control_Panel/Pr...ythonScript.py

How does Pythonscript deal with
xxx = 'x' * 1000000000
as a memory DOS attack?

Nov 8 '05 #11

vinjvinj

This cannot be done at compile time, but it can be caught at execution
time on Linux by the following recipe:

http://aspn.activestate.com/ASPN/Coo.../Recipe/307871

vinjvinj

Nov 8 '05 #12

Magnus Lycka

vinjvinj wrote:

I have an application which allows multiple users to write models.
These models get distributed on a grid of compute engines. users submit
their models through a web interface. I want to

1. restrict the user from doing any file io, exec, import, eval, etc. I
was thinking of writing a plugin for pylint to do all the checks? Is
this is a good way given that there is no restricted python. What are
the things I should serach for in python code

I'm not sure why you want to prevent e.g. all file io. Let the jobs run
as users with very limited permissions.

2. restrict the amount of memory a module uses as well. For instance
how can I restrict a user from doing a = range(10000000000) or similar
tasks so that my whole compute farm does not come down.

Use Sun Grid Engine. http://gridengine.sunsource.net/documentation.html

Nov 8 '05 #13

Jeremy Sanders

vinjvinj wrote:

2. restrict the amount of memory a module uses as well. For instance
how can I restrict a user from doing a = range(10000000000) or similar
tasks so that my whole compute farm does not come down.

The safest way to do this in unix is to run the model in a separate process,
and use ulimit (or the resource module) to limit the memory usage.
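
A minimal sketch of that approach, assuming a Unix host and Python 3. The helper name `run_limited`, the 1 GiB ceiling, and the 30 second timeout are arbitrary choices for illustration; the limit is applied with `resource.setrlimit` in the child just before the model code starts:

```python
import resource
import subprocess
import sys


def run_limited(code, max_bytes=1024 ** 3, timeout=30):
    """Run a snippet in a separate interpreter with capped address space."""
    def set_limits():
        # Executed in the child, after fork and before exec.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run([sys.executable, "-c", code],
                          preexec_fn=set_limits,
                          capture_output=True, text=True, timeout=timeout)


result = run_limited("x = 'x' * (4 * 1024 ** 3)")
print(result.returncode)
print(result.stderr.strip().splitlines()[-1])
```

With the limit in place the oversized allocation fails inside the child with a MemoryError, and the parent process and the rest of the node are unaffected. How strictly `RLIMIT_AS` is enforced depends on the operating system.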

--
Jeremy Sanders
http://www.jeremysanders.net/

Nov 9 '05 #14

vinjvinj

Unfortunately this is not an option, since all the processes share
objects in memory which total about 1 GB on each node. Having a copy of
this in each user process is just not an option. I think I'm going to
use RestrictedPython from the zope3 svn, which should take care of
70-80% of the problem.

Nov 9 '05 #15

Jeremy Sanders

vinjvinj wrote:

Unfortunately this in not an options since all the processes share
objects in memory which are about 1gig for each node. Having a copy of
this in each user process is just not an options. I think I'm going to
use RestrictedPython from zope3 svn which should take care of 70-80 %
of the problem.

I wonder whether it is possible to fork() the program, restricting the
memory usage of the forked program. In most Unix variants, forked
programs share memory until that memory is written to. Of course this may
not be useful if there's data going back and forth all the time.
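
A rough sketch of the fork idea, assuming a Unix host and Python 3. `SHARED`, `run_forked`, and the exit-status convention are made up for the example. Note that `RLIMIT_AS` counts the whole address space of the child, including the pages inherited from the parent, so the limit has to be set above the size of the shared data:

```python
import os
import resource

# Stand-in for the large read-only data that every model needs.
SHARED = list(range(100000))


def run_forked(model, limit_bytes):
    """Run model(SHARED) in a forked child with capped address space.

    Returns 0 on success, 1 on MemoryError, 2 on any other error,
    and -1 if the child was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: inherits SHARED without an up-front copy.
        status = 0
        try:
            resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
            model(SHARED)
        except MemoryError:
            status = 1
        except Exception:
            status = 2
        os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -1


print(run_forked(lambda data: sum(data), 2 * 1024 ** 3))
print(run_forked(lambda data: bytearray(8 * 1024 ** 3), 2 * 1024 ** 3))
```

One caveat: CPython updates reference counts inside the objects it touches, so even read-only access from the child writes to those pages and gradually un-shares them. How much memory is really saved depends on how much of the shared data each model touches.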

--
Jeremy Sanders
http://www.jeremysanders.net/

Nov 10 '05 #16
