Sanitizing untrusted code for eval()

Jim Washington

I'm still working on yet another parser for JSON (http://json.org). It's
called minjson, and it's tolerant on input, strict on output, and pretty
fast. The only problem is, it uses eval(). It's important to sanitize the
incoming untrusted code before sending it to eval(), because eval() is
evil (http://blogs.msdn.com/ericlippert/ar.../01/53329.aspx), apparently
in every language.

A search for potential trouble with eval() in python turned up the
following.

1. Multiplication and exponentiation, particularly in concert with
strings, can do a DoS on the server, e.g., 'x'*9**99**999**9999

2. lambda can cause mischief, and therefore is right out.

3. Introspection can expose other methods to the untrusted code. e.g.,
{}.__class__.__bases__[0].__subclasses__... can climb around in the
object hierarchy and execute arbitrary methods.

4. List comprehensions might be troublesome, though it's not clear to me
how a DoS or exploit is possible with these. But presuming potential
trouble, 'for' is also right out. It's not in the JSON spec anyway.

So, the above seems to indicate disallowing "*", "__", "lambda", and "for"
anywhere outside a string in the untrusted code. Raise an error before
sending to eval().
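Here is a rough sketch of that blacklist check, assuming Python 3. The function check_blacklist and the regular expression for string literals are illustrative assumptions, not minjson's code. The function first blanks out quoted strings and then looks for the forbidden substrings in whatever is left:

```python
import re

# Matches a double- or single-quoted string literal, allowing backslash escapes.
# Assumption: strings never span lines, as in JSON.
STRING_LITERAL = re.compile(r'"(?:[^"\\\n]|\\.)*"|\'(?:[^\'\\\n]|\\.)*\'')

# The constructs proposed above as forbidden outside strings.
FORBIDDEN = ("*", "__", "lambda", "for")


def check_blacklist(text):
    """Raise ValueError if a forbidden construct appears outside a string."""
    # Replace every string literal with an empty one so only code remains.
    code = STRING_LITERAL.sub('""', text)
    for word in FORBIDDEN:
        if word in code:
            raise ValueError("forbidden construct %r outside a string" % word)
    return text


check_blacklist('{"note": "for you **"}')  # passes: the words are inside a string
```

The substring test is crude by design. For example, "for" also matches inside an unquoted word such as "format", but valid JSON never contains such a word anyway.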

I'm using eval() with proper __builtins__ and locals, e.g.,

result = eval(aString,
              {"__builtins__": {'True': True, 'False': False, 'None': None}},
              {'null': None, 'true': True, 'false': False})
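Item 3 still matters with this restricted namespace. To show why, the sketch below (Python 3; the wrapper name restricted_eval is hypothetical) wraps the call above in a function. It parses JSON-like text as intended. However, stripping __builtins__ does nothing to limit attribute lookups, so introspection still reaches object and, through it, every class loaded in the interpreter:

```python
def restricted_eval(a_string):
    """The eval() call from the post, wrapped in a function (hypothetical name)."""
    return eval(a_string,
                {"__builtins__": {'True': True, 'False': False, 'None': None}},
                {'null': None, 'true': True, 'false': False})


# Intended use: JSON-like literals evaluate to Python values.
print(restricted_eval('{"a": [1, true, null]}'))  # prints {'a': [1, True, None]}

# Unintended use: no builtins are needed to climb the class hierarchy.
base = restricted_eval("{}.__class__.__bases__[0]")
print(base)  # the object type
print(len(restricted_eval("{}.__class__.__bases__[0].__subclasses__()")))
```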

I am familiar with this thread:
http://groups-beta.google.com/group/...cc21b95af0d9cc

Does anyone know of any other "gotchas" with eval() I have not found? Or
is eval() simply too evil?

-Jim Washington

Aug 22 '05 #1


Benji York

Jim Washington wrote:

> I'm still working on yet another parser for JSON (http://json.org).

Hi, Jim.

> The only problem is, it uses eval(). It's important to sanitize the
> incoming untrusted code before sending it to eval().
>
> Does anyone know of any other "gotchas" with eval() I have not found? Or
> is eval() simply too evil?

I'd say that eval is just too evil.

I do wonder if it would be possible to use eval by working from the
other direction. Instead of trying to filter out dangerous things, only
allow a *very* strict set of things in.

For example, since you're doing JSON, you don't even need to allow
multiplication. If you only allowed dictionaries with string keys and a
restricted set of types as values, you'd be pretty close. But once
you're at that point you might as well use your own parser and not use
eval at all. <shrug>
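For comparison, here is a rough sketch of the no-eval route, assuming Python 3. It is not Benji's code, and all names are hypothetical. It is a small recursive-descent parser that accepts only the JSON grammar, so nothing in the input is ever executed. To keep it short, it handles only the simple backslash escapes (no \uXXXX) and does not guard against very deep nesting:

```python
import re

# JSON number syntax: optional minus, no leading zeros, optional fraction/exponent.
NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?')
# Simple backslash escapes; \uXXXX is left out to keep the sketch short.
ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b',
           'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}
LITERALS = {'true': True, 'false': False, 'null': None}


class JSONParser:
    """Hypothetical recursive-descent parser; nothing in the input is executed."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def error(self, message):
        raise ValueError("%s at offset %d" % (message, self.pos))

    def peek(self):
        """Skip whitespace and return the next character ('' at end of input)."""
        while self.pos < len(self.text) and self.text[self.pos] in ' \t\r\n':
            self.pos += 1
        return self.text[self.pos:self.pos + 1]

    def expect(self, char):
        if self.peek() != char:
            self.error("expected %r" % char)
        self.pos += 1

    def value(self):
        ch = self.peek()
        if ch == '{':
            return self.container('{', '}', {})
        if ch == '[':
            return self.container('[', ']', [])
        if ch == '"':
            return self.string()
        for word, literal in LITERALS.items():
            if self.text.startswith(word, self.pos):
                self.pos += len(word)
                return literal
        match = NUMBER.match(self.text, self.pos)
        if not match:
            self.error("unexpected input")
        self.pos = match.end()
        token = match.group()
        return float(token) if any(c in token for c in '.eE') else int(token)

    def container(self, open_char, close_char, result):
        """Parse an object (into a dict) or an array (into a list)."""
        self.expect(open_char)
        if self.peek() == close_char:
            self.pos += 1
            return result
        while True:
            if isinstance(result, dict):
                if self.peek() != '"':
                    self.error("expected a string key")
                key = self.string()
                self.expect(':')
                result[key] = self.value()
            else:
                result.append(self.value())
            if self.peek() == ',':
                self.pos += 1
            else:
                self.expect(close_char)
                return result

    def string(self):
        self.expect('"')
        chunks = []
        while self.pos < len(self.text):
            ch = self.text[self.pos]
            self.pos += 1
            if ch == '"':
                return ''.join(chunks)
            if ch == '\\':
                escape = self.text[self.pos:self.pos + 1]
                if escape not in ESCAPES:
                    self.error("unsupported escape")
                chunks.append(ESCAPES[escape])
                self.pos += 1
            else:
                chunks.append(ch)
        self.error("unterminated string")


def parse(text):
    parser = JSONParser(text)
    result = parser.value()
    if parser.peek():
        parser.error("trailing data")
    return result
```

Input such as 'x'*9**99 or {}.__class__ simply fails to parse, because the grammar has no rule for it.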
--
Benji York

Aug 22 '05 #2

A.M. Kuchling

On Mon, 22 Aug 2005 13:55:45 GMT,
Jim Washington <jw*****@vt.edu> wrote:

> I'm still working on yet another parser for JSON (http://json.org).

See http://python.ca/nas/log/200507/index.html#21_001 for another parser. I
don't know if it uses eval() or not, but would bet on "not" because Neil is
pretty security-conscious.

--amk

Aug 22 '05 #3

Diez B. Roggisch

> Does anyone know of any other "gotchas" with eval() I have not found? Or
> is eval() simply too evil?

Yes - and from what I can see on the JSON page, it should be _way_
easier to simply write a parser of your own - that ensures that only you
decide what python code gets called.

Diez

Aug 22 '05 #4

Scott David Daniels

Diez B. Roggisch wrote:

>> Does anyone know of any other "gotchas" with eval() I have not found? Or
>> is eval() simply too evil?
>
> Yes - and from what I can see on the JSON page, it should be _way_
> easier to simply write a parser of your own - that ensures that only you
> decide what python code gets called.
>
> Diez

Another thing you can do is use the compile() builtin and then only allow
certain bytecodes. Of course this approach means you need to implement
this in a major-version-dependent fashion, but it saves you the work of
mapping source code to Python. Eventually there will be another form
available (the AST form), but that will show up no earlier than 2.5.
As a matter of pure practicality, it turns out you can probably use
almost the same code to look at 2.3 and 2.4 bytecodes.
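The AST form is now available through the standard ast module. The sketch below applies the same whitelist idea at that level. It assumes Python 3.8 or later, where literals parse to ast.Constant, and the function name is hypothetical. AST node types change less often than bytecode, but they do change (literals became ast.Constant in 3.8), so some version dependence remains:

```python
import ast

# Node types a JSON document can produce when parsed as a Python expression.
ALLOWED_NODES = (ast.Expression, ast.Dict, ast.List, ast.Constant,
                 ast.Name, ast.Load, ast.UnaryOp, ast.USub)
JSON_NAMES = {"true": True, "false": False, "null": None}


def ast_checked_eval(text):
    """Evaluate text only if its syntax tree uses whitelisted nodes."""
    tree = ast.parse(text, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("disallowed syntax: %s" % type(node).__name__)
        if isinstance(node, ast.Constant) and type(node.value) not in (str, int, float):
            raise ValueError("disallowed constant %r" % (node.value,))
        if isinstance(node, ast.Name) and node.id not in JSON_NAMES:
            raise ValueError("unknown name %r" % node.id)
        if isinstance(node, ast.UnaryOp) and not (
                isinstance(node.operand, ast.Constant)
                and type(node.operand.value) in (int, float)):
            raise ValueError("minus is allowed only before a number")
    code = compile(tree, "<json>", "eval")
    return eval(code, {"__builtins__": {}}, dict(JSON_NAMES))
```

Multiplication, attribute access, comprehensions and lambda all produce node types that are not on the list, so they are rejected before anything is evaluated.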
--Scott David Daniels
Sc***********@Acm.Org

Aug 22 '05 #5

Fredrik Lundh

Jim Washington wrote:

> 4. List comprehensions might be troublesome, though it's not clear to me
> how a DoS or exploit is possible with these.

see item 1.

> Or is eval() simply too evil?

yes.

however, running a tokenizer over the source string and rejecting any string
that contains unknown tokens (i.e. anything that's not a literal, comma,
colon, or square or curly bracket) before evaluation might be good enough.

(you can use Python's standard tokenize module, or rip out the relevant
parts from it and use the RE engine directly)
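Here is a minimal sketch of this tokenizer whitelist, assuming Python 3; the function names are hypothetical. Two choices go beyond the list above and are assumptions. First, '-' is admitted so that negative numbers work. Second, the names true, false and null are admitted and mapped to Python values at eval() time. The check also insists that string tokens start with a quote, which shuts out prefixed strings such as f"..." that can run code:

```python
import io
import tokenize

# Layout and end-of-input tokens carry no code and are always accepted.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
          tokenize.DEDENT, tokenize.ENDMARKER}
# Punctuation on the whitelist; '-' is an assumption, added for negative numbers.
ALLOWED_OPS = {",", ":", "[", "]", "{", "}", "-"}
# JSON literals and the Python values eval() will substitute for them.
JSON_NAMES = {"true": True, "false": False, "null": None}


def check_tokens(text):
    """Raise ValueError unless every token in text is on the whitelist."""
    readline = io.StringIO(text).readline
    try:
        for tok in tokenize.generate_tokens(readline):
            if tok.type in LAYOUT or tok.type == tokenize.NUMBER:
                continue
            if tok.type == tokenize.STRING and tok.string[0] in "\"'":
                continue  # plain quoted string without a b/r/u/f prefix
            if tok.type == tokenize.OP and tok.string in ALLOWED_OPS:
                continue
            if tok.type == tokenize.NAME and tok.string in JSON_NAMES:
                continue
            raise ValueError("disallowed token %r" % tok.string)
    except (tokenize.TokenError, SyntaxError) as exc:
        raise ValueError("cannot tokenize input: %s" % exc)


def tokenized_eval(text):
    """Whitelist-check text, then evaluate it with no builtins."""
    check_tokens(text)
    return eval(text, {"__builtins__": {}}, dict(JSON_NAMES))
```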

</F>

Aug 22 '05 #6

Diez B. Roggisch

> Another thing you can do is use the compile() builtin and then only allow
> certain bytecodes. Of course this approach means you need to implement
> this in a major-version-dependent fashion, but it saves you the work of
> mapping source code to Python. Eventually there will be another form
> available (the AST form), but that will show up no earlier than 2.5.
> As a matter of pure practicality, it turns out you can probably use
> almost the same code to look at 2.3 and 2.4 bytecodes.

I don't know much about Python bytecode, but from the JSON homepage - which
features the grammar for JSON on the first page - I'm under the strong
impression that abusing the Python parser by whatever means, including
the bytecode hack you propose, is way more complicated than writing a
small parser. I don't know pyparsing, but I know spark, and it would be
a matter of 30 lines of code. And 100% no loopholes...

Additionally, having a parser allows you to spit out meaningful errors -
whilst mapping byte code back to input lines is certainly not easy, if
feasible at all.
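A hand-written parser always knows the character offset where parsing failed, so Diez's point about error messages is easy to act on. The following is a small helper (hypothetical name, Python 3). It turns such an offset into a line and column number and adds a caret marker:

```python
def describe_error(text, offset, message):
    """Format message with the 1-based line/column of offset in text."""
    line_no = text.count("\n", 0, offset) + 1
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when on the first line
    line_end = text.find("\n", offset)
    if line_end == -1:
        line_end = len(text)
    column = offset - line_start + 1
    source_line = text[line_start:line_end]
    return "line %d, column %d: %s\n%s\n%s^" % (
        line_no, column, message, source_line, " " * (column - 1))


# Offset 15 is the second comma in the input below.
print(describe_error('{\n  "a": [1, 2,,]\n}', 15, "unexpected ','"))
# prints:
# line 2, column 14: unexpected ','
#   "a": [1, 2,,]
#              ^
```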

Regards,

Diez

Aug 22 '05 #7

Jim Washington

On Mon, 22 Aug 2005 22:12:25 +0200, Fredrik Lundh wrote:

> however, running a tokenizer over the source string and rejecting any string
> that contains unknown tokens (i.e. anything that's not a literal, comma,
> colon, or square or curly bracket) before evaluation might be good enough.
>
> (you can use Python's standard tokenize module, or rip out the relevant
> parts from it and use the RE engine directly)

This seems like the right compromise, and not too difficult.
OOTB, tokenize burns a couple of additional milliseconds per read,
but maybe I can start there and optimize, as you say, and be a bit more
sure that python's parser is not abused into submission.
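Fredrik's suggestion of using the RE engine directly might look roughly like this. The sketch assumes Python 3, and the token pattern and check_with_re are assumptions, not code taken from tokenize. It scans one token at a time with finditer rather than matching one big repeated pattern, which should keep the regex from backtracking badly on hostile input. As a side effect, it accepts only double-quoted strings:

```python
import re

# Allowed tokens: double-quoted strings, JSON numbers, the JSON names, punctuation.
TOKEN = (r'"(?:[^"\\\n]|\\.)*"'
         r'|-?[0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?'
         r'|true|false|null'
         r'|[,:\[\]{}]')
# Each match is optional whitespace, then either an allowed token (group 1)
# or any other single non-space character (group 2), which means rejection.
SCANNER = re.compile(r'\s*(?:(%s)|(\S))' % TOKEN)


def check_with_re(text):
    """Raise ValueError at the first character that does not start an allowed token."""
    for match in SCANNER.finditer(text):
        if match.group(2) is not None:
            raise ValueError("disallowed character %r at offset %d"
                             % (match.group(2), match.start(2)))
    return text
```

A string that passes can then go to the restricted eval() call as before.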

BTW, this afternoon I spent a couple of hours sending random junk to eval()
just to see what would be accepted.

I did not know before that

5|3 = 7
6^3 = 5
~6 = -7
()and aslfsdf = ()

Amusing stuff.
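These results follow directly from Python's operators. | and ^ are bitwise OR and XOR, and ~x is -(x+1) for integers. When its left operand is false, `and` returns that operand without evaluating the right one, so the undefined name is never looked up. Here is a quick check in Python 3:

```python
# Evaluate each expression with no builtins, as in the restricted eval() call.
for expr in ["5|3", "6^3", "~6", "()and aslfsdf"]:
    print(expr, "=", eval(expr, {"__builtins__": {}}))

# The same results in binary: 0b101 | 0b011 == 0b111, 0b110 ^ 0b011 == 0b101.
print(bin(5 | 3), bin(6 ^ 3))  # prints 0b111 0b101
```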

Thanks!

-Jim Washington

Aug 23 '05 #8

Paul McGuire

Here's the pyparsing rendition - about 24 lines of code, and another 30
for testing.
For reference, here's the JSON "bnf":

object
    { members }
    {}
members
    string : value
    members , string : value
array
    [ elements ]
    []
elements
    value
    elements , value
value
    string
    number
    object
    array
    true
    false
    null

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

from pyparsing import *

TRUE = Keyword("true")
FALSE = Keyword("false")
NULL = Keyword("null")

jsonString = dblQuotedString.setParseAction( removeQuotes )
jsonNumber = Combine( Optional('-') + ( '0' | Word('123456789',nums) ) +
                      Optional( '.' + Word(nums) ) +
                      Optional( Word('eE',exact=1) + Word(nums+'+-',nums) ) )

jsonObject = Forward()
jsonValue = Forward()
jsonElements = delimitedList( jsonValue )
# Optional(...) admits the empty array [] from the grammar above
jsonArray = Group( Suppress('[') + Optional( jsonElements ) + Suppress(']') )
jsonValue << ( jsonString | jsonNumber | jsonObject | jsonArray |
               TRUE | FALSE | NULL )
memberDef = Group( jsonString + Suppress(':') + jsonValue )
jsonMembers = delimitedList( memberDef )
# Optional(...) admits the empty object {} from the grammar above
jsonObject << Dict( Suppress('{') + Optional( jsonMembers ) + Suppress('}') )

# allow // and /* */ comments anywhere inside an object
lineComment = '//' + restOfLine
jsonComment = FollowedBy('/') + ( cStyleComment | lineComment )
jsonObject.ignore( jsonComment )

testdata = """
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": [{
                "ID": "SGML",
                "SortAs": "SGML",
                "GlossTerm": "Standard Generalized Markup Language",
                "Acronym": "SGML",
                "Abbrev": "ISO 8879:1986",
                "GlossDef": "A meta-markup language, used to create markup languages such as DocBook.",
                "GlossSeeAlso": ["GML", "XML", "markup"]
            }]
        }
    }
}
"""

results = jsonObject.parseString(testdata)

import pprint
pprint.pprint( results.asList() )
print( results.glossary.title )
print( results.glossary.GlossDiv )
print( results.glossary.GlossDiv.GlossList.keys() )

Prints out (I've inserted blank lines to separate the output from the
different print statements):
[['glossary',
  ['title', 'example glossary'],
  ['GlossDiv',
   ['title', 'S'],
   ['GlossList',
    [['ID', 'SGML'],
     ['SortAs', 'SGML'],
     ['GlossTerm', 'Standard Generalized Markup Language'],
     ['Acronym', 'SGML'],
     ['Abbrev', 'ISO 8879:1986'],
     ['GlossDef',
      'A meta-markup language, used to create markup languages such as DocBook.'],
     ['GlossSeeAlso', ['GML', 'XML', 'markup']]]]]]]

example glossary

[['title', 'S'], ['GlossList', [['ID', 'SGML'], ['SortAs', 'SGML'],
['GlossTerm', 'Standard Generalized Markup Language'], ['Acronym',
'SGML'], ['Abbrev', 'ISO 8879:1986'], ['GlossDef', 'A meta-markup
language, used to create markup languages such as DocBook.'],
['GlossSeeAlso', ['GML', 'XML', 'markup']]]]]

['GlossSeeAlso', 'GlossDef', 'Acronym', 'GlossTerm', 'SortAs',
'Abbrev', 'ID']

Aug 23 '05 #9

Alan Kennedy

[Jim Washington]

> I'm still working on yet another parser for JSON (http://json.org). It's
> called minjson, and it's tolerant on input, strict on output, and pretty
> fast. The only problem is, it uses eval(). It's important to sanitize the
> incoming untrusted code before sending it to eval().

I think that you shouldn't need eval to parse JSON.

For a discussion of the use of eval in pyjsonrpc, between me and the
author, Jan-Klaas Kollhof, see the content of the following links. A
discussion of the relative time *in*efficiency of eval is also included:
it is much faster to use built-in functions such as str and float to
convert from JSON text/tokens to strings and numbers.

http://mail.python.org/pipermail/pyt...ry/265805.html
http://groups.yahoo.com/group/json-rpc/message/55
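The str/float point about converting tokens can be sketched for numbers as follows (Python 3; convert_number is a hypothetical name). It covers only numbers, because JSON string escapes differ slightly from Python's and need their own handling. The timing lines give a rough comparison with eval(), and the absolute figures depend on the machine:

```python
import timeit


def convert_number(token):
    """Convert a JSON number token with int() or float() instead of eval()."""
    if any(c in token for c in ".eE"):
        return float(token)
    return int(token)


print(convert_number("-12"), convert_number("1.5e3"))  # prints -12 1500.0

# Rough timing comparison; results vary by machine and Python version.
print(timeit.timeit(lambda: convert_number("1.5e3"), number=10000))
print(timeit.timeit(lambda: eval("1.5e3"), number=10000))
```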

Pyjsonrpc uses the python tokeniser to split up JSON strings, which
means that you cannot be strict about things like double (") vs. single
(') quotes, etc.
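Python 3's tokenize module shows the quoting problem directly. Both quoting styles come back with the same STRING token type, so a check at the token-type level cannot tell JSON's double quotes from Python's single quotes. You have to inspect the token text yourself. In the sketch below, token_types is a hypothetical helper:

```python
import io
import tokenize


def token_types(text):
    """Return (type name, token text) pairs, skipping end-of-input markers."""
    readline = io.StringIO(text).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)]


print(token_types('"double"'))  # prints [('STRING', '"double"')]
print(token_types("'single'"))  # prints [('STRING', "'single'")]
```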

JSON is so simple, I think it best to write a tokeniser and parser for
it, either using a parsing library, or just coding your own.

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Aug 23 '05 #10

by: manstey | last post by:

Hi, I have a text file called a.txt: # comments I read it using this:

Python

Code Access Security - Assert problem

by: alunharford | last post by:

I'm writing an application that is trusted, but I want it to run some untrusted code, and I don't understand how I do that. I'm including an example. I want to trust my class, TrustedClass, to do...

C# / C Sharp

Executing Untrusted Code

by: Ben | last post by:

Hello, I've been developing apps in Delphi for years and have just started writing my first big project in c# + ms .net and have some questions about security and untrusted code. I've got an...

.NET Framework

an eval()-like exec()

by: Abel Daniel | last post by:

Hi! A python interactive interpreter works by having the user type in some code, compiling and running that code, then printing the results. For printing, the results are turned into strings. ...

Python

onclick="toggleDisplay('<%# Eval("description")%>

by: thaytu888888 | last post by:

Here is my codes in aspx page: <td colspan="2" class="main_menu" runat="server" onclick='toggleDisplay(<%#Eval("description")%>);'><%#Eval("description")%></td> Here is in "View source": ...

.NET Framework

Hard to understand 'eval'

by: TheSaint | last post by:

Hi, It seems to be strange that give me syntax error inside an eval statement. I'm looking at it carefully but I can't see any flaw. Here it's part of the code: for nn in stn_items: value=...

Python

Is a closure's scope accessible by untrusted code?

by: Andrey Fedorov | last post by:

Is the scope of a closure accessible after it's been created? Is it safe against XSS to use closures to store "private" auth tokens? In particular, in... ....can untrusted code access...

Javascript

Restricted Execution of untrusted code

by: Emanuele D'Arrigo | last post by:

I noticed that this issue has been discussed in this newsgroup periodically over the years and I seem to understand that - comprehensive- safe/restricted execution of untrusted code in python is...

Python

Using eval, or something like it...

by: r0g | last post by:

Hi There, I know you can use eval to dynamically generate the name of a function you may want to call. Can it (or some equivalent method) also be used to do the same thing for the variables of a...

Python
