Is there a safe marshaler?

Irmen de Jong

Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.
Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

Or are there better options (perhaps 3rd party libraries)?

Thanks

Irmen.

Jul 18 '05 #1

Pierre Barbier de Reuille

Irmen de Jong wrote:

Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.
Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

Or are there better options (perhaps 3rd party libraries)?

Thanks

Irmen.

What exactly do you mean by "safe" ? Do you want to ensure your objects
cannot receive corrupted data ? Do you want to ensure no code will be
evaluated during the unmarshalling ?

Please, be more precise,

Pierre

Jul 18 '05 #2

guido

Irmen de Jong wrote:

Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.

I think marshal could be fixed; the only unsafety I'm aware of is that
it doesn't always act rationally when confronted with incorrect input
like bad type codes or truncated input. It only receives instances of
the built-in types and it never executes user code as a result of
unmarshalling.

Perhaps someone would be interested in submitting a patch to the
unmarshalling code? Since this is a security fix we'd even accept a fix
for 2.3.

I need a fast and safe (secure) marshaler.
Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

I don't expect that to be particularly fast, since it mostly operates
at Python speed. I think it could be safe but I would still do a
thorough code review if I were you -- the code is older than my
awareness of the vulnerabilities inherent in this kind of remote data
transfer.

--Guido

Jul 18 '05 #3

Irmen de Jong

Pierre Barbier de Reuille wrote:

Irmen de Jong wrote:
Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.
Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

Or are there better options (perhaps 3rd party libraries)?

Thanks

Irmen.

What exactly do you mean by "safe" ? Do you want to ensure your objects
cannot receive corrupted data ? Do you want to ensure no code will be
evaluated during the unmarshalling ?

"safe (secure)"
But to be more precise, let's look at the security warning that
is in the marshal documentation:
"The marshal module is not intended to be secure against erroneous or
maliciously constructed data. Never unmarshal data received from an
untrusted or unauthenticated source."

So essentially I want the opposite of that ;-)

I want a marshaler that is okay to use where the data it processes
comes from unknown, external sources (untrusted). It should not crash
on corrupt data and it should not execute arbitrary code when
unmarshaling, so that it is safe against hacking attempts.
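
To make the "execute arbitrary code" concern concrete, here is a minimal sketch in current Python 3 (not the Python 2 of this thread) of why pickle is unsafe on untrusted input. The class name `Payload` is made up for illustration, and the callable it smuggles in is deliberately harmless (`int`); an attacker would pick something far less friendly.

```python
import pickle


class Payload:
    """Hypothetical class: tells pickle how to 'rebuild' it via any callable."""

    def __reduce__(self):
        # pickle stores (callable, args); the loader calls callable(*args).
        return (int, ("42",))


wire_bytes = pickle.dumps(Payload())

# The receiver gets back whatever the callable returned, not a Payload.
result = pickle.loads(wire_bytes)
print(type(result).__name__, result)
```

The point is that the sender, not the receiver, chooses which callable runs during `pickle.loads`.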

Oh, preferably, it should be fast :)
Some XML-ish thing may be secure but is likely to be not fast at all.

Ideally it should be able to transfer user defined Python types,
but if it is like marshal (can only marshal builtin types) that's
okay too.

--Irmen

Jul 18 '05 #4

Irmen de Jong

Hello Guido

gu***@python.org wrote:

Irmen de Jong wrote:
Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.

I think marshal could be fixed; the only unsafety I'm aware of is that
it doesn't always act rationally when confronted with incorrect input
like bad type codes or truncated input. It only receives instances of
the built-in types and it never executes user code as a result of
unmarshalling.

So it is not vulnerable in the way that pickle is? That's a start.
The security warning in the marshal doc then makes it sound worse than
it is...

Perhaps someone would be interested in submitting a patch to the
unmarshalling code? Since this is a security fix we'd even accept a fix
for 2.3.

That would be nice indeed :)

I need a fast and safe (secure) marshaler.
Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

I don't expect that to be particularly fast, since it mostly operates
at Python speed.

Ah, I wasn't aware that xdrlib was implemented in Python :)
I thought it used a (standard?) C-implementation.
But I now see that it's a Python module (utilizing struct).

I think it could be safe but I would still do a
thorough code review if I were you -- the code is older than my
awareness of the vulnerabilities inherent in this kind of remote data
transfer.

Thanks for the warning.

--Irmen de Jong

Jul 18 '05 #5

On Feb 10, 2005, at 15:01, Irmen de Jong wrote:

Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

XDR? Like Sun's "XDR: External Data Representation standard"?

http://www.faqs.org/rfcs/rfc1014.html
http://www.faqs.org/rfcs/rfc1832.html

How does XDR cope with Unicode these days?

Alternatively, perhaps there is an ASN.1 DER library in Python?

http://asn1.elibel.tm.fr/en/standards/index.htm

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/

Jul 18 '05 #6

Alan Kennedy

[Irmen de Jong]

Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.

Hi Irmen,

I'm not necessarily proposing a solution to your problem, but am
interested in your requirement. Is this for pyro?

In the light of pyro, would something JSON be suitable for your need? I
only came across it a week ago (when someone else posted about it here
on c.l.py), and am intrigued by it.

http://json.org

What I find particularly intriguing is the JSON-RPC protocol, which
looks like a nice lightweight alternative to XML-RPC.

http://oss.metaparadigm.com/jsonrpc/

Also interesting is the browser embeddable JSON-RPC client written in
javascript, for which you can see a demo here

http://oss.metaparadigm.com/jsonrpc/demos.html

I thought you might be interested.

regards,

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Jul 18 '05 #7

Alan Kennedy

[Alan Kennedy]

What I find particularly intriguing is the JSON-RPC protocol, which
looks like a nice lightweight alternative to XML-RPC.

http://oss.metaparadigm.com/jsonrpc/

Also interesting is the browser embeddable JSON-RPC client written in
javascript, for which you can see a demo here

http://oss.metaparadigm.com/jsonrpc/demos.html

I should have mentioned as well that there is a python JSON-RPC server
implementation, which includes a complete JSON<-->python-objects codec.

http://www.json-rpc.org/pyjsonrpc/index.xhtml

regards,

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Jul 18 '05 #8

Irmen de Jong

PA wrote:

On Feb 10, 2005, at 15:01, Irmen de Jong wrote:
Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

XDR? Like Sun's "XDR: External Data Representation standard"?

http://www.faqs.org/rfcs/rfc1014.html
http://www.faqs.org/rfcs/rfc1832.html

Not "like", but "the".
Or at least, a subset. (the xdrlib module documentation says
"It supports most of the data types described in the RFC").

How does XDR cope with Unicode these days?

Not directly; it seems that you have to encode
your unicode strings yourself first.
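
As a sketch of "encode it yourself first": XDR (RFC 1832) represents a string as a 4-byte big-endian length followed by the bytes, zero-padded to a multiple of four. The helper names below are made up, UTF-8 is an assumed choice of encoding, and plain `struct` is used instead of `xdrlib` so that the sketch runs on current Python 3.

```python
import struct


def pack_xdr_string(text, encoding="utf-8"):
    """Encode text to bytes, then frame it the way XDR frames a string."""
    raw = text.encode(encoding)
    padding = (-len(raw)) % 4          # XDR pads to a 4-byte boundary
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding


def unpack_xdr_string(data, encoding="utf-8"):
    """Inverse of pack_xdr_string; rejects truncated input."""
    if len(data) < 4:
        raise ValueError("missing length prefix")
    (length,) = struct.unpack(">I", data[:4])
    raw = data[4:4 + length]
    if len(raw) != length:
        raise ValueError("truncated string body")
    return raw.decode(encoding)


packed = pack_xdr_string("h\u00e9llo")
print(packed, unpack_xdr_string(packed))
```

Both ends have to agree on the encoding out of band, which is one more consequence of the format not being self-describing.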

Alternatively, perhaps there is a ASN.1 DER library in python?

http://asn1.elibel.tm.fr/en/standards/index.htm

I don't know. Is there?

PS: the XDR format is not self-describing in the way that
marshal and pickle streams are. That is a big limitation
for what I need it for, so XDR seems to drop off my radar.
Is an ASN.1 stream self-describing?

--Irmen

Jul 18 '05 #9

Irmen de Jong

Alan Kennedy wrote:

[Irmen de Jong]
Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.

Hi Irmen,

I'm not necessarily proposing a solution to your problem, but am
interested in your requirement. Is this for pyro?

Yes and No.
Yes, I'm investigating possible marshaling alternatives
(others than pickle which Pyro uses right now).
No, I'm not changing Pyro yet. It's just that I want to
investigate possible *secure* alternatives to the current
implementation.
(Note that a secure version would also mean that Pyro's
advanced features such as mobile code should go the way
of the dodo, and I don't want to do this yet).

In the light of pyro, would something JSON be suitable for your need? I
only came across it a week ago (when someone else posted about it here
on c.l.py), and am intrigued by it.

http://json.org

Looks very interesting indeed, but in what way would this be
more secure than, say, pickle or marshal?
A quick glance at some docs reveals that they are using eval
to process the data... ouch.

I thought you might be interested.

I certainly am but for different reasons.

--Irmen

Jul 18 '05 #10

On Feb 10, 2005, at 22:21, Irmen de Jong wrote:

PS the xdr format is not self-describing in the way that
marshal and pickle streams are. That is a big limitation
for what I need it for so xdr seems to drop off my radar.
Is an ASN.1 stream self-describing?

Not sure how much "self-describing" you want it to be, but, yes it can
be as formal as you want it to be...

"... Abstract Syntax Notation One (ASN.1) is a formal language for
abstractly describing messages... "

Sorry if this is off-topic, I didn't follow the thread from the very
beginning, but wouldn't something like YAML work for you perhaps?

http://yaml.org/

Or even something more, er, exotic:

https://alt.textdrive.com/pl/

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/

Jul 18 '05 #11

Irmen de Jong

PA wrote:

Sorry if this is off-topic, I didn't follow the thread from the very
beginning, but wouldn't something like YAML work for you perhaps?

http://yaml.org/

Perhaps, but the spec makes my skin crawl.
Also, it seems ill-fit for efficient machine-to-machine
communication (yaml seems to be designed to be easily (?) read/edited
by humans, a thing which I don't require at all).

https://alt.textdrive.com/pl/

Naah.

--Irmen

Jul 18 '05 #12

On Feb 10, 2005, at 22:55, Irmen de Jong wrote:

Also, it seems ill-fit for efficient machine-to-machine
communication...

Well, then, if you are looking for industrial strength quality, ASN.1
is the way to go. After all, a good chunk of the telecom infrastructure
is using it.

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/

Jul 18 '05 #13

On Feb 10, 2005, at 22:55, Irmen de Jong wrote:

Perhaps, but the spec makes my skin crawl.

Perhaps I could interest you in JSON then:

"It is easy for humans to read and write. It is easy for machines to
parse and generate. "

http://www.crockford.com/JSON/index.html

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/

Jul 18 '05 #14

Alan Kennedy

[Irmen de Jong]

I need a fast and safe (secure) marshaler.

[Alan Kennedy] ...., would something JSON be suitable for your need?

http://json.org

[Irmen de Jong] Looks very interesting indeed, but in what way would this be
more secure than say, pickle or marshal?
A quick glance at some docs reveal that they are using eval
to process the data... ouch.

Well, the python JSON codec provided appears to use eval, which might
make it *seem* unsecure.

http://www.json-rpc.org/pyjsonrpc/index.xhtml

But a more detailed examination of the code indicates, to this reader at
least, that it can be made completely secure very easily. The designer
of the code could very easily have not used eval, and possibly didn't do
so simply because he wasn't thinking in security terms.

The codec uses tokenize.generate_tokens to split up the JSON string into
tokens to be interpreted as python objects. tokenize.generate_tokens
generates a series of textual name/value pairs, so nothing insecure
there: the content of the token/strings is not executed.

Each of the tokens is then passed to a "parseValue" function, which is
defined thusly:

#===================

def parseValue(self, tkns):
    (ttype, tstr, ps, pe, lne) = tkns.next()
    if ttype in [token.STRING, token.NUMBER]:
        return eval(tstr)
    elif ttype == token.NAME:
        return self.parseName(tstr)
    elif ttype == token.OP:
        if tstr == "-":
            return - self.parseValue(tkns)
        elif tstr == "[":
            return self.parseArray(tkns)
        elif tstr == "{":
            return self.parseObj(tkns)
        elif tstr in ["}", "]"]:
            return EndOfSeq
        elif tstr == ",":
            return SeqSep
        else:
            raise "expected '[' or '{' but found: '%s'" % tstr
    else:
        return EmptyValue

#===================

As you can see, eval is *only* called when the next token in the stream
is either a string or a number, so it's really just a very simple code
shortcut to get a value from a string or number.

If one defined the function like this (not tested!), to remove the eval,
I think it should be safe.

#===================

default_number_type = float
#default_number_type = int

def parseValue(self, tkns):
    (ttype, tstr, ps, pe, lne) = tkns.next()
    if ttype in [token.STRING]:
        return tstr
    if ttype in [token.NUMBER]:
        return default_number_type(tstr)
    elif ttype == token.NAME:
        return self.parseName(tstr)
    elif ttype == token.OP:
        if tstr == "-":
            return - self.parseValue(tkns)
        elif tstr == "[":
            return self.parseArray(tkns)
        elif tstr == "{":
            return self.parseObj(tkns)
        elif tstr in ["}", "]"]:
            return EndOfSeq
        elif tstr == ",":
            return SeqSep
        else:
            raise "expected '[' or '{' but found: '%s'" % tstr
    else:
        return EmptyValue

#===================
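
One caveat with returning `tstr` directly: the tokenizer hands back the literal's source text, so a string token still carries its quotes and backslash escapes. A hedged sketch of an eval-free decoder for such tokens, using `ast.literal_eval` from current Python 3 (it did not exist when this was posted), could look like this. The function name `literal_values` is made up, and this is not the pyjsonrpc code.

```python
import ast
import io
import token
import tokenize


def literal_values(text):
    """Yield decoded values for the STRING and NUMBER tokens in text."""
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        if tok.type in (token.STRING, token.NUMBER):
            # literal_eval accepts only literals; it never calls functions.
            yield ast.literal_eval(tok.string)


# The JSON text below contains a backslash-t escape inside the string.
values = list(literal_values('["a\\tb", 42, 1.5]'))
print(values)
```

Note that Python's literal syntax is a superset of JSON's (for example byte strings and hex numbers), so a strict codec would still want to validate the token text before decoding it.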

The only other use of eval is also only for string types, i.e. in the
parseObj function:

#===================
def parseObj(self, tkns):
    obj = {}
    nme = ""
    try:
        while 1:
            (ttype, tstr, ps, pe, lne) = tkns.next()
            if ttype == token.STRING:
                nme = eval(tstr)
                (ttype, tstr, ps, pe, lne) = tkns.next()
                if tstr == ":":
                    v = self.parseValue(tkns)
                    # Remainder of this function elided
#===================

Which could similarly be replaced with direct use of the string itself,
rather than eval'ing it. (Although one might want to look at encoding
issues: I haven't looked at JSON-RPC enough to know how it proposes to
handle string encodings.)

So I don't think there are any serious security issues here: the
"simplicity" of the JSON grammar is what attracted me to it in the first
place, especially since there are already robust and efficient lexers
and parsers already available built-in to python and javascript (and
javascript interpreters are getting pretty ubiquitous these days).

And it's certainly the case that if the only available python impl of
JSON/RPC is not secure, it is possible to write one that is both
efficient and secure.
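
For readers on a current Python: the standard library has shipped a `json` module since Python 2.6, and its decoder parses the text itself instead of calling eval. A small sketch of what that buys you (the sample strings are made up):

```python
import json

# Well-formed input decodes to plain dicts, lists, strings and numbers.
decoded = json.loads('{"name": "pyro", "ports": [7766, 7767], "ratio": 0.5}')

# Text that eval() would happily execute is rejected as invalid JSON.
hostile = '__import__("os").system("echo pwned")'
try:
    json.loads(hostile)
    rejected = False
except ValueError:          # json.JSONDecodeError subclasses ValueError
    rejected = True

print(decoded, rejected)
```

The application still has to bound the size of what it feeds the decoder, since very large or deeply nested documents cost memory and time.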

Hopefully there isn't some glaring security hole that I've missed:
doubtless I'll find out real soon ;-) Gotta love full disclosure.

regards,

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Jul 18 '05 #15

Irmen de Jong

Hi Alan

Alan Kennedy wrote:

Well, the python JSON codec provided appears to use eval, which might
make it *seem* unsecure.

http://www.json-rpc.org/pyjsonrpc/index.xhtml

But a more detailed examination of the code indicates, to this reader at
least, that it can be made completely secure very easily. The designer
of the code could very easily have not used eval, and possibly didn't do
so simply because he wasn't thinking in security terms. [...]

Very interesting indeed.

So I don't think there are any serious security issues here: the
"simplicity" of the JSON grammar is what attracted me to it in the first
place, especially since there are already robust and efficient lexers
and parsers already available built-in to python and javascript (and
javascript interpreters are getting pretty ubiquitous these days).

The cross-platform/language aspect is quite nice indeed.

And it's certainly the case that if the only available python impl of
JSON/RPC is not secure, it is possible to write one that is both
efficient and secure.

I think we (?) should do this then, and send it to the author
of the original version so that he can make an improved version
available? I think there are more people interested in a secure
marshaling implementation than just me :)
I'll still have to look at Twisted's Jelly.

Thanks for your analysis,
--Irmen

Jul 18 '05 #16

Alan Kennedy

[Alan Kennedy]

Well, the python JSON codec provided appears to use eval, which might
make it *seem* unsecure.

http://www.json-rpc.org/pyjsonrpc/index.xhtml

But a more detailed examination of the code indicates, to this reader
at least, that it can be made completely secure very easily. The
designer of the code could very easily have not used eval, and
possibly didn't do so simply because he wasn't thinking in security
terms.

[Irmen de Jong] I think we (?) should do this then, and send it to the author
of the original version so that he can make an improved version
available? I think there are more people interested in a secure
marshaling implementation than just me :)

I should learn to keep my mouth zipped :-L

OK, I really don't have time for a detailed examination of either the
JSON spec or the python impl of same. And I *definitely* don't have time
for a detailed security audit, much though I'd love to.

But I'll try to help: the code changes are really very simple. So I've
edited the single affected file, json.py, and here's a patch. But be
warned that I haven't even run this code!

Index: json.py
===================================================================
--- json.py (revision 2)
+++ json.py (working copy)
@@ -66,8 +66,10 @@

     def parseValue(self, tkns):
         (ttype, tstr, ps, pe, lne) = tkns.next()
-        if ttype in [token.STRING, token.NUMBER]:
-            return eval(tstr)
+        if ttype == token.STRING:
+            return unicode(tstr)
+        if ttype == token.NUMBER:
+            return float(tstr)
         elif ttype == token.NAME:
             return self.parseName(tstr)
         elif ttype == token.OP:
@@ -110,7 +112,12 @@
             while 1:
                 (ttype, tstr, ps, pe, lne) = tkns.next()
                 if ttype == token.STRING:
-                    nme = eval(tstr)
+                    possible_ident = unicode(tstr)
+                    try:
+                        # Python identifiers have to be ascii
+                        nme = possible_ident.encode('ascii')
+                    except UnicodeEncodeError:
+                        raise "Non-ascii identifier"
                     (ttype, tstr, ps, pe, lne) = tkns.next()
                     if tstr == ":":
                         v = self.parseValue(tkns)

I'll leave contacting the author to you, if you wish.

I'll still have to look at Twisted's Jelly.

Hmmm, s-expressions, interesting. But you'd have to write your own
s-expression parser and jelly RPC client to get up and running in other
languages.

regards,

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Jul 18 '05 #17

cmkl

Irmen de Jong <ir**********@xs4all.nl> wrote in message news:<42***********************@news.xs4all.nl>...

Pierre Barbier de Reuille wrote:
Irmen de Jong wrote:
Pickle and marshal are not safe. They can do harmful
things if fed maliciously constructed data.
That is a pity, because marshal is fast.
I need a fast and safe (secure) marshaler.
Is xdrlib the only option?
I would expect that it is fast and safe because
it (the xdr spec) has been around for so long.

Or are there better options (perhaps 3rd party libraries)?

Thanks

Irmen.

What exactly do you mean by "safe" ? Do you want to ensure your objects
cannot receive corrupted data ? Do you want to ensure no code will be
evaluated during the unmarshalling ?

"safe (secure)"
But to be more precise, let's look at the security warning that
is in the marshal documentation:
"The marshal module is not intended to be secure against erroneous or
maliciously constructed data. Never unmarshal data received from an
untrusted or unauthenticated source."

So essentially I want the opposite of that ;-)

I want a marshaler that is okay to use where the data it processes
comes from unknown, external sources (untrusted). It should not crash
on corrupt data and it should not execute arbitrary code when
unmarshaling, so that it is safe against hacking attempts.

Oh, preferably, it should be fast :)
Some XML-ish thing may be secure but is likely to be not fast at all.

Ideally it should be able to transfer user defined Python types,
but if it is like marshal (can only marshal builtin types) that's
okay too.

--Irmen

I'm just curious,

but can't effbot's fast cElementTree be used for Pyro's XML_PICKLE
and would it be safe and fast enough?

Carl

Jul 18 '05 #18

Irmen de Jong

cmkl wrote:

but can't effbot's fast cElementTree be used for Pyro's XML_PICKLE
and would it be safe and fast enough?

ElementTree's not a marshaler.
Or has it object (de)serialization included?

--Irmen

Jul 18 '05 #19

Skip Montanaro

Carl> but can't effbot's fast cElementTree be used for Pyro's XML_PICKLE
Carl> and would it be safe and fast enough?

It's not clear to me that if marshal is unsafe how XML could be safe. In
this context they are both just serializations of basic Python data
structures.

Skip

Jul 18 '05 #20

Aahz

In article <ma***************************************@python. org>,
Skip Montanaro <sk**@pobox.com> wrote:

Carl> but can't effbot's fast cElementTree be used for Pyro's XML_PICKLE
Carl> and would it be safe and fast enough?

It's not clear to me that if marshal is unsafe how XML could be safe. In
this context they are both just serializations of basic Python data
structures.

The difference is that parsing XML -- even badly malformed -- won't
crash Python.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR

Jul 18 '05 #21

Fredrik Lundh

Irmen de Jong wrote:

but can't effbot's fast cElementTree be used for Pyro's XML_PICKLE and would it be safe and fast
enough?

ElementTree's not a marshaler.
Or has it object (de)serialization included?

nope. building a serialization layer on top of it is pretty trivial, and the result
is pretty fast, but nowhere close to C speed.

</F>

Jul 18 '05 #22

Fredrik Lundh

(repost; gmane seems to have eaten my original post)

Aahz wrote:

It's not clear to me that if marshal is unsafe how XML could be safe. In
this context they are both just serializations of basic Python data
structures.

The difference is that parsing XML -- even badly malformed -- won't
crash Python.

optimist.

>>> import os
>>> os.path.getsize("BL.xml")
1302
>>> from xml.dom import minidom
>>> x = minidom.parse("BL.xml")

(have patience. have lots of patience.)

</F>

Jul 18 '05 #23

Irmen de Jong

Fredrik Lundh wrote:

import os
os.path.getsize("BL.xml")
1302
from xml.dom import minidom
x = minidom.parse("BL.xml")

(have patience. have lots of patience.)

Hehe, the XML killer file "BillionLaughs"... correct?
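
The contents of BL.xml are not shown in the thread. Assuming it is the classic nested-entity document that Irmen names, a scaled-down sketch shows the mechanism: each entity expands to several copies of the previous one, so the expanded text grows exponentially with the number of levels. The function name and the parameters `levels` and `fanout` are made up, and the parse is kept deliberately tiny (3 levels of 10).

```python
import xml.etree.ElementTree as ET


def nested_entity_doc(levels, fanout):
    """Build an XML document whose single entity reference expands
    to fanout**levels copies of the string 'lol'."""
    lines = ['<!ENTITY e0 "lol">']
    for i in range(1, levels + 1):
        refs = ("&e%d;" % (i - 1)) * fanout
        lines.append('<!ENTITY e%d "%s">' % (i, refs))
    dtd = "\n".join(lines)
    return ('<?xml version="1.0"?>\n<!DOCTYPE r [\n%s\n]>\n<r>&e%d;</r>'
            % (dtd, levels))


small = nested_entity_doc(levels=3, fanout=10)
expanded = ET.fromstring(small).text      # 1000 copies of "lol"

# Do NOT parse this one: only its size on disk is measured here.
big = nested_entity_doc(levels=9, fanout=10)
print(len(small), len(expanded), len(big))
```

With 9 levels the document stays well under a kilobyte while its expansion would be 3 * 10**9 characters. Newer Expat builds add amplification limits, but that is a parser-level defence, not a property of XML itself.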

--Irmen

Jul 18 '05 #24

Irmen de Jong

Alan Kennedy wrote:

I should learn to keep my mouth zipped :-L

:-D

OK, I really don't have time for a detailed examination of either the
JSON spec or the python impl of same. And I *definitely* don't have time
for a detailed security audit, much though I'd love to.

No problem. The patch you wrote is a very good start, I think!!

Interestingly enough, I just ran across "Flatten":
http://sourceforge.net/project/showf...ckage_id=91311

"...which aids in serializing/unserializing networked data securely,
without having to fear execution of code or the like."

Sounds promising!
--Irmen

Jul 18 '05 #25

Fredrik Lundh

Aahz wrote:

It's not clear to me that if marshal is unsafe how XML could be safe. In
this context they are both just serializations of basic Python data
structures.

The difference is that parsing XML -- even badly malformed -- won't
crash Python.

optimist.

>>> import os
>>> os.path.getsize("BL.xml")
1302
>>> from xml.dom import minidom
>>> x = minidom.parse("BL.xml")

(have patience. have lots of patience.)

</F>

Jul 18 '05 #26

Fredrik Lundh

(repost; gmane seems to have eaten my original post)

Irmen de Jong wrote:

I think marshal could be fixed; the only unsafety I'm aware of is that
it doesn't always act rationally when confronted with incorrect input
like bad type codes or truncated input. It only receives instances of
the built-in types and it never executes user code as a result of
unmarshalling.

So it is not vulnerable in the way that pickle is? That's a start.
The security warning in the marshal doc then makes it sound worse than
it is...

the problem is that the following may or may not reach the "done!" statement,
somewhat depending on python version, memory allocator, and what data you
pass to dumps.

import marshal

data = marshal.dumps((1, 2, 3, "hello", 4, 5, 6))

# feed ever shorter prefixes of a valid stream to the unmarshaller
for i in range(len(data), -1, -1):
    try:
        print(marshal.loads(data[:i]))
    except EOFError:
        print("EOFError")
    except ValueError:
        print("ValueError")

print("done!")

(try different data combinations, to see how far you get on your platform...)

fixing this should be relatively easy, and should result in a safe unmarshaller (your
application will still have to limit the amount of data fed into load/loads, of course).
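
For anyone repeating the experiment today, here is a Python 3 variant of the same truncation test that tallies the outcomes instead of printing each one. It is only a sketch: the tally dictionary is an addition for illustration, and a clean run on one platform says nothing about other inputs, since the marshal documentation still warns against untrusted data.

```python
import marshal

value = (1, 2, 3, "hello", 4, 5, 6)
data = marshal.dumps(value)

outcomes = {}
for i in range(len(data), -1, -1):
    try:
        marshal.loads(data[:i])
        outcome = "loaded"
    except (EOFError, ValueError, TypeError) as exc:
        # The marshal docs list these three as the errors loads() may raise.
        outcome = type(exc).__name__
    outcomes[outcome] = outcomes.get(outcome, 0) + 1

print(outcomes)
```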

</F>

Jul 18 '05 #27

Irmen de Jong

Fredrik Lundh wrote:

the problem is that the following may or may not reach the "done!" statement,
somewhat depending on python version, memory allocator, and what data you
pass to dumps.

import marshal

data = marshal.dumps((1, 2, 3, "hello", 4, 5, 6))

for i in range(len(data), -1, -1):
    try:
        print marshal.loads(data[:i])
    except EOFError:
        print "EOFError"
    except ValueError:
        print "ValueError"

print "done!"

(try different data combinations, to see how far you get on your platform...)

Python 2.4 on my Windows box crashes with
Fatal Python error: PyString_InternInPlace: strings only please!

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
c:\> _
So indeed it seems that marshal is not safe yet :-|

fixing this should be relatively easy, and should result in a safe unmarshaller (your
application will still have to limit the amount of data fed into load/loads, of course).

Okay.

--Irmen

Jul 18 '05 #28

Alan Kennedy

[Irmen de Jong]

Interestingly enough, I just ran across "Flatten":
http://sourceforge.net/project/showf...ckage_id=91311

"...which aids in serializing/unserializing networked data securely,
without having to fear execution of code or the like."

Sounds promising!

Well, I'm always dubious of OSS projects that don't even have any bugs
reported, let alone fixed: no patches submitted, etc, etc.

http://sourceforge.net/tracker/?group_id=82591

Though maybe I'm missing something obvious?

--
alan kennedy
------------------------------------------------------
email alan: http://xhaus.com/contact/alan

Jul 18 '05 #29

Irmen de Jong

Alan Kennedy wrote:

[Irmen de Jong]
Interestingly enough, I just ran across "Flatten":
http://sourceforge.net/project/showf...ckage_id=91311

"...which aids in serializing/unserializing networked data securely,
without having to fear execution of code or the like."

Sounds promising!

Well, I'm always dubious of OSS projects that don't even have any bugs
reported, let alone fixed: no patches submitted, etc, etc.

http://sourceforge.net/tracker/?group_id=82591

Though maybe I'm missing something obvious?

Perhaps the SF trackers are simply not used for that project?
Consider my own project:
http://sourceforge.net/tracker/?group_id=18837
I can assure you that I have fixed and applied a huge
amount of bugs and patches during the lifetime of the project.
They are just not entered in the trackers, except for a few.

--Irmen

Jul 18 '05 #30

Paul Rubin

"gu***@python.org" <gv********@gmail.com> writes:

Pickle and marshal are not safe. They can do harmful things if fed
maliciously constructed data. That is a pity, because marshal is fast.

I think marshal could be fixed; the only unsafety I'm aware of is that
it doesn't always act rationally when confronted with incorrect input
like bad type codes or truncated input. It only receives instances of
the built-in types and it never executes user code as a result of
unmarshalling.

There's another issue with marshal that makes it unsuitable for Pyro,
which is that its data format is (for legitimate reasons) not
guaranteed to be the same across different Python releases. That
means that if the two ends of the Pyro application aren't using the
same Python version, they might not be able to interoperate.

I don't remember if marshal strings contain a version number. If they
do, then the non-interoperating versions can notice the
incompatibility and raise an appropriate error. If they don't, then
undefined behavior and possible security holes could result, unless
Pyro takes special measures to notice the possibility.
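
One way an application could "take special measures" is to carry the format version itself. The module does expose `marshal.version`, and `marshal.dumps` accepts a version argument; the framing below (the `MAGIC` tag, the header layout and both function names) is made up for illustration and is not something Pyro or marshal does. It addresses interoperability only and does not make unmarshalling untrusted data safe.

```python
import marshal
import struct

HEADER = struct.Struct(">4sB")   # hypothetical framing: magic + format version
MAGIC = b"MRSH"                  # made-up tag, not part of marshal itself


def dump_with_version(value):
    """Prefix the marshal stream with the format version used to write it."""
    body = marshal.dumps(value, marshal.version)
    return HEADER.pack(MAGIC, marshal.version) + body


def load_with_version(blob):
    """Refuse streams written in a newer format than this interpreter knows."""
    if len(blob) < HEADER.size:
        raise ValueError("blob too short for header")
    magic, version = HEADER.unpack(blob[:HEADER.size])
    if magic != MAGIC:
        raise ValueError("not a framed marshal blob")
    if version > marshal.version:
        raise ValueError("sender uses marshal format %d, we support up to %d"
                         % (version, marshal.version))
    return marshal.loads(blob[HEADER.size:])


roundtrip = load_with_version(dump_with_version([1, "two", 3.0]))
print(roundtrip)
```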

See SF bugs #467384 and #471893 for some further discussion.

Jul 18 '05 #31

Irmen de Jong

Paul Rubin wrote:

There's another issue with marshal that makes it unsuitable for Pyro,
which is that its data format is (for legitimate reasons) not
guaranteed to be the same across different Python releases. That
means that if the two ends of the Pyro application aren't using the
same Python version, they might not be able to interoperate.

Paul, the default serialization protocol that Pyro uses is pickle
(with the highest available protocol number). So there is a risk
already that it doesn't interoperate with older Python versions,
unless you configure the max pickle protocol or switch to using
one of the supported XML serializations.
For mobile code, Pyro relies on the transfer of the actual
bytecode and this won't work at all no matter what if you use
different Python versions. Unless the bytecode happens to be
the same (consider yourself lucky).

--Irmen

Jul 18 '05 #32

Paul Rubin

Irmen de Jong <ir**********@xs4all.nl> writes:

There's another issue with marshal that makes it unsuitable for Pyro,
which is that its data format is (for legitimate reasons) not
guaranteed to be the same across different Python releases. That
means that if the two ends of the Pyro application aren't using the
same Python version, they might not be able to interoperate.

Paul, the default serialization protocol that Pyro uses is pickle
(with the highest available protocol number). So there is a risk
already that it doesn't interoperate with older Python versions,
unless you configure the max pickle protocol or switch to using
one of the supported XML serializations.

Yes, however, you can at least set the protocol level. Marshal doesn't
give you that option.
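
A short sketch of what "setting the protocol level" looks like with the standard `pickle` API in current Python 3. The sample record is made up. Pinning a protocol helps older peers read the stream; it does nothing for the security question raised next.

```python
import pickle

record = {"method": "ping", "args": (1, 2)}

# Pin an older protocol so that older peers can still read the stream.
portable = pickle.dumps(record, protocol=2)

# Or take the newest protocol this interpreter offers.
newest = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

print(portable[:2], newest[:2])
```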

What do you do about the security issue if you're using pickle? Do
you have to trust the other end to not send you malicious pickles?

Jul 18 '05 #33

Irmen de Jong

Paul Rubin wrote:

Yes, however, you can at least set the protocol level. Marshal doesn't
give you that option.

That's right. So good for Pyro then :)
It works most of the time, even across different Python versions,
unless using mobile code.

What do you do about the security issue if you're using pickle? Do
you have to trust the other end to not send you malicious pickles?

I do nothing about it.
Yes, you have to trust the other end.
So you have to use your own -or Pyro's- authentication/authorization
logic to make sure that the other end can be trusted.
You could use SSL with certificates for instance.

In fact, this is the reason why I started this thread.
I wanted to discover some possibilities to replace pickle
by another thing, so that Pyro becomes 'safe' at the wire
protocol level.
But further discussion on the Pyro mailing list sort of
made it clear that this is not desirable.

--Irmen

Jul 18 '05 #34

Paul Rubin

Irmen de Jong <ir**********@xs4all.nl> writes:

What do you do about the security issue if you're using pickle? Do
you have to trust the other end to not send you malicious pickles?
I do nothing about it.
Yes, you have to trust the other end.
So you have to use your own -or Pyro's- authentication/authorization
logic to make sure that the other end can be trusted.
You could use SSL with certificates for instance.

Well, ok, if you trust the other end then I think it's enough to just
authenticate all the pickles (say using hmac.py) without needing
something as heavyweight as SSL. If you use SSL you need something
like m2crypto since the SSL option in the socket module doesn't check
certificates, IIRC.
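A rough sketch of what "authenticate all the pickles" could look like, assuming Python 3, HMAC-SHA256, and a pre-shared key. The names `seal` and `unseal` and the key value are made up for illustration; this is not Pyro's wire format.

```python
import hashlib
import hmac
import pickle

# Assumption: both ends were configured with the same secret out of band.
SHARED_SECRET = b"replace-with-a-real-shared-secret"
MAC_SIZE = hashlib.sha256().digest_size


def seal(obj):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickled bytes."""
    body = pickle.dumps(obj, protocol=2)
    tag = hmac.new(SHARED_SECRET, body, hashlib.sha256).digest()
    return tag + body


def unseal(message):
    """Verify the tag first; only unpickle bytes that carry a valid tag."""
    tag, body = message[:MAC_SIZE], message[MAC_SIZE:]
    expected = hmac.new(SHARED_SECRET, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad MAC, refusing to unpickle")
    return pickle.loads(body)
```

The important detail is the order: the tag is checked before `pickle.loads` ever sees the data. This only proves the sender knows the secret; a peer that holds the key can still send a malicious pickle.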
In fact, this is the reason why I started this thread.
I wanted to discover some possibilities to replace pickle
by another thing, so that Pyro becomes 'safe' at the wire
protocol level.
But further discussion on the Pyro mailing list sort of
made it clear that this is not desirable.

Why do you say it's not desirable? Don't competing protocols like RMI
try to stay safe from malicious peers? Why should I not want to
expose a Pyro service to the internet? It's a natural thing to want
to do.

Jul 18 '05 #35

Irmen de Jong

> Well, ok, if you trust then other end then I think it's enough to just

authenticate all the pickles (say using hmac.py) without needing
something as heavyweight as SSL.
An interesting idea that hadn't crossed my mind yet.
Pyro *does* already have connection authentication that uses md5
(and hmac since 3.5beta) with a shared secret, but after that,
the communication is done in plaintext so to speak.
If you use SSL you need something
like m2crypto since the SSL option in the socket module doesn't check
certificates, IIRC.

I'm using m2crypto for this kind of SSL, yes.
(sadly it has a bug in its API that is triggered by the current
Pyro version on some platforms like Linux).

In fact, this is the reason why I started this thread.
I wanted to discover some possibilities to replace pickle
by another thing, so that Pyro becomes 'safe' at the wire
protocol level.
But further discussion on the Pyro mailing list sort of
made it clear that this is not desirable.

Why do you say it's not desirable? Don't competing protocols like RMI
try to stay safe from malicious peers? Why should I not want to
expose a Pyro service to the internet? It's a natural thing to want
to do.

You should not want to expose a Pyro service to the internet because
Python doesn't have Java's security model and sandboxing, that are
used with RMI. Pyro has a few features that are very powerful
but also require the use of intrinsically insecure Python code (namely,
pickle, and marshal).
Just look at the recent security advisory about the XMLRPC server
that comes with Python.... it's much more primitive than Pyro is,
but even that one was insecure.

I wouldn't put a Java RMI server or xyz CORBA server or whatever
kind of unrestricted API open on the internet anyway.
Am I rational or paranoid?

--Irmen

Jul 18 '05 #36

Paul Rubin

Irmen de Jong <ir**********@xs4all.nl> writes:

Well, ok, if you trust the other end then I think it's enough to just
authenticate all the pickles (say using hmac.py) without needing
something as heavyweight as SSL.
An interesting idea that hadn't crossed my mind yet. Pyro *does*
already have connection authentication that uses md5 (and hmac since
3.5beta) with a shared secret, but after that, the communication is
done in plaintext so to speak.

Yes, that's what I meant, using hmac to authenticate using a shared secret,
sending the rest in the clear. Note you should also put sequence numbers
in the messages, to stop the attacker from fooling you by selectively
deleting or replaying messages.
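The sequence-number idea can be sketched roughly as follows, assuming Python 3 and a pre-shared key. The `Sender` and `Receiver` classes and the 8-byte counter layout are hypothetical choices for illustration only.

```python
import hashlib
import hmac
import struct

SHARED_SECRET = b"replace-with-a-real-shared-secret"  # assumption: pre-shared
MAC_SIZE = hashlib.sha256().digest_size
SEQ = struct.Struct("!Q")  # 8-byte big-endian sequence number


class Sender:
    def __init__(self):
        self.next_seq = 0

    def wrap(self, body):
        """Prefix body with a counter and a MAC covering counter + body."""
        header = SEQ.pack(self.next_seq)
        self.next_seq += 1
        tag = hmac.new(SHARED_SECRET, header + body, hashlib.sha256).digest()
        return tag + header + body


class Receiver:
    def __init__(self):
        self.expected_seq = 0

    def unwrap(self, message):
        """Check the MAC, then insist on exactly the next sequence number."""
        tag, signed = message[:MAC_SIZE], message[MAC_SIZE:]
        expected = hmac.new(SHARED_SECRET, signed, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("bad MAC")
        (seq,) = SEQ.unpack(signed[:SEQ.size])
        if seq != self.expected_seq:
            raise ValueError("unexpected sequence number")
        self.expected_seq += 1
        return signed[SEQ.size:]
```

Because the counter is covered by the MAC, an attacker cannot renumber a message. A replayed message arrives with a stale counter, and a deleted one leaves a gap that the receiver notices on the next message.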
You should not want to expose a Pyro service to the internet because
Python doesn't have Java's security model and sandboxing, that are
used with RMI. Pyro has a few features that are very powerful
but also require the use of intrinsically insecure Python code (namely,
pickle, and marshal).
Can you say some more about this? Does RMI really rely on sandboxes,
if you don't send code around, but just expose operations on server
side objects?

I don't think marshal is inherently insecure, since the unmarshaller
doesn't itself execute any marshalled code. It apparently has some
bugs that can confuse it if you send it a malformed marshalled string,
but those can be fixed. Pickle is inherently insecure because of how
it calls class constructors.
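A small demonstration of the difference, assuming Python 3. The `Innocent` class is made up, and a harmless `eval` expression stands in for whatever an attacker would actually run. The `__reduce__` hook is one way (not the only one) to get pickle to call an arbitrary callable while loading.

```python
import marshal
import pickle


class Innocent:
    """Looks harmless, but tells pickle to call eval() when loaded."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments".
        # A harmless expression stands in for an attacker's code.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Innocent())

# Loading runs eval("6 * 7") inside pickle.loads(); the result is the
# value returned by eval, not an Innocent instance.
result = pickle.loads(blob)

# marshal, by contrast, only round-trips built-in value types and has
# no hook for calling a callable during loading.
data = marshal.loads(marshal.dumps([1, "two", (3.0, None)]))
```

Here `pickle.loads` returns 42. The receiving side never needed the `Innocent` class, only a stream that names a callable it can import.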
Just look at the recent security advisory about the XMLRPC server
that comes with Python.... it's much more primitive than Pyro is,
but even that one was insecure.
I haven't looked at that bug carefully yet but yes, anything exposed
to the internet has to be done very carefully, and XMLRPC missed something.
I wouldn't put a Java RMI server or xyz CORBA server or whatever
kind of unrestricted API open on the internet anyway.
Am I rational or paranoid?

I haven't used Java enough to advise you on this, but I thought they
were supposed to be ok to expose to the internet. Certainly the whole
idea of .NET is to let you securely provide RPC services (excuse me
for a moment while I try to stop laughing for mentioning security and
Microsoft in the same sentence). And lots of people use things like
SOAP for that.

Jul 18 '05 #37

Irmen de Jong

Paul Rubin wrote:

Yes, that's what I meant, using hmac to authenticate using a shared secret,
sending the rest in the clear. Note you should also put sequence numbers
in the messages, to stop the attacker from fooling you by selectively
deleting or replaying messages.
Thanks for the tip. I'll think about this.

You should not want to expose a Pyro service to the internet because
Python doesn't have Java's security model and sandboxing, that are
used with RMI. Pyro has a few features that are very powerful
but also require the use of intrinsically insecure Python code (namely,
pickle, and marshal).

Can you say some more about this? Does RMI really rely on sandboxes,
if you don't send code around, but just expose operations on server
side objects?

Well, my experience with RMI is very limited (and from a few years ago)
but I remember that you are required to set a security manager on your
RMI objects. I always used Java's default rmi security manager but I
honestly don't know what it actually does :-D

Other than that, it would be interesting to know if the JRMP or IIOP
protocols have any problems with malicious packets? I don't know
them well enough to say anything about this.
I don't think marshal is inherently insecure, since the unmarshaller
doesn't itself execute any marshalled code. It apparently has some
bugs that can confuse it if you send it a malformed marshalled string,
but those can be fixed. Pickle is inherently insecure because of how
it calls class constructors.

Yep, that's what I now know too from the other replies in this thread.

Just look at the recent security advisory about the XMLRPC server
that comes with Python.... it's much more primitive than Pyro is,
but even that one was insecure.

I haven't looked at that bug carefully yet but yes, anything exposed
to the internet has to be done very carefully, and XMLRPC missed something.

What I know of it is that you had the possibility to arbitrarily follow
attribute paths, including attributes that should rather be kept hidden.

I wouldn't put a Java RMI server or xyz CORBA server or whatever
kind of unrestricted API open on the internet anyway.
Am I rational or paranoid?

I haven't used Java enough to advise you on this, but I thought they
were supposed to be ok to expose to the internet. Certainly the whole
idea of .NET is to let you securely provide RPC services (excuse me
for a moment while I try to stop laughing for mentioning security and
Microsoft in the same sentence). And lots of people use things like
SOAP for that.

I label things like SOAP and XML-RPC much different than RMI or Pyro,
because they (SOAP) are much more "distant" from the actual
programming language and environment beneath them. I don't know if
this is good thinking or not but the fact that RMI and Pyro expose
language features directly, and SOAP does not, makes me reason about them
differently.

Then again, Pyro allows you to use two forms of XML serialization
on the wire (instead of pickle), which may or may not move it much closer
to SOAP and the likes. But there are other reasons for not wanting
a Pyro server exposed on the internet. Such as the lack of a good
security analysis of Pyro. Perhaps it suffers from similar holes
as XMLRPC until recently...

Furthermore there are practical issues such as having to
open a bunch of new ports in your firewall. In my experience
this is very hard to get done, sadly, in contrast to just
exposing a "web-service" (in whatever form) on port 80 HTTP.
--Irmen

Jul 18 '05 #38

Fredrik Lundh

Irmen de Jong wrote:

I haven't looked at that bug carefully yet but yes, anything exposed
to the internet has to be done very carefully, and XML-RPC missed
something.

What I know of it is that you had the possibility to arbitrarily follow
attribute paths, including attributes that should rather be kept hidden.

the bug had nothing to do with the XML-RPC protocol itself; it was a
weakness in the SimpleXMLRPCServer framework which used reflection
to automatically publish instance methods (if you use getattr repeatedly on
an instance, you can access a lot more than just attributes and methods...)
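To illustrate the getattr point, here is a minimal sketch in Python 3. `Service`, `resolve_naive`, and `resolve_guarded` are hypothetical names and this is not the SimpleXMLRPCServer source; it only shows why following dotted names blindly is dangerous and what a simple guard could look like.

```python
class Service:
    """Hypothetical published object with one public method."""

    def __init__(self):
        self._secret = "db-password"

    def ping(self):
        return "pong"


def resolve_naive(obj, dotted_name):
    """Follow every component of a dotted name with getattr()."""
    for part in dotted_name.split("."):
        obj = getattr(obj, part)
    return obj


def resolve_guarded(obj, dotted_name):
    """Same walk, but refuse any component that starts with an underscore."""
    for part in dotted_name.split("."):
        if part.startswith("_"):
            raise AttributeError("access to %r is not allowed" % part)
        obj = getattr(obj, part)
    return obj


service = Service()

# A client-supplied name reaches private state ...
leaked = resolve_naive(service, "_secret")
# ... and, via the method object, the defining module's globals.
module_globals = resolve_naive(service, "ping.__func__.__globals__")
```

With the guarded variant, `resolve_guarded(service, "ping")` still works, while both of the names above raise `AttributeError`.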

how do you publish "RPC endpoints" in Pyro?

</F>

Jul 18 '05 #39

Irmen de Jong

Fredrik Lundh wrote:

the bug had nothing to do with the XML-RPC protocol itself;
True, sorry for the confusion. I should have written it more precisely.
it was a
weakness in the SimpleXMLRPCServer framework which used reflection
to automatically publish instance methods (if you use getattr repeatedly on
an instance, you can access a lot more than just attributes and methods...)

how do you publish "RPC endpoints" in Pyro?

By reflection :-) return getattr(self,method) (*args,**keywords)
But Pyro currently treats attribute lookups differently.
It either ignores them completely (you have to enable remote-attribute
access explicitly) or returns attributes as 'local' objects.
What I mean is that you can access a remote attribute of a Pyro object,
but only one level deep. There is no repeated (nested) remote attribute
lookup. It's quite difficult to explain, if you want more details please
read the relevant section in the Pyro manual:
http://pyro.sourceforge.net/manual/7...ml#nestedattrs
As far as I can see, Pyro is safe from the XMLRPCServer weakness.
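A rough sketch of single-level dispatch by reflection, assuming Python 3. `Dispatcher` and `Calculator` are made-up names and the checks are illustrative; this is not Pyro's real implementation.

```python
class Dispatcher:
    """Hypothetical dispatcher that resolves exactly one attribute level."""

    def __init__(self, target):
        self.target = target

    def invoke(self, method, *args, **keywords):
        # Only a plain, public, single-level name is accepted, so there
        # is no nested lookup for a remote caller to abuse.
        if "." in method or method.startswith("_"):
            raise AttributeError("refusing to resolve %r" % method)
        member = getattr(self.target, method)
        if not callable(member):
            raise AttributeError("%r is not a method" % method)
        return member(*args, **keywords)


class Calculator:
    def add(self, a, b):
        return a + b


dispatcher = Dispatcher(Calculator())
total = dispatcher.invoke("add", 2, 3)
```

Since the name is never split on dots, a request such as `add.__func__` is rejected outright instead of being walked component by component.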

Interestingly, I have been thinking for a long time to add nested
remote attribute lookup to Pyro. I now know that this is perhaps
not a really good idea :)
--Irmen

Jul 18 '05 #40

Paul Rubin

Irmen de Jong <ir**********@xs4all.nl> writes:

Note you should also put sequence numbers in the messages, to stop
the attacker from fooling you by selectively deleting or replaying
messages.
Thanks for the tip. I'll think about this.

Hmm, you also want a random blob in each packet (including the session
start) included in the authentication of the next packet, so the
attacker can't cut and paste messages from old sessions into the
current ones. You know, by the time you're through designing this you
may be better off just using SSL and getting it over with. It's very
easy to make mistakes designing these types of protocols. There are
some reasonable examples in "Applied Cryptography", but maybe you
don't want to deal with this stuff.
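For illustration only, the "random blob" chaining could be sketched like this in Python 3. The class names, the fixed 16-byte nonce, and the idea that the receiver issues the session-start nonce are all assumptions of this sketch, and it is exactly the kind of hand-rolled protocol that is easy to get wrong.

```python
import hashlib
import hmac
import os

SHARED_SECRET = b"replace-with-a-real-shared-secret"  # assumption: pre-shared
MAC_SIZE = hashlib.sha256().digest_size
NONCE_SIZE = 16


def _tag(prev_nonce, nonce, body):
    """MAC over the previous packet's nonce, this packet's nonce, and body."""
    data = prev_nonce + nonce + body
    return hmac.new(SHARED_SECRET, data, hashlib.sha256).digest()


class ChainedSender:
    def __init__(self, session_nonce):
        self.prev_nonce = session_nonce

    def wrap(self, body):
        nonce = os.urandom(NONCE_SIZE)
        packet = _tag(self.prev_nonce, nonce, body) + nonce + body
        self.prev_nonce = nonce
        return packet


class ChainedReceiver:
    def __init__(self):
        # Fresh random value, handed to the peer at session start.
        self.prev_nonce = os.urandom(NONCE_SIZE)

    def unwrap(self, packet):
        tag = packet[:MAC_SIZE]
        nonce = packet[MAC_SIZE:MAC_SIZE + NONCE_SIZE]
        body = packet[MAC_SIZE + NONCE_SIZE:]
        if not hmac.compare_digest(tag, _tag(self.prev_nonce, nonce, body)):
            raise ValueError("bad MAC or packet from another session")
        self.prev_nonce = nonce
        return body
```

A packet recorded in one session fails verification in any other, because its tag was computed over a nonce the new session never issued.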
Well, my experience with RMI is very limited (and from a few years ago)
but I remember that you are required to set a security manager on your
RMI objects. I always used Java's default rmi security manager but I
honestly don't know what it actually does :-D
Thanks, I should try to find out more about this. I'm about to be
doing some stuff with an existing RMI app and I better make sure it's
not already vulnerable.
I label things like SOAP and XML-RPC much different than RMI or Pyro,
because they (SOAP) are much more "distant" from the actual
programming language and environment beneath them. I don't know if
this is good thinking or not but the fact that RMI and Pyro expose
language features directly, and SOAP does not, makes me reason about them
differently.
Hmm, I sort of understand this, but not completely. Does DCOM or .NET
expose language features directly?
Then again, Pyro allows you to use two forms of XML serialization
on the wire (instead of pickle), which may or may not move it much closer
to SOAP and the likes. But there are other reasons for not wanting
a Pyro server exposed on the internet. Such as the lack of a good
security analysis of Pyro. Perhaps it suffers from similar holes
as XMLRPC until recently...
I've been meaning to look at Pyro and will certainly let you know if I
spot any problems, but of course there might be some that I don't find.
Furthermore there are practical issues such as having to
open a bunch of new ports in your firewall. In my experience
this is very hard to get done, sadly, in contrast to just
exposing a "web-service" (in whatever form) on port 80 HTTP.

Yes, though RMI requires the same.

Jul 18 '05 #41

Irmen de Jong

Paul Rubin wrote:

Hmm, you also want a random blob in each packet (including the session
start) included in the authentication of the next packet, so the
attacker can't cut and paste messages from old sessions into the
current ones. You know, by the time you're through designing this you
may be better off just using SSL and getting it over with. It's very
easy to make mistakes designing these types of protocols. There are
some reasonable examples in "Applied Cryptography", but maybe you
don't want to deal with this stuff.
Heh, indeed I rather don't.
I know a bit about this stuff, but not nearly enough to come
up with a water tight design by myself, so it's much easier
and safer to rely on trusted work by others.

I label things like SOAP and XML-RPC much different than RMI or Pyro,
because they (SOAP) are much more "distant" from the actual
programming language and environment beneath them. I don't know if
this is good thinking or not but the fact that RMI and Pyro expose
language features directly, and SOAP does not, makes me reason about them
differently.

Hmm, I sort of understand this, but not completely. Does DCOM or .NET
expose language features directly?

.NET: no idea
DCOM: as it is based on DCE/RPC, I would say: no. There's this MIDL
thing sitting in between and stuff like that. There's no such thing
as a specific class id and/or method name and/or parameter list that
directly maps onto an object.method in the programming environment.

I must confess, this stuff is getting all rather messy and probably not
worth to try to make such a distinction between all the RPC protocols :-)
I've been meaning to look at Pyro and will certainly let you know if I
spot any problems, but of course there might be some that I don't find.

I would appreciate it.

Furthermore there are practical issues such as having to
open a bunch of new ports in your firewall. In my experience
this is very hard to get done, sadly, in contrast to just
exposing a "web-service" (in whatever form) on port 80 HTTP.

Yes, though RMI requires the same.

Precisely. There is this tunneling thing, but I never got it to work.
In the end, using a SSH tunnel may prove to be even easier :-D
(just let sshd listen on port 80 and you're set)
--Irmen

Jul 18 '05 #42

Paul Rubin

Irmen de Jong <ir**********@xs4all.nl> writes:

I know a bit about this stuff, but not nearly enough to come
up with a water tight design by myself, so it's much easier
and safer to rely on trusted work by others.
Yeah, at this point I think it's safest to just use SSL. If I use
Pyro for anything I'll probably do it that way.
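For readers on a current Python 3, a minimal sketch of a certificate-checking client context using the standard `ssl` module (which is not the old socket-module SSL support mentioned earlier in the thread). The host name and port in the commented usage are hypothetical.

```python
import ssl

# Client-side context with the library's recommended defaults.
context = ssl.create_default_context()

# With these defaults the peer certificate is validated against the
# trusted CA store and its name is checked against the host name.
verifies_peer = context.verify_mode == ssl.CERT_REQUIRED
checks_hostname = context.check_hostname

# Usage sketch (hypothetical host and port, not executed here):
#   import socket
#   with socket.create_connection(("pyro.example.org", 7766)) as raw:
#       with context.wrap_socket(raw, server_hostname="pyro.example.org") as tls:
#           tls.sendall(b"...")
```

Both flags are true for a default context, so an unverifiable certificate makes the handshake fail instead of being silently accepted.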
DCOM: as it is based on DCE/RPC, I would say: no. There's this MIDL
thing sitting in between and stuff like that. There's no such thing
as a specific class id and/or method name and/or parameter list that
directly maps onto an object.method in the programming environment.
Hmm, ok, maybe we need something like that for Python, perhaps as a
Pyro extension.
Precisely. There is this tunneling thing, but I never got it to work.
In the end, using a SSH tunnel may prove to be even easier :-D
(just let sshd listen on port 80 and you're set)

I think if you want to get serious about authentication, SSL has more
of a developed infrastructure. Frankly I've never understood why ssh
caught on instead of telnet over SSL. See stunnel.org for a simple
SSL tunnel.

Jul 18 '05 #43
