Marshal Obj is String or Binary?

Hi,

The example below shows that the result of a marshaled data structure is
nothing but a string:

>>> data = {2:'two', 3:'three'}
>>> import marshal
>>> bytes = marshal.dumps(data)
>>> type(bytes)
<type 'str'>
>>> bytes
'{i\x02\x00\x00\x00t\x03\x00\x00\x00twoi\x03\x00\x00\x00t\x05\x00\x00\x00three0'

Now, I need to store this data safely in my database as CLEAR TEXT, not
BLOB. It seems to me that it should work just fine since it is a string
anyway. So why does O'Reilly's Python Cookbook insist on saving
it as a binary file and BLOB type?

Am I missing something?

Thanks,
Mike

Jan 13 '06 #1
21 Replies


In <11**********************@g47g2000cwa.googlegroups.com>, Mike wrote:
> The example below shows that the result of a marshaled data structure is
> nothing but a string:
>
> >>> data = {2:'two', 3:'three'}
> >>> import marshal
> >>> bytes = marshal.dumps(data)
> >>> type(bytes)
> <type 'str'>
> >>> bytes
> '{i\x02\x00\x00\x00t\x03\x00\x00\x00twoi\x03\x00\x00\x00t\x05\x00\x00\x00three0'
>
> Now, I need to store this data safely in my database as CLEAR TEXT, not
> BLOB. It seems to me that it should work just fine since it is a string
> anyway. So why does O'Reilly's Python Cookbook insist on saving
> it as a binary file and BLOB type?
>
> Am I missing something?


Yes, that a string is *binary* data. But only a subset of strings is safe
to use as `TEXT` in databases. Do you see all those '\x??' escapes?
'\x00' is *one* byte! A byte with the value zero. Something your DB
doesn't allow in a `TEXT` type.
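
The point about zero bytes can be checked directly. What follows is a minimal sketch, assuming a Python 3 interpreter (the thread uses Python 2, where marshal.dumps returns a str rather than bytes). The exact marshal byte layout varies between Python versions, so the sketch only looks for the presence of zero bytes:

```python
import marshal

data = {2: 'two', 3: 'three'}
blob = marshal.dumps(data)

# The serialized form is raw bytes, not text.
print(type(blob))

# Count the zero-valued bytes, the kind of byte the post above says
# a TEXT column will not accept.
nul_count = blob.count(b'\x00')
print(nul_count)

# The round trip works as long as the bytes are stored unchanged.
restored = marshal.loads(blob)
print(restored == data)
```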

Ciao,
Marc 'BlackJack' Rintsch
Jan 13 '06 #2

Wait a sec. \x00 may represent a byte when unmarshaled, but as long as
marshal likes it as \x00, I think my db is capable of storing \ x 0 0
characters. What is the problem? Is it that \? I could escape that...
actually I think my django framework already does that for me.

Thanks,
Mike

Jan 14 '06 #3

Try...

>>> for i in bytes: print ord(i)

or

>>> len(bytes)


What you see isn't always what you have. Your database is capable of
storing \ x 0 0 characters, but your string contains a single byte of
value zero. When Python displays the string representation to you, it
escapes the values so they can be displayed.
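
The difference between the byte itself and its escaped display form can be made concrete. This is a small sketch, assuming Python 3, where a bytes literal and a raw text string keep the two cases apart:

```python
one_byte = b'\x00'      # a single byte with the value zero
four_chars = r'\x00'    # four characters: backslash, x, 0, 0

print(len(one_byte))                  # 1
print(len(four_chars))                # 4
print(list(one_byte))                 # [0]
print([ord(c) for c in four_chars])   # [92, 120, 48, 48]
```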

casevh

Jan 14 '06 #5

ca****@comcast.net wrote:
> Try...
>
> >>> for i in bytes: print ord(i)
>
> or
>
> >>> len(bytes)
>
> What you see isn't always what you have. Your database is capable of
> storing \ x 0 0 characters, but your string contains a single byte of
> value zero. When Python displays the string representation to you, it
> escapes the values so they can be displayed.

He can still store the repr of the string into the database, and then
reconstruct it with eval:

>>> bytes = "\x00\x01\x02"
>>> bytes
'\x00\x01\x02'
>>> len(bytes)
3
>>> ord(bytes[0])
0
>>> rb = repr(bytes)
>>> rb
"'\\x00\\x01\\x02'"
>>> len(rb)
14
>>> rb[0]
"'"
>>> rb[1]
'\\'
>>> rb[2]
'x'
>>> rb[3]
'0'
>>> rb[4]
'0'
>>> bytes2 = eval(rb)
>>> bytes == bytes2
True

--
Giovanni Bajo
Jan 14 '06 #6

Thanks everyone. Storing complex structures as escaped strings seems
broken, but I think I'll take my chances.

Thanks,
Mike

Jan 14 '06 #7

On Fri, 13 Jan 2006 22:20:27 -0800, Mike wrote:
> Thanks everyone. Storing complex structures as escaped strings seems
> broken, but I think I'll take my chances.

Have you read the marshal reference?

http://docs.python.org/lib/module-marshal.html

marshal doesn't store data as escaped strings, it stores them as binary
strings. When you print the binary string to the console, unprintable
characters are shown escaped.

I'm guessing you probably want to use pickle instead of marshal. marshal
is intended only for dealing with .pyc files, and has some important
limitations. pickle is intended to be a general purpose serializer.
--
Steve.

Jan 14 '06 #8

Max
Giovanni Bajo wrote:
> > What you see isn't always what you have. Your database is capable of
> > storing \ x 0 0 characters, but your string contains a single byte of
> > value zero. When Python displays the string representation to you, it
> > escapes the values so they can be displayed.
>
> He can still store the repr of the string into the database, and then
> reconstruct it with eval:


Yes, but len(repr('\x00')) is 4, while len('\x00') is 1. So if he uses
BLOB his data will take almost a quarter of the space, compared to your
method (stored as TEXT).

--Max
Jan 14 '06 #9

On Sat, 14 Jan 2006 12:36:59 +0200, Max wrote:
> > He can still store the repr of the string into the database, and then
> > reconstruct it with eval:
>
> Yes, but len(repr('\x00')) is 4, while len('\x00') is 1.

Incorrect:

>>> len(repr('\x00'))
6
>>> repr('\x00')
"'\\x00'"

> So if he uses
> BLOB his data will take almost a quarter of the space, compared to your
> method (stored as TEXT).


Also incorrect. That depends utterly on which particular characters end up
in the serialised data. You may or may not be able to predict what that
mix may be.

>>> # nothing but printable data
>>> s = ''.join(['a' for i in range(256)])
>>> len(s)
256
>>> len(repr(s))
258
>>> # nothing but unprintable data
>>> s = ''.join(['\0' for i in range(256)])
>>> len(s)
256
>>> len(repr(s))
1026
>>> # one particular mix of both printable and unprintable data
>>> s = ''.join([chr(i) for i in range(256)])
>>> len(s)
256
>>> len(repr(s))
737
>>> # a different mix of both printable and unprintable data
>>> s = '+'.join([chr(i) for i in range(128)])
>>> len(s)
255
>>> len(repr(s))
352
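
The same measurement can be repeated on Python 3. This is a sketch under that assumption; the helper name escaped_length is made up for illustration, and each length comes out one higher than in the session above because the repr of a bytes object carries a leading b:

```python
def escaped_length(data: bytes) -> int:
    """Length of the printable, escaped form of data (its repr, as text)."""
    return len(repr(data))

samples = {
    "printable only": b'a' * 256,
    "unprintable only": b'\x00' * 256,
    "every byte value once": bytes(range(256)),
}

# The overhead of escaping depends entirely on the mix of byte values.
for label, data in samples.items():
    ratio = escaped_length(data) / len(data)
    print(f"{label}: {len(data)} bytes -> {escaped_length(data)} chars ({ratio:.2f}x)")
```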

--
Steven.

Jan 14 '06 #10

Max wrote:
> > > What you see isn't always what you have. Your database is capable of
> > > storing \ x 0 0 characters, but your string contains a single byte
> > > of value zero. When Python displays the string representation to
> > > you, it escapes the values so they can be displayed.
> >
> > He can still store the repr of the string into the database, and then
> > reconstruct it with eval:
>
> Yes, but len(repr('\x00')) is 4, while len('\x00') is 1. So if he uses
> BLOB his data will take almost a quarter of the space, compared to
> your method (stored as TEXT).


Sure, but he didn't ask for the best strategy to store the data in the
database; he specified very clearly that he *can't* use BLOB, and asked how
to use TEXT.
--
Giovanni Bajo
Jan 14 '06 #11

Thanks everyone.

Why Marshal & not Pickle: Well, Marshal is supposed to be faster. But
then, if I wanted to do the whole repr()-eval() hack, I am already
defeating the purpose by refusing to save bytes as bytes in terms of
both size and speed.

At this point, I am considering one of the following:
- Save my structure as binary data, and reference the file from my db
- Find a clean method of saving bytes into my db

Thanks again,
Mike

Jan 14 '06 #12

"Giovanni Bajo" <no***@sorry.com> writes:
> ca****@comcast.net wrote:
> > Try...
> >
> > >>> for i in bytes: print ord(i)
> >
> > or
> >
> > >>> len(bytes)
> >
> > What you see isn't always what you have. Your database is capable of
> > storing \ x 0 0 characters, but your string contains a single byte of
> > value zero. When Python displays the string representation to you, it
> > escapes the values so they can be displayed.
>
> He can still store the repr of the string into the database, and then
> reconstruct it with eval:


repr and eval are overkill for this, and as a result create a
security hole. Using encode('string-escape') and
decode('string-escape') will do the same job without the security
hole:

>>> bytes = '\x00\x01\x02'
>>> bytes
'\x00\x01\x02'
>>> ord(bytes[0])
0
>>> rb = bytes.encode('string-escape')
>>> rb
'\\x00\\x01\\x02'
>>> len(rb)
12
>>> rb[0]
'\\'
>>> bytes2 = rb.decode('string-escape')
>>> bytes == bytes2
True
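
The 'string-escape' codec exists only in Python 2. As a sketch of the same idea on Python 3 (an assumption on my part, not something from the thread), ast.literal_eval can parse the repr of a bytes object. It only accepts Python literals and does not call functions or look up names, which avoids the problem with plain eval. The helper from_text is a hypothetical name:

```python
import ast

blob = b'\x00\x01\x02'

# repr() yields printable text that can go into a text column.
text = repr(blob)

# literal_eval parses literals only; it will not execute arbitrary code.
restored = ast.literal_eval(text)

def from_text(stored: str) -> bytes:
    """Turn stored text back into bytes, rejecting anything else."""
    value = ast.literal_eval(stored)
    if not isinstance(value, bytes):
        raise ValueError("stored text is not a bytes literal")
    return value

print(from_text(text) == blob)
```

Note that the stored text is still larger than the original bytes, so the size objection raised earlier in the thread applies here as well.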


<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jan 14 '06 #13

On Sat, 14 Jan 2006 13:50:24 -0800, Mike wrote:
> Thanks everyone.
>
> Why Marshal & not Pickle: Well, Marshal is supposed to be faster.

Faster than cPickle?

Even faster would be to write your code in assembly, and dump that
ridiculously bloated database and just write everything to raw bytes on
an unformatted disk. Of course, it might take the programmer a thousand
times longer to actually write the program, and there will probably be
hundreds of bugs in it, but the important thing is that you'll save three
or four milliseconds at runtime.

Right?

Unless you've actually done proper measurements of the time taken, with
realistic sample data, worrying about saving a byte here and a
millisecond there is just wasting your time, and is often
counter-productive. Optimization without measurement is as likely to
result in slower, fatter performance as it is faster and leaner.
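
Measuring is cheap. Here is a rough sketch, assuming Python 3 and the small dictionary from the original post as sample data; real measurements should use realistic data, and the helper time_loads is a made-up name:

```python
import marshal
import pickle
import timeit

data = {2: 'two', 3: 'three'}
marshal_blob = marshal.dumps(data)
pickle_blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

def time_loads(loads, blob, number=10_000):
    """Return the seconds taken to decode blob `number` times."""
    return timeit.timeit(lambda: loads(blob), number=number)

marshal_seconds = time_loads(marshal.loads, marshal_blob)
pickle_seconds = time_loads(pickle.loads, pickle_blob)
print(f"marshal: {marshal_seconds:.4f}s  pickle: {pickle_seconds:.4f}s")
```

The numbers will vary by machine and Python version, which is the reason to measure rather than assume.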

marshal is not designed to be portable across versions. Do you *really*
think it is a good idea to tie the data in your database to one specific
version of Python?

> But
> then, if I wanted to do the whole repr()-eval() hack, I am already
> defeating the purpose by refusing to save bytes as bytes in terms of
> both size and speed.
>
> At this point, I am considering one of the following:
> - Save my structure as binary data, and reference the file from my db
> - Find a clean method of saving bytes into my db


Your database either can handle binary data, or it can't.

If it can, then just use pickle with a binary protocol and be done with it.

If it can't, then just use pickle with a plain text protocol and be done
with it.
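
The two choices can be compared side by side. This is a sketch assuming Python 3, where protocol 0 is the original text-oriented pickle format and protocol 2 is one of the binary ones. For this particular data the protocol 0 output is plain ASCII; that is not guaranteed for arbitrary data such as non-ASCII strings:

```python
import pickle

data = {2: 'two', 3: 'three'}

binary_form = pickle.dumps(data, protocol=2)   # for a BLOB-style column
text_form = pickle.dumps(data, protocol=0)     # for a TEXT-style column

# The binary protocol contains zero bytes; the text protocol does not here.
print(b'\x00' in binary_form)
print(b'\x00' in text_form)

# For this data the protocol 0 pickle decodes cleanly as ASCII text.
print(text_form.decode('ascii'))

# Both forms round-trip to the same structure.
print(pickle.loads(binary_form) == pickle.loads(text_form) == data)
```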

Either way, you have to find a way to translate your Python data
structures into something that you can feed to the database. Your database
can't automatically suck data structures out of Python's working memory!
So why re-invent the wheel? marshal is not recommended, but if you can
live with the limitations of marshal then it might do the job. But trying
to optimise code that hasn't even been written yet is a sure way to
trouble.
--
Steven.

Jan 14 '06 #14

Mike wrote:
> Hi,
>
> The example below shows that the result of a marshaled data structure is
> nothing but a string:
>
> >>> data = {2:'two', 3:'three'}
> >>> import marshal
> >>> bytes = marshal.dumps(data)
> >>> type(bytes)
> <type 'str'>
> >>> bytes
> '{i\x02\x00\x00\x00t\x03\x00\x00\x00twoi\x03\x00\x00\x00t\x05\x00\x00\x00three0'
>
> Now, I need to store this data safely in my database as CLEAR TEXT, not
> BLOB. It seems to me that it should work just fine since it is a string
> anyway. So why does O'Reilly's Python Cookbook insist on saving
> it as a binary file and BLOB type?

Well, the Cookbook isn't an exhaustive list of everything you can do
with Python, it's just a record of some of the things people *have* done.

I presume your database has no datatype that will store binary data of
indeterminate length? Clearly that would be the most satisfactory solution.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Jan 15 '06 #15

> Even faster would be to write your code in assembly, and dump that
> ridiculously bloated database and just write everything to raw bytes on
> an unformatted disk. Of course, it might take the programmer a thousand
> times longer to actually write the program, and there will probably be
> hundreds of bugs in it, but the important thing is that you'll save three
> or four milliseconds at runtime. Right?

Correct. I didn't quite see the issue as assembly vs. Python, with a
direct translation to programming hours. The structure in mind is meant
to act as a dictionary to extend my db with a few table fields that
could vary from one record to another and won't be queried for.
Considering that every time my record is loaded, its pickle or marshal data
has to be decoded, I figured the faster alternative should be better.
As for the incompatibility issue, I figured the day I upgrade my Python,
I would write a Python script to upgrade the data. I take that back.

> Your database either can handle binary data, or it can't.

It can. It's my web framework that doesn't.

> If it can, then just use pickle with a binary protocol and be done with it.

That I will do.

> Either way, you have to find a way to translate your Python data
> structures into something that you can feed to the database. Your database
> can't automatically suck data structures out of Python's working memory!
> So why re-invent the wheel? marshal is not recommended, but if you can
> live with the limitations of marshal then it might do the job. But trying
> to optimise code that hasn't even been written yet is a sure way to
> trouble.


Thanks. Will do.

Regards,
Mike

Jan 15 '06 #16

> Well, the Cookbook isn't an exhaustive list of everything you can do
> with Python, it's just a record of some of the things people *have* done.

Considering I am a newbie, it's a good start for me...

> I presume your database has no datatype that will store binary data of
> indeterminate length? Clearly that would be the most satisfactory solution.

PostgreSQL. I think the only two things it doesn't do are wash my car and
code my software. Well, that's up until you use it in conjunction with
Django; then the only work left is to wash my car, which I couldn't care
less about either. We'll wait for some rain :)

Mike

Jan 15 '06 #18

Mike wrote:
> > Well, the Cookbook isn't an exhaustive list of everything you can do
> > with Python, it's just a record of some of the things people *have* done.
>
> Considering I am a newbie, it's a good start for me...
>
> > I presume your database has no datatype that will store binary data of
> > indeterminate length? Clearly that would be the most satisfactory solution.
>
> PostgreSQL. I think the only two things it doesn't do are wash my car and
> code my software. Well, that's up until you use it in conjunction with
> Django; then the only work left is to wash my car, which I couldn't care
> less about either. We'll wait for some rain :)

So this question was primarily theoretical, right?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Jan 15 '06 #19

> So this question was primarily theoretical, right?

Theoretical? Not really, Steve. I wanted to use Django's wonderful db
framework to save a structure into my PostgreSQL database, except there is
no direct BLOB support for it yet. So I was trying to explore my
options for saving this structure in clear text.

Thanks,
Mike

Jan 15 '06 #20

Mike wrote:
> > So this question was primarily theoretical, right?
>
> Theoretical? Not really, Steve. I wanted to use Django's wonderful db
> framework to save a structure into my PostgreSQL database, except there is
> no direct BLOB support for it yet. So I was trying to explore my
> options for saving this structure in clear text.

Right, NOW I understand.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Jan 15 '06 #21

"Mike" <mi**********@hotmail.com> writes:
> Correct. I didn't quite see the issue as assembly vs. python, having
> direct translation to programming hours.... I figured the day I
> upgrade my python, I would write a python script to upgrade the
> data. I take my word back.


Writing that script sounds potentially capable of consuming some
programming hours. If the marshal format changes, it might not be
easy to have the old and new marshal modules in the same Python
instance. You might need two separate interpreters (old and new), to
demarshal your objects in the old interpreter and communicate them
somehow to the new interpreter (e.g. through pickle and sockets) for
re-marshalling. Migrating a database is a big pain in the neck
already without such extra complication.
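
The hand-off described above can be pictured roughly as follows. This is only a sketch, assuming Python 3; the function name marshal_to_pickle is hypothetical, and both halves run in one interpreter here purely for illustration, whereas a real migration would run the first half under the old Python and the second under the new one:

```python
import marshal
import pickle

def marshal_to_pickle(marshal_blob: bytes, protocol: int = 2) -> bytes:
    """Decode a marshal blob and re-encode it as a pickle.

    Intended to run under the OLD interpreter, which still understands
    the old marshal format.
    """
    return pickle.dumps(marshal.loads(marshal_blob), protocol=protocol)

# Old side: convert each stored value to the more portable pickle format.
old_blob = marshal.dumps({2: 'two', 3: 'three'})
portable = marshal_to_pickle(old_blob)

# New side: read the pickle and, if marshal is still wanted, re-marshal it.
restored = pickle.loads(portable)
new_blob = marshal.dumps(restored)
```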
Jan 15 '06 #22
