Marshal Obj is String or Binary?

Hi,

The example below shows that result of a marshaled data structure is
nothing but a string

>>> data = {2:'two', 3:'three'}
>>> import marshal
>>> bytes = marshal.dumps(data)
>>> type(bytes)
<type 'str'>
>>> bytes
'{i\x02\x00\x00\x00t\x03\x00\x00\x00twoi\x03\x00\x00\x00t\x05\x00\x00\x00three0'

Now, I need to store this data safely in my database as CLEAR TEXT, not
BLOB. It seems to me that it should work just fine since it is a string
anyway. So why does O'Reilly's Python Cookbook insist on saving
it as a binary file and BLOB type?

Am I missing something?

Thanks,
Mike

Jan 13 '06 #1

Marc 'BlackJack' Rintsch

In <11**********************@g47g2000cwa.googlegroups .com>, Mike wrote:

> The example below shows that result of a marshaled data structure is
> nothing but a string
>
> >>> data = {2:'two', 3:'three'}
> >>> import marshal
> >>> bytes = marshal.dumps(data)
> >>> type(bytes)
> <type 'str'>
> >>> bytes
> '{i\x02\x00\x00\x00t\x03\x00\x00\x00twoi\x03\x00\x00\x00t\x05\x00\x00\x00three0'
>
> Now, I need to store this data safely in my database as CLEAR TEXT, not
> BLOB. It seems to me that it should work just fine since it is a string
> anyway. So why does O'Reilly's Python Cookbook insist on saving
> it as a binary file and BLOB type?
>
> Am I missing something?

Yes, that a string is *binary* data. But only a subset of strings is safe
to use as `TEXT` in databases. Do you see all those '\x??' escapes?
'\x00' is *one* byte! A byte with the value zero. Something your DB
doesn't allow in a `TEXT` type.

Ciao,
Marc 'BlackJack' Rintsch

Jan 13 '06 #2

Mike

Wait a sec. \x00 may represent a byte when unmarshaled, but as long as
marshal likes it as \x00, I think my db is capable of storing \ x 0 0
characters. What is the problem? Is it that \? I could escape that...
actually I think my django framework already does that for me.

Thanks,
Mike

Jan 14 '06 #3

casevh

Try...

for i in bytes: print ord(i)
or
len(bytes)

What you see isn't always what you have. Your database is capable of
storing \ x 0 0 characters, but your string contains a single byte of
value zero. When Python displays the string representation to you, it
escapes the values so they can be displayed.
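
The difference between what is displayed and what is stored can be checked directly. This is only a minimal sketch, and it assumes Python 3, where marshal.dumps returns a bytes object instead of the Python 2 str used in this thread; the variable names are made up for illustration.

```python
import marshal

data = {2: "two", 3: "three"}
blob = marshal.dumps(data)      # bytes in Python 3 (a str in Python 2)

nul = b"\x00"                   # what is actually held: a single byte of value zero
shown = repr(nul)               # what the console displays: the escaped form

print(len(nul), len(shown))     # real length versus displayed length
print(b"\x00" in blob)          # whether the marshaled data contains NUL bytes
print(list(blob[:6]))           # first few byte values, like the ord() loop above
```

The escaped form is only a display convention; the database receives the raw bytes, including the zero bytes.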

casevh

Jan 14 '06 #5

Giovanni Bajo

ca****@comcast.net wrote:

> Try...
>
> for i in bytes: print ord(i)
> or
> len(bytes)
>
> What you see isn't always what you have. Your database is capable of
> storing \ x 0 0 characters, but your string contains a single byte of
> value zero. When Python displays the string representation to you, it
> escapes the values so they can be displayed.

He can still store the repr of the string into the database, and then
reconstruct it with eval:

>>> bytes = "\x00\x01\x02"
>>> bytes
'\x00\x01\x02'
>>> len(bytes)
3
>>> ord(bytes[0])
0
>>> rb = repr(bytes)
>>> rb
"'\\x00\\x01\\x02'"
>>> len(rb)
14
>>> rb[0]
"'"
>>> rb[1]
'\\'
>>> rb[2]
'x'
>>> rb[3]
'0'
>>> rb[4]
'0'
>>> bytes2 = eval(rb)
>>> bytes == bytes2
True

--
Giovanni Bajo

Jan 14 '06 #6

Mike

Thanks everyone. Storing complex structures as escaped strings seems
broken, but I think I'll take my chances.

Thanks,
Mike

Jan 14 '06 #7

Steven D'Aprano

On Fri, 13 Jan 2006 22:20:27 -0800, Mike wrote:

> Thanks everyone. Storing complex structures as escaped strings seems
> broken, but I think I'll take my chances.

Have you read the marshal reference?

http://docs.python.org/lib/module-marshal.html

marshal doesn't store data as escaped strings, it stores them as binary
strings. When you print the binary string to the console, unprintable
characters are shown escaped.

I'm guessing you probably want to use pickle instead of marshal. marshal
is intended only for dealing with .pyc files, and has some important
limitations. pickle is intended to be a general purpose serializer.
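
One of those limitations can be illustrated with a short sketch. It assumes Python 3 and uses a datetime.date value picked purely as an example of a type that marshal does not support; nothing here comes from the original poster's code.

```python
import datetime
import marshal
import pickle

# A dictionary holding a value type that marshal does not know about.
value = {"when": datetime.date(2006, 1, 14), 2: "two"}

try:
    marshal.dumps(value)
    marshal_ok = True
except ValueError:
    # marshal only handles a fixed set of core types.
    marshal_ok = False

# pickle handles the same structure and restores an equal object.
restored = pickle.loads(pickle.dumps(value))

print(marshal_ok, restored == value)
```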
--
Steve.

Jan 14 '06 #8

Max

Giovanni Bajo wrote:

> > What you see isn't always what you have. Your database is capable of
> > storing \ x 0 0 characters, but your string contains a single byte of
> > value zero. When Python displays the string representation to you, it
> > escapes the values so they can be displayed.
>
> He can still store the repr of the string into the database, and then
> reconstruct it with eval:

Yes, but len(repr('\x00')) is 4, while len('\x00') is 1. So if he uses
BLOB his data will take almost a quarter of the space, compared to your
method (stored as TEXT).

--Max

Jan 14 '06 #9

Steven D'Aprano

On Sat, 14 Jan 2006 12:36:59 +0200, Max wrote:

> > He can still store the repr of the string into the database, and then
> > reconstruct it with eval:
>
> Yes, but len(repr('\x00')) is 4, while len('\x00') is 1.

Incorrect:

>>> len(repr('\x00'))
6
>>> repr('\x00')
"'\\x00'"

> So if he uses
> BLOB his data will take almost a quarter of the space, compared to your
> method (stored as TEXT).

Also incorrect. That depends utterly on which particular characters end up
in the serialised data. You may or may not be able to predict what that
mix may be.

>>> # nothing but printable data
>>> s = ''.join(['a' for i in range(256)])
>>> len(s)
256
>>> len(repr(s))
258
>>> # nothing but unprintable data
>>> s = ''.join(['\0' for i in range(256)])
>>> len(s)
256
>>> len(repr(s))
1026
>>> # one particular mix of both printable and unprintable data
>>> s = ''.join([chr(i) for i in range(256)])
>>> len(s)
256
>>> len(repr(s))
737
>>> # a different mix of both printable and unprintable data
>>> s = '+'.join([chr(i) for i in range(128)])
>>> len(s)
255
>>> len(repr(s))
352
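
The same measurement can be repeated on a current interpreter. The sketch below assumes Python 3, where the repr of a bytes object carries an extra leading b, so the absolute figures come out slightly higher than the Python 2 numbers above; the helper name repr_overhead is invented for this example.

```python
def repr_overhead(data: bytes) -> float:
    """Return how many repr characters are needed per stored byte."""
    return len(repr(data)) / len(data)

samples = {
    "printable only": b"a" * 256,
    "unprintable only": b"\x00" * 256,
    "every byte value once": bytes(range(256)),
}

for label, data in samples.items():
    # The ratio depends entirely on the mix of bytes in the data.
    print(label, len(data), len(repr(data)), round(repr_overhead(data), 2))
```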

--
Steven.

Jan 14 '06 #10

Giovanni Bajo

Max wrote:

> > > What you see isn't always what you have. Your database is capable of
> > > storing \ x 0 0 characters, but your string contains a single byte
> > > of value zero. When Python displays the string representation to
> > > you, it escapes the values so they can be displayed.
> >
> > He can still store the repr of the string into the database, and then
> > reconstruct it with eval:
>
> Yes, but len(repr('\x00')) is 4, while len('\x00') is 1. So if he uses
> BLOB his data will take almost a quarter of the space, compared to
> your method (stored as TEXT).

Sure, but he didn't ask for the best strategy to store the data into the
database, he specified very clearly that he *can't* use BLOB, and asked how to
use TEXT.
--
Giovanni Bajo

Jan 14 '06 #11

Mike

Thanks everyone.

Why Marshal & not Pickle: Well, Marshal is supposed to be faster. But
then, if I wanted to do the whole repr()-eval() hack, I am already
defeating the purpose by refusing to save bytes as bytes in terms of
both size and speed.

At this point, I am considering one of the following:
- Save my structure as binary data, and reference the file from my db
- Find a clean method of saving bytes into my db

Thanks again,
Mike

Jan 14 '06 #12

Mike Meyer

"Giovanni Bajo" <no***@sorry.com> writes:

> ca****@comcast.net wrote:
>
> > Try...
> >
> > for i in bytes: print ord(i)
> > or
> > len(bytes)
> >
> > What you see isn't always what you have. Your database is capable of
> > storing \ x 0 0 characters, but your string contains a single byte of
> > value zero. When Python displays the string representation to you, it
> > escapes the values so they can be displayed.
>
> He can still store the repr of the string into the database, and then
> reconstruct it with eval:

repr and eval are overkill for this, and as a result create a
security hole. Using encode('string-escape') and
decode('string-escape') will do the same job without the security
hole:

>>> bytes = '\x00\x01\x02'
>>> bytes
'\x00\x01\x02'
>>> ord(bytes[0])
0
>>> rb = bytes.encode('string-escape')
>>> rb
'\\x00\\x01\\x02'
>>> len(rb)
12
>>> rb[0]
'\\'
>>> bytes2 = rb.decode('string-escape')
>>> bytes == bytes2
True
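
The 'string-escape' codec is a Python 2 feature. As a rough Python 3 counterpart, and only as a sketch under that assumption, the same escape-without-eval round trip can be built from the documented latin-1 and unicode_escape codecs. The helper names to_text and from_text are invented for this example.

```python
def to_text(raw: bytes) -> str:
    """Escape arbitrary bytes into a pure-ASCII string with no NUL characters."""
    # latin-1 maps every byte 0..255 to the code point with the same number,
    # and unicode_escape then writes non-printable ones as backslash escapes.
    return raw.decode("latin-1").encode("unicode_escape").decode("ascii")

def from_text(text: str) -> bytes:
    """Reverse to_text without evaluating any code."""
    return text.encode("ascii").decode("unicode_escape").encode("latin-1")

raw = b"\x00\x01\x02"
escaped = to_text(raw)          # twelve ordinary characters, safe for a TEXT column
restored = from_text(escaped)

print(escaped, len(escaped), restored == raw)
```

Unlike eval, decoding here can only ever produce bytes, so a tampered database value cannot run code.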

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Jan 14 '06 #13

Steven D'Aprano

On Sat, 14 Jan 2006 13:50:24 -0800, Mike wrote:

> Thanks everyone.
>
> Why Marshal & not Pickle: Well, Marshal is supposed to be faster.

Faster than cPickle?

Even faster would be to write your code in assembly, and dump that
ridiculously bloated database and just write everything to raw bytes on
an unformatted disk. Of course, it might take the programmer a thousand
times longer to actually write the program, and there will probably be
hundreds of bugs in it, but the important thing is that you'll save three
or four milliseconds at runtime.

Right?

Unless you've actually done proper measurements of the time taken, with
realistic sample data, worrying about saving a byte here and a
millisecond there is just wasting your time, and is often
counter-productive. Optimization without measurement is as likely to
result in slower, fatter performance as it is faster and leaner.

marshal is not designed to be portable across versions. Do you *really*
think it is a good idea to tie the data in your database to one specific
version of Python?

> But
> then, if I wanted to do the whole repr()-eval() hack, I am already
> defeating the purpose by refusing to save bytes as bytes in terms of
> both size and speed.
>
> At this point, I am considering one of the following:
> - Save my structure as binary data, and reference the file from my db
> - Find a clean method of saving bytes into my db

Your database either can handle binary data, or it can't.

If it can, then just use pickle with a binary protocol and be done with it.

If it can't, then just use pickle with a plain text protocol and be done
with it.
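
Both options can be sketched briefly. This assumes Python 3 and the sample dictionary from the start of the thread; protocol 0 is pickle's original text-oriented protocol, and decoding its output as latin-1 is a choice made for this example so that any byte value survives the trip through a TEXT column.

```python
import pickle

data = {2: "two", 3: "three"}

# Binary protocol: compact, but contains arbitrary bytes, so it needs a BLOB-like column.
binary_pickle = pickle.dumps(data, protocol=2)

# Protocol 0: for ASCII-only data like this, the output is printable text.
text_pickle = pickle.dumps(data, protocol=0)
as_text = text_pickle.decode("latin-1")        # value to store in a TEXT column

# Loading reverses the same steps.
restored = pickle.loads(as_text.encode("latin-1"))

print(restored == data, len(binary_pickle), len(text_pickle))
```

Note that pickle.loads should only be fed data the application wrote itself, since unpickling untrusted input can execute arbitrary code.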

Either way, you have to find a way to translate your Python data
structures into something that you can feed to the database. Your database
can't automatically suck data structures out of Python's working memory!
So why re-invent the wheel? marshal is not recommended, but if you can
live with the limitations of marshal then it might do the job. But trying
to optimise code that hasn't even been written yet is a sure way to
trouble.
--
Steven.

Jan 14 '06 #14

Steve Holden

Mike wrote:

> Hi,
>
> The example below shows that result of a marshaled data structure is
> nothing but a string
>
> >>> data = {2:'two', 3:'three'}
> >>> import marshal
> >>> bytes = marshal.dumps(data)
> >>> type(bytes)
> <type 'str'>
> >>> bytes
> '{i\x02\x00\x00\x00t\x03\x00\x00\x00twoi\x03\x00\x00\x00t\x05\x00\x00\x00three0'
>
> Now, I need to store this data safely in my database as CLEAR TEXT, not
> BLOB. It seems to me that it should work just fine since it is a string
> anyway. So why does O'Reilly's Python Cookbook insist on saving
> it as a binary file and BLOB type?

Well, the Cookbook isn't an exhaustive list of everything you can do
with Python, it's just a record of some of the things people *have* done.

I presume your database has no datatype that will store binary data of
indeterminate length? Clearly that would be the most satisfactory solution.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Jan 15 '06 #15

Mike

> Even faster would be to write your code in assembly, and dump that
> ridiculously bloated database and just write everything to raw bytes on
> an unformatted disk. Of course, it might take the programmer a thousand
> times longer to actually write the program, and there will probably be
> hundreds of bugs in it, but the important thing is that you'll save three
> or four milliseconds at runtime. Right?

Correct. I didn't quite see the issue as assembly vs. Python, with a
direct translation into programming hours. The structure I have in mind is
meant to act as a dictionary that extends my db with a few table fields that
could vary from one record to another and won't be queried on.
Considering that every time my record is loaded, its pickled or marshaled
data has to be decoded, I figured the faster alternative would be better.
As for the incompatibility issue, I figured that the day I upgrade my Python,
I would write a Python script to upgrade the data. I take my words back.

> Your database either can handle binary data, or it can't.

It can. It's my web framework that doesn't.

> If it can, then just use pickle with a binary protocol and be done with it.

That I will do.

> Either way, you have to find a way to translate your Python data
> structures into something that you can feed to the database. Your database
> can't automatically suck data structures out of Python's working memory!
> So why re-invent the wheel? marshal is not recommended, but if you can
> live with the limitations of marshal then it might do the job. But trying
> to optimise code that hasn't even been written yet is a sure way to
> trouble.

Thanks. Will do.

Regards,
Mike

Jan 15 '06 #16

Mike

> Well, the Cookbook isn't an exhaustive list of everything you can do
> with Python, it's just a record of some of the things people *have* done.

Considering I am a newbie, it's a good start for me...

> I presume your database has no datatype that will store binary data of
> indeterminate length? Clearly that would be the most satisfactory solution.

PostgreSQL. I think the only two things it doesn't do are wash my car and
code my software. Well, that's up until you use it in conjunction with
Django; then the only work left is to wash my car, which I couldn't care
less about either. We'll wait for some rain :)

Mike

Jan 15 '06 #18

Steve Holden

Mike wrote:

> > Well, the Cookbook isn't an exhaustive list of everything you can do
> > with Python, it's just a record of some of the things people *have* done.
>
> Considering I am a newbie, it's a good start for me...
>
> > I presume your database has no datatype that will store binary data of
> > indeterminate length? Clearly that would be the most satisfactory solution.
>
> PostgreSQL. I think the only two things it doesn't do are wash my car and
> code my software. Well, that's up until you use it in conjunction with
> Django; then the only work left is to wash my car, which I couldn't care
> less about either. We'll wait for some rain :)

So this question was primarily theoretical, right?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Jan 15 '06 #19

Mike

> So this question was primarily theoretical, right?

Theoretical? not really Steve. I wanted to use django's wonderful db
framework to save a structure into my postgresql. Except there is no
direct BLOB support for it yet. There, I was trying to explore my
options with saving this structure in clear text.

Thanks,
Mike

Jan 15 '06 #20

Steve Holden

Mike wrote:

> > So this question was primarily theoretical, right?
>
> Theoretical? not really Steve. I wanted to use django's wonderful db
> framework to save a structure into my postgresql. Except there is no
> direct BLOB support for it yet. There, I was trying to explore my
> options with saving this structure in clear text.

Right, NOW I understand.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Jan 15 '06 #21

Paul Rubin

"Mike" <mi**********@hotmail.com> writes:

> Correct. I didn't quite see the issue as assembly vs. python, having
> direct translation to programming hours.... I figured the day I
> upgrade my python, I would write a python script to upgrade the
> data. I take my word back.

Writing that script sounds potentially capable of consuming some
programming hours. If the marshal format changes, it might not be
easy to have the old and new marshal modules in the same Python
instance. You might need two separate interpreters (old and new), to
demarshal your objects in the old interpreter and communicate them
somehow to the new interpreter (e.g. through pickle and sockets) for
re-marshalling. Migrating a database is a big pain in the neck
already without such extra complication.

Jan 15 '06 #22
