Hi,
The example below shows that the result of marshaling a data structure is
nothing but a string:

>>> data = {2:'two', 3:'three'}
>>> import marshal
>>> bytes = marshal.dumps(data)
>>> type(bytes)
<type 'str'>
>>> bytes
'{i\x02\x00\x00\x00t\x03\x00\x00\x00twoi\x03\x00\x00\x00t\x05\x00\x00\x00three0'
Now, I need to store this data safely in my database as CLEAR TEXT, not
BLOB. It seems to me that it should work just fine, since it is a string
anyway. So why does O'Reilly's Python Cookbook insist on saving it as a
binary file and BLOB type?
Am I missing something?
Thanks,
Mike
In <11**********************@g47g2000cwa.googlegroups.com>, Mike wrote:
> The example below shows that the result of marshaling a data structure
> is nothing but a string:
>
> >>> data = {2:'two', 3:'three'}
> >>> import marshal
> >>> bytes = marshal.dumps(data)
> >>> type(bytes)
> <type 'str'>
> >>> bytes
> '{i\x02\x00\x00\x00t\x03\x00\x00\x00twoi\x03\x00\x00\x00t\x05\x00\x00\x00three0'
>
> Now, I need to store this data safely in my database as CLEAR TEXT, not
> BLOB. It seems to me that it should work just fine, since it is a string
> anyway. So why does O'Reilly's Python Cookbook insist on saving it as a
> binary file and BLOB type?
> Am I missing something?
Yes, that a string is *binary* data. But only a subset of strings is safe
to use as `TEXT` in databases. Do you see all those '\x??' escapes?
'\x00' is *one* byte! A byte with the value zero. Something your DB
doesn't allow in a `TEXT` type.
Ciao,
Marc 'BlackJack' Rintsch
Wait a sec. \x00 may represent a byte when unmarshaled, but as long as
marshal likes it as \x00, I think my db is capable of storing \ x 0 0
characters. What is the problem? Is it that \? I could escape that...
actually I think my django framework already does that for me.
Thanks,
Mike
Mike wrote:
> Wait a sec. \x00 may represent a byte when unmarshaled, but as long as
> marshal likes it as \x00, I think my db is capable of storing \ x 0 0
> characters. What is the problem? Is it that \? I could escape that...
> actually I think my django framework already does that for me.
Try...

    for i in bytes: print ord(i)

or

    len(bytes)

What you see isn't always what you have. Your database is capable of
storing \ x 0 0 characters, but your string contains a single byte of
value zero. When Python displays the string representation to you, it
escapes the values so they can be displayed.
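The gap between the bytes a string holds and the text Python shows for it can be checked directly. What follows is only a minimal sketch, assuming Python 3 (where binary data is a `bytes` object rather than the `str` used in this thread); the variable names are illustrative and not taken from the posts.

```python
# Three raw bytes with the values 0, 1 and 2.
raw = b"\x00\x01\x02"

# The data itself is three bytes long, and the first byte is the number zero.
byte_count = len(raw)
byte_values = list(raw)

# repr() produces printable text in which each unprintable byte
# is spelled out as a four-character escape such as \x00.
shown = repr(raw)
shown_length = len(shown)

print(byte_count, byte_values)
print(shown, shown_length)
```

The printable form is several times longer than the data because it is a description of the bytes, not the bytes themselves.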
casevh

ca****@comcast.net wrote:
> Try...
>
>     for i in bytes: print ord(i)
>
> or
>
>     len(bytes)
>
> What you see isn't always what you have. Your database is capable of
> storing \ x 0 0 characters, but your string contains a single byte of
> value zero. When Python displays the string representation to you, it
> escapes the values so they can be displayed.
He can still store the repr of the string into the database, and then
reconstruct it with eval:

>>> bytes = "\x00\x01\x02"
>>> bytes
'\x00\x01\x02'
>>> len(bytes)
3
>>> ord(bytes[0])
0
>>> rb = repr(bytes)
>>> rb
"'\\x00\\x01\\x02'"
>>> len(rb)
14
>>> rb[0]
"'"
>>> rb[1]
'\\'
>>> rb[2]
'x'
>>> rb[3]
'0'
>>> rb[4]
'0'
>>> bytes2 = eval(rb)
>>> bytes == bytes2
True
--
Giovanni Bajo
Thanks everyone. It seems broken storing complex structures as escaped
strings, but I think I'll take my chances.
Thanks,
Mike
On Fri, 13 Jan 2006 22:20:27 -0800, Mike wrote:
> Thanks everyone. It seems broken storing complex structures as escaped
> strings, but I think I'll take my chances.
Have you read the marshal reference? http://docs.python.org/lib/module-marshal.html
marshal doesn't store data as escaped strings, it stores them as binary
strings. When you print the binary string to the console, unprintable
characters are shown escaped.
I'm guessing you probably want to use pickle instead of marshal. marshal
is intended only for dealing with .pyc files, and has some important
limitations. pickle is intended to be a general purpose serializer.
--
Steve.
Giovanni Bajo wrote:
>> What you see isn't always what you have. Your database is capable of
>> storing \ x 0 0 characters, but your string contains a single byte of
>> value zero. When Python displays the string representation to you, it
>> escapes the values so they can be displayed.
> He can still store the repr of the string into the database, and then
> reconstruct it with eval:
Yes, but len(repr('\x00')) is 4, while len('\x00') is 1. So if he uses
BLOB his data will take almost a quarter of the space, compared to your
method (stored as TEXT).
--Max
On Sat, 14 Jan 2006 12:36:59 +0200, Max wrote:
>> He can still store the repr of the string into the database, and then
>> reconstruct it with eval:
> Yes, but len(repr('\x00')) is 4, while len('\x00') is 1.

Incorrect:

>>> len(repr('\x00'))
6
>>> repr('\x00')
"'\\x00'"

> So if he uses BLOB his data will take almost a quarter of the space,
> compared to your method (stored as TEXT).

Also incorrect. That depends utterly on which particular characters end up
in the serialised data. You may or may not be able to predict what that
mix may be.
# nothing but printable data
>>> s = ''.join(['a' for i in range(256)])
>>> len(s)
256
>>> len(repr(s))
258

# nothing but unprintable data
>>> s = ''.join(['\0' for i in range(256)])
>>> len(s)
256
>>> len(repr(s))
1026

# one particular mix of both printable and unprintable data
>>> s = ''.join([chr(i) for i in range(256)])
>>> len(s)
256
>>> len(repr(s))
737

# a different mix of both printable and unprintable data
>>> s = '+'.join([chr(i) for i in range(128)])
>>> len(s)
255
>>> len(repr(s))
352
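The same measurement can be wrapped in a small helper so it can be run against real data. This is a sketch assuming Python 3, where `repr()` of a `bytes` object adds a leading `b`, so the counts are one higher than the Python 2 figures quoted here; `repr_overhead` is a made-up name for illustration.

```python
def repr_overhead(raw: bytes) -> float:
    """Return len(repr(raw)) / len(raw) for a non-empty bytes object."""
    return len(repr(raw)) / len(raw)

printable = b"a" * 256        # every byte is shown as itself
unprintable = b"\x00" * 256   # every byte becomes a 4-character escape
mixed = bytes(range(256))     # one of each possible byte value

for name, sample in [("printable", printable),
                     ("unprintable", unprintable),
                     ("mixed", mixed)]:
    print(name, len(sample), len(repr(sample)), round(repr_overhead(sample), 2))
```

The ratio ranges from just over 1 to just over 4 depending on the byte mix, which is the point being made: the cost of escaping cannot be stated without looking at the data.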
--
Steven.
Max wrote:
>>> What you see isn't always what you have. Your database is capable of
>>> storing \ x 0 0 characters, but your string contains a single byte of
>>> value zero. When Python displays the string representation to you, it
>>> escapes the values so they can be displayed.
>> He can still store the repr of the string into the database, and then
>> reconstruct it with eval:
> Yes, but len(repr('\x00')) is 4, while len('\x00') is 1. So if he uses
> BLOB his data will take almost a quarter of the space, compared to your
> method (stored as TEXT).
Sure, but he didn't ask for the best strategy to store the data into the
database, he specified very clearly that he *can't* use BLOB, and asked how to
use TEXT.
--
Giovanni Bajo
Thanks everyone.
Why Marshal & not Pickle: Well, Marshal is supposed to be faster. But
then, if I wanted to do the whole repr()-eval() hack, I am already
defeating the purpose by refusing to save bytes as bytes in terms of
both size and speed.
At this point, I am considering one of the following:
- Save my structure as binary data, and reference the file from my db
- Find a clean method of saving bytes into my db
Thanks again,
Mike
"Giovanni Bajo" <no***@sorry.com> writes:
> ca****@comcast.net wrote:
>> Try...
>>
>>     for i in bytes: print ord(i)
>>
>> or
>>
>>     len(bytes)
>>
>> What you see isn't always what you have. Your database is capable of
>> storing \ x 0 0 characters, but your string contains a single byte of
>> value zero. When Python displays the string representation to you, it
>> escapes the values so they can be displayed.
> He can still store the repr of the string into the database, and then
> reconstruct it with eval:
repr and eval are overkill for this, and as a result create a
security hole. Using encode('string-escape') and
decode('string-escape') will do the same job without the security
hole:

>>> bytes = '\x00\x01\x02'
>>> bytes
'\x00\x01\x02'
>>> ord(bytes[0])
0
>>> rb = bytes.encode('string-escape')
>>> rb
'\\x00\\x01\\x02'
>>> len(rb)
12
>>> rb[0]
'\\'
>>> bytes2 = rb.decode('string-escape')
>>> bytes == bytes2
True
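The 'string-escape' codec is specific to Python 2. As a hedged Python 3 sketch of the same idea (escaped text in, original bytes out, without running arbitrary code), `ast.literal_eval` can stand in for `eval`. This is a swapped-in technique, not what the poster used, and the two helper names are hypothetical.

```python
import ast

def to_escaped_text(raw: bytes) -> str:
    """Turn raw bytes into printable text (a bytes literal such as b'\\x00')."""
    return repr(raw)

def from_escaped_text(text: str) -> bytes:
    """Parse a bytes literal back into bytes without evaluating code."""
    value = ast.literal_eval(text)
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal")
    return value

raw = b"\x00\x01\x02"
text = to_escaped_text(raw)
restored = from_escaped_text(text)
print(text, restored == raw)

# Unlike eval(), literal_eval() refuses anything that is not a plain literal.
try:
    from_escaped_text("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The stored text contains no zero bytes, and a malicious value in the database is rejected instead of executed.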
<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
On Sat, 14 Jan 2006 13:50:24 -0800, Mike wrote:
> Thanks everyone.
> Why Marshal & not Pickle: Well, Marshal is supposed to be faster.
Faster than cPickle?
Even faster would be to write your code in assembly, and dump that
ridiculously bloated database and just write everything to raw bytes on
an unformatted disk. Of course, it might take the programmer a thousand
times longer to actually write the program, and there will probably be
hundreds of bugs in it, but the important thing is that you'll save three
or four milliseconds at runtime.
Right?
Unless you've actually done proper measurements of the time taken, with
realistic sample data, worrying about saving a byte here and a
millisecond there is just wasting your time, and is often
counter-productive. Optimization without measurement is as likely to
result in slower, fatter performance as it is faster and leaner.
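One way to take such a measurement is with the standard `timeit` module. The sketch below assumes Python 3 (where cPickle has been folded into `pickle`) and uses the tiny dictionary from the first post as a stand-in; real sample data should replace it. The function name is made up for illustration, and no timings are claimed here.

```python
import marshal
import pickle
import timeit

# Stand-in sample; replace it with data shaped like the real records.
sample = {2: "two", 3: "three"}

def seconds_per_roundtrip(dumps, loads, number=20000):
    """Average time in seconds for one dumps() + loads() cycle."""
    total = timeit.timeit(lambda: loads(dumps(sample)), number=number)
    return total / number

timings = {
    "marshal": seconds_per_roundtrip(marshal.dumps, marshal.loads),
    "pickle": seconds_per_roundtrip(pickle.dumps, pickle.loads),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.2f} microseconds per round trip")
```

Whatever the numbers turn out to be, they only matter in comparison with the cost of the database round trip that surrounds them.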
marshal is not designed to be portable across versions. Do you *really*
think it is a good idea to tie the data in your database to one specific
version of Python?
> But then, if I wanted to do the whole repr()-eval() hack, I am already
> defeating the purpose by refusing to save bytes as bytes in terms of
> both size and speed.
> At this point, I am considering one of the following:
> - Save my structure as binary data, and reference the file from my db
> - Find a clean method of saving bytes into my db
Your database either can handle binary data, or it can't.
If it can, then just use pickle with a binary protocol and be done with it.
If it can't, then just use pickle with a plain text protocol and be done
with it.
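The two choices can be sketched as follows, assuming Python 3's `pickle` module. Protocol 0 is the original text-oriented format; for this small sample its output happens to be plain ASCII, but that is not guaranteed for arbitrary data (non-ASCII strings, for instance), so treat the decode step as an assumption to verify.

```python
import pickle

data = {2: "two", 3: "three"}

# Protocol 0 is the original text-oriented format.
text_pickle = pickle.dumps(data, protocol=0)

# The highest protocol is a compact binary format.
binary_pickle = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# For this sample the protocol 0 output is plain ASCII with no zero bytes,
# so it can be decoded to text for storage and encoded again before loading.
as_text = text_pickle.decode("ascii")
print(as_text)
print(pickle.loads(as_text.encode("ascii")) == data)
print(pickle.loads(binary_pickle) == data)
```

Unpickling runs code chosen by whoever produced the pickle, so either form should only be loaded from a database whose contents are trusted.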
Either way, you have to find a way to translate your Python data
structures into something that you can feed to the database. Your database
can't automatically suck data structures out of Python's working memory!
So why re-invent the wheel? marshal is not recommended, but if you can
live with the limitations of marshal then it might do the job. But trying
to optimise code that hasn't even been written yet is a sure way to
trouble.
--
Steven.
Mike wrote:
> Hi,
> The example below shows that the result of marshaling a data structure
> is nothing but a string:
>
> >>> data = {2:'two', 3:'three'}
> >>> import marshal
> >>> bytes = marshal.dumps(data)
> >>> type(bytes)
> <type 'str'>
> >>> bytes
> '{i\x02\x00\x00\x00t\x03\x00\x00\x00twoi\x03\x00\x00\x00t\x05\x00\x00\x00three0'
>
> Now, I need to store this data safely in my database as CLEAR TEXT, not
> BLOB. It seems to me that it should work just fine, since it is a string
> anyway. So why does O'Reilly's Python Cookbook insist on saving it as a
> binary file and BLOB type?
Well, the Cookbook isn't an exhaustive list of everything you can do
with Python, it's just a record of some of the things people *have* done.
I presume your database has no datatype that will store binary data of
indeterminate length? Clearly that would be the most satisfactory solution.
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/
> Even faster would be to write your code in assembly, and dump that ridiculously bloated database and just write everything to raw bytes on an unformatted disk. Of course, it might take the programmer a thousand times longer to actually write the program, and there will probably be hundreds of bugs in it, but the important thing is that you'll save three or four milliseconds at runtime.
Right?
Correct. I didn't quite see the issue as assembly vs. python, having
direct translation to programming hours. The structure in mind is meant
to act as a dictionary to extend my db with a few table fields that
could vary from one record to another and won't be queried for.
Considering that every time my record is loaded, its pickled or marshaled
data has to be decoded, I figured the faster alternative should be better.
With the incompatibility issue, I figured the day I upgrade my python,
I would write a python script to upgrade the data. I take my word back.
> Your database either can handle binary data, or it can't.
It can. It's my web framework that doesn't.
> If it can, then just use pickle with a binary protocol and be done with it.
That I will do.
> Either way, you have to find a way to translate your Python data
> structures into something that you can feed to the database. Your database
> can't automatically suck data structures out of Python's working memory!
> So why re-invent the wheel? marshal is not recommended, but if you can
> live with the limitations of marshal then it might do the job. But trying
> to optimise code that hasn't even been written yet is a sure way to
> trouble.
Thanks. Will do.
Regards,
Mike
> Well, the Cookbook isn't an exhaustive list of everything you can do with Python, it's just a record of some of the things people *have* done.
Considering I am a newbie, it's a good start for me...
> I presume your database has no datatype that will store binary data of
> indeterminate length? Clearly that would be the most satisfactory solution.
PostgreSQL. I think the only two things it doesn't do are wash my car and
code my software. Well, that's up until you use it in conjunction with
Django; then the only work left is to wash my car, which I couldn't care
less about either. We'll wait for some rain :)
Mike
Mike wrote:
>> Well, the Cookbook isn't an exhaustive list of everything you can do
>> with Python, it's just a record of some of the things people *have* done.
> Considering I am a newbie, it's a good start for me...
>> I presume your database has no datatype that will store binary data of
>> indeterminate length? Clearly that would be the most satisfactory solution.
> PostgreSQL. I think the only two things it doesn't do are wash my car and
> code my software. Well, that's up until you use it in conjunction with
> Django; then the only work left is to wash my car, which I couldn't care
> less about either. We'll wait for some rain :)
So this question was primarily theoretical, right?
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/
> So this question was primarily theoretical, right?
Theoretical? Not really, Steve. I wanted to use Django's wonderful db
framework to save a structure into my PostgreSQL database. Except there is
no direct BLOB support for it yet. So I was trying to explore my
options for saving this structure in clear text.
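One common way to get binary data through a text-only field is to encode it as base64. This is a sketch assuming Python 3 and the standard `base64` and `pickle` modules, not something proposed in the thread; `to_text_field` and `from_text_field` are hypothetical helper names, and nothing here depends on Django.

```python
import base64
import pickle

def to_text_field(obj) -> str:
    """Pickle obj and encode the resulting bytes as ASCII-only base64 text."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return base64.b64encode(payload).decode("ascii")

def from_text_field(text: str):
    """Reverse to_text_field(). Only use on text you stored yourself."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))

extra_fields = {2: "two", 3: "three"}
stored = to_text_field(extra_fields)
print(stored)
print(from_text_field(stored) == extra_fields)
```

Base64 grows the data by a fixed factor of about one third regardless of content, which makes its cost easier to predict than that of escaping.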
Thanks,
Mike
Mike wrote:
>> So this question was primarily theoretical, right?
> Theoretical? Not really, Steve. I wanted to use Django's wonderful db
> framework to save a structure into my PostgreSQL database. Except there is
> no direct BLOB support for it yet. So I was trying to explore my
> options for saving this structure in clear text.
Right, NOW I understand.
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/
"Mike" <mi**********@hotmail.com> writes:
> Correct. I didn't quite see the issue as assembly vs. python, having
> direct translation to programming hours.... I figured the day I upgrade
> my python, I would write a python script to upgrade the data. I take my
> word back.
Writing that script sounds potentially capable of consuming some
programming hours. If the marshal format changes, it might not be
easy to have the old and new marshal modules in the same Python
instance. You might need two separate interpreters (old and new), to
demarshal your objects in the old interpreter and communicate them
somehow to the new interpreter (e.g. through pickle and sockets) for
re-marshalling. Migrating a database is a big pain in the neck
already without such extra complication.