Packing a simple dictionary into a string - extending struct?

Hello

I want to serialise a dictionary, whose keys and values are ordinary strings
(i.e. a sequence of bytes).

I can of course use pickle, but it has two big faults for me.
1. It should not be used with untrusted data.
2. I want non-Python programs to be able to read and write these
dictionaries.

I don't want to use XML because:
1. It is verbose.
2. It forces other applications to load an XML parser.

I've written, in about 80 lines, Python code that will pack and unpack (to
use the language of the struct module) such a dictionary. And then I
thought I might be reinventing the wheel. But so far I've not found
anything much like this out there. (The closest is work related to 'binary
XML' - http://en.wikipedia.org/wiki/Binary_XML.)

So, what I'm looking for is something like an extension of struct that
allows dictionaries to be stored. Does anyone know of any related work?
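As a rough illustration of the kind of format described above, here is a minimal sketch of a struct-based dictionary packer. The layout is an assumption chosen for illustration, not the 80-line implementation mentioned in the post: each key and each value is written as a 4-byte unsigned big-endian length followed by the raw bytes. The names `pack_dict`, `unpack_dict` and `_read_field` are hypothetical.

```python
import struct

# Hypothetical layout (an assumption, not the poster's actual format):
# every key and every value is a 4-byte unsigned big-endian length
# followed by that many raw bytes; entries follow one another directly.
_LEN = struct.Struct(">I")


def pack_dict(d):
    """Pack a dict mapping bytes -> bytes into one bytes object."""
    parts = []
    for key, val in d.items():
        for field in (key, val):
            parts.append(_LEN.pack(len(field)))
            parts.append(field)
    return b"".join(parts)


def _read_field(data, ptr):
    """Return (field, new_ptr) for the length-prefixed field at ptr."""
    if ptr + _LEN.size > len(data):
        raise ValueError("truncated length prefix at offset %d" % ptr)
    (n,) = _LEN.unpack_from(data, ptr)
    ptr += _LEN.size
    if ptr + n > len(data):
        raise ValueError("truncated field at offset %d" % ptr)
    return data[ptr:ptr + n], ptr + n


def unpack_dict(data):
    """Inverse of pack_dict. Empty input yields an empty dict."""
    result = {}
    ptr = 0
    while ptr < len(data):
        key, ptr = _read_field(data, ptr)
        val, ptr = _read_field(data, ptr)
        result[key] = val
    return result


original = {b"k1": b"v1", b"\x80": b"\x00\xff"}
assert unpack_dict(pack_dict(original)) == original
```

Because every field is just a fixed-width length plus a byte copy, a reader in another language needs nothing beyond reading a big-endian 32-bit integer, which is the portability property asked for here.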

--
Jonathan Fine
Jun 20 '07 #1
In <f5**********@south.jnrs.ja.net>, Jonathan Fine wrote:
> I want to serialise a dictionary, whose keys and values are ordinary strings
> (i.e. a sequence of bytes).
Maybe you can use ConfigObj_ or JSON_ to store that data. Another format
mentioned in the binary XML article you've linked in your post is
`ASN.1`_. And there's a secure alternative to `pickle` called cerealizer_.

.. _`ASN.1`: http://pyasn1.sourceforge.net/
.. _cerealizer: http://home.gna.org/oomadness/en/cerealizer/
.. _ConfigObj: http://www.voidspace.org.uk/python/configobj.html
.. _JSON: http://www.json.org/

Ciao,
Marc 'BlackJack' Rintsch
Jun 20 '07 #2
On 6/20/07, Jonathan Fine <J.****@open.ac.uk> wrote:
> I want to serialise a dictionary, whose keys and values are ordinary strings
> (i.e. a sequence of bytes).
> <snip>
What about JSON? You can serialize your dictionary, for example, in
JSON format and then unserialize it in any language that has a JSON
parser (unless it is Javascript).

--
http://srid.nearfar.org/
Jun 20 '07 #3
> What about JSON? You can serialize your dictionary, for example, in
> JSON format and then unserialize it in any language that has a JSON
> parser (unless it is Javascript).
There is an implementation available for Python called simplejson, available
through easy_install.
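simplejson was later adopted into the standard library as the `json` module (Python 2.6 onward), so today a round trip of a string-to-string dictionary needs no third-party install. A minimal sketch, assuming the keys and values are text (`str`) rather than raw bytes:

```python
import json

settings = {"k1": "v1", "k2": "v2", "k3": "v3"}

# Serialise to JSON text; sort_keys makes the output deterministic.
text = json.dumps(settings, sort_keys=True)
print(text)  # {"k1": "v1", "k2": "v2", "k3": "v3"}

# Any JSON parser, in any language, can read this back.
assert json.loads(text) == settings
```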

Diez
Jun 20 '07 #4
On Jun 20, 9:19 pm, "Jonathan Fine" <J.F...@open.ac.uk> wrote:
> I want to serialise a dictionary, whose keys and values are ordinary strings
> (i.e. a sequence of bytes).
> <snip>
C:\junk>copy con adict.csv
k1,v1
k2,v2
k3,v3
^Z
1 file(s) copied.

C:\junk>\python25\python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> adict = dict(csv.reader(open('adict.csv', 'rb')))
>>> adict
{'k3': 'v3', 'k2': 'v2', 'k1': 'v1'}
>>> csv.writer(open('bdict.csv', 'wb')).writerows(adict.iteritems())
>>> ^Z
C:\junk>type bdict.csv
k3,v3
k2,v2
k1,v1

C:\junk>

Easy enough?
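The transcript above is Python 2.5 (`iteritems`, files opened in binary mode). A rough Python 3 equivalent is sketched below; the in-memory `io.StringIO` buffer standing in for the two files is an assumption made only so the example is self-contained. Note that Python 3's csv module works on text, so raw byte strings would first need decoding.

```python
import csv
import io

adict = {"k1": "v1", "k2": "v2", "k3": "needs, quoting"}

# Write one "key,value" row per entry. csv quotes any field containing
# a comma, a quote character or a newline. newline="" mirrors what the
# csv docs recommend for real files.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(adict.items())

# Read it back: each row is a [key, value] list, which dict() accepts.
buf.seek(0)
bdict = dict(csv.reader(buf))
assert bdict == adict
```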
HTH,
John

Jun 20 '07 #5
"Sridhar Ratna" <sr***********@gmail.com> wrote in message
> What about JSON? You can serialize your dictionary, for example, in
> JSON format and then unserialize it in any language that has a JSON
> parser (unless it is Javascript).
Thank you for this suggestion. The growing adoption of JSON in Ajax
programming is a strong argument for my using it in my application, although
I think I'd prefer something a little more binary.

So it looks like I'll be using JSON.

Thanks.
Jonathan
Jun 20 '07 #6
On Jun 20, 12:19 pm, "Jonathan Fine" <J.F...@open.ac.uk> wrote:
> I want to serialise a dictionary, whose keys and values are ordinary strings
> (i.e. a sequence of bytes).
> <snip>
You could use YAML or JSON, then compress the output if size is an
issue.
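A minimal sketch of the "serialise, then compress" suggestion, using JSON and the standard `zlib` module (YAML would need a third-party package such as PyYAML). Whether compression pays off depends on the data: for very small dictionaries the zlib header and checksum can make the output larger, while the repetitive sample data assumed here compresses well.

```python
import json
import zlib

# Sample data (an assumption): 100 short, repetitive entries.
d = {"key%d" % i: "value %d" % i for i in range(100)}

raw = json.dumps(d, sort_keys=True).encode("utf-8")
packed = zlib.compress(raw, 9)

# Decompress and parse to recover the original dictionary.
restored = json.loads(zlib.decompress(packed).decode("utf-8"))
assert restored == d
print(len(raw), "->", len(packed), "bytes")
```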

- Paddy.

Jun 20 '07 #7
Jonathan Fine wrote:
> Thank you for this suggestion. The growing adoption of JSON in Ajax
> programming is a strong argument for my using it in my application, although
> I think I'd prefer something a little more binary.
>
> So it looks like I'll be using JSON.
Well, I tried. But I came across two problems (see below).

First, there's bloat. For binary byte data, on average each
byte becomes just over 4 characters.

Second, there's the inconvenience. I can't simply take a
sequence of bytes and encode them using JSON. I have to
turn them into Unicode first. And I guess there's a similar
problem at the other end.

So I'm going with my own solution:
http://mathtran.cvs.sourceforge.net/....1&view=markup

It seems to be related to cerealizer:
http://home.gna.org/oomadness/en/cerealizer/index.html

It seems to me that JSON works well for Unicode text, but not
with binary data. Indeed, Unicode hides the binary form of
the stored data, presenting only the code points. But I don't
have Unicode strings!
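One common workaround, sketched here as an assumption rather than anything proposed in the thread, is to keep JSON but turn each byte string into text explicitly with base64, which costs about 4 output characters per 3 input bytes. Decoding with Latin-1 would also round-trip (it maps bytes 0-255 one-to-one onto U+0000 to U+00FF), but under JSON's default ASCII-only escaping most of those code points still cost six characters each. The helper names below are hypothetical.

```python
import base64
import json


def dumps_bytes_dict(d):
    """Hypothetical helper: JSON-encode a bytes -> bytes dict via base64."""
    return json.dumps({
        base64.b64encode(k).decode("ascii"): base64.b64encode(v).decode("ascii")
        for k, v in d.items()
    })


def loads_bytes_dict(text):
    """Inverse of dumps_bytes_dict."""
    return {
        base64.b64decode(k): base64.b64decode(v)
        for k, v in json.loads(text).items()
    }


payload = bytes(range(256))  # every possible byte value
text = dumps_bytes_dict({b"data": payload})
assert loads_bytes_dict(text) == {b"data": payload}

# base64 turns every 3 input bytes into 4 output characters.
print(len(payload), "->", len(base64.b64encode(payload)))  # 256 -> 344
```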

Here's my test script, which shows why I'm not using JSON:
===
import simplejson

x = u''
for i in range(256):
    x += unichr(i)

print len(simplejson.dumps(x)), '\n'

simplejson.dumps(chr(128))
===

Here's the output
===
1046    # 256 bytes => 256 * 4 + 22 bytes

Traceback (most recent call last):
<snip>
File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
unexpected code byte
===
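The same measurement can be sketched with the standard-library `json` module in Python 3. The failure mode differs from the 2.4 traceback above: instead of an implicit UTF-8 decode failing, a `bytes` object is rejected with a TypeError. The exact length printed may differ slightly from the simplejson figure, since JSON libraries and versions vary in details such as whether "/" is escaped.

```python
import json

# All 256 code points U+0000..U+00FF, as in the original test script.
x = "".join(chr(i) for i in range(256))
encoded = json.dumps(x)      # ensure_ascii=True is the default
print(len(encoded))          # a little over 4 characters per code point

# The round trip is lossless; the cost is size, not correctness.
assert json.loads(encoded) == x

# bytes are rejected outright instead of being implicitly decoded.
try:
    json.dumps(bytes([128]))
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
print("bytes rejected:", bytes_rejected)
```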

--
Jonathan

Jun 22 '07 #8
On Jun 22, 5:08 pm, Jonathan Fine <j...@pytex.org> wrote:
> Jonathan Fine wrote:
>> Thank you for this suggestion. The growing adoption of JSON in Ajax
>> programming is a strong argument for my using it in my application, although
>> I think I'd prefer something a little more binary.
>> So it looks like I'll be using JSON.
>
> Well, I tried. But I came across two problems (see below).
>
> First, there's bloat. For binary byte data, one average one
> character becomes just over 4.
>
> Second, there's the inconvenience. I can't simple take a
> sequence of bytes and encode them using JSON. I have to
> turn them into Unicode first. And I guess there's a similar
> problem at the other end.
>
> So I'm going with me own solution: http://mathtran.cvs.sourceforge.net/...t.py?revision=...
def unpack(bytes, unpack_entry=unpack_entry):
    '''Return dictionary gotten by unpacking supplied bytes.
    Both keys and values in the returned dictionary are byte-strings.
    '''
    bytedict = {}
    ptr = 0
    while 1:
        key, val, ptr = unpack_entry(bytes, ptr)
        bytedict[key] = val
        if ptr == len(bytes):
            break
    # That's beautiful code -- as pretty as a cane-toad.
    # Well-behaved too, a very elegant response to unpack(pack({}))
    # Try this:
    blen = len(bytes)
    while ptr < blen:
        key, val, ptr = unpack_entry(bytes, ptr)
        bytedict[key] = val

    return bytedict
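A small self-contained sketch of the point being made about `unpack(pack({}))`. The `unpack_entry` below is a hypothetical stand-in (the real one lives in the linked file, which isn't reproduced here): it reads a one-byte key length, the key, a one-byte value length and the value. With `while 1`, an empty input still calls `unpack_entry` once and reads past the end; the `while ptr < blen` form simply returns an empty dict.

```python
def unpack_entry(data, ptr):
    """Hypothetical entry reader: 1-byte length + key, 1-byte length + value."""
    klen = data[ptr]                  # IndexError if ptr is past the end
    key = data[ptr + 1:ptr + 1 + klen]
    ptr += 1 + klen
    vlen = data[ptr]
    val = data[ptr + 1:ptr + 1 + vlen]
    return key, val, ptr + 1 + vlen


def unpack_original(data):
    """The `while 1` loop: always reads at least one entry."""
    bytedict = {}
    ptr = 0
    while 1:
        key, val, ptr = unpack_entry(data, ptr)
        bytedict[key] = val
        if ptr == len(data):
            break
    return bytedict


def unpack_fixed(data):
    """The suggested loop: checks for remaining input before reading."""
    bytedict = {}
    ptr = 0
    blen = len(data)
    while ptr < blen:
        key, val, ptr = unpack_entry(data, ptr)
        bytedict[key] = val
    return bytedict


sample = b"\x02k1\x02v1"
assert unpack_original(sample) == unpack_fixed(sample) == {b"k1": b"v1"}
assert unpack_fixed(b"") == {}

try:
    unpack_original(b"")              # the unpack(pack({})) case
    original_failed = False
except IndexError:
    original_failed = True
print("while 1 version fails on empty input:", original_failed)
```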

HTH,
John

Jun 22 '07 #9
John Machin wrote:
> def unpack(bytes, unpack_entry=unpack_entry):
>     '''Return dictionary gotten by unpacking supplied bytes.
>     Both keys and values in the returned dictionary are byte-strings.
>     '''
>     bytedict = {}
>     ptr = 0
>     while 1:
>         key, val, ptr = unpack_entry(bytes, ptr)
>         bytedict[key] = val
>         if ptr == len(bytes):
>             break
>     # That's beautiful code -- as pretty as a cane-toad.
Well, it's nearly right. It has a transposition error.
>     # Well-behaved too, a very elegant response to unpack(pack({}))
Yes, you're right. An attempt to read bytes that aren't there.
>     # Try this:
>     blen = len(bytes)
>     while ptr < blen:
>         key, val, ptr = unpack_entry(bytes, ptr)
>         bytedict[key] = val
>
>     return bytedict
I've committed such a change. Thank you.

--
Jonathan

Jun 22 '07 #10