Packing a simple dictionary into a string - extending struct?

Hello

I want to serialise a dictionary, whose keys and values are ordinary strings
(i.e. a sequence of bytes).

I can of course use pickle, but it has two big faults for me.
1. It should not be used with untrusted data.
2. I want non-Python programs to be able to read and write these
dictionaries.

I don't want to use XML because:
1. It is verbose.
2. It forces other applications to load an XML parser.

I've written, in about 80 lines, Python code that will pack and unpack (to
use the language of the struct module) such a dictionary. And then I
thought I might be reinventing the wheel. But so far I've not found
anything much like this out there. (The closest is work related to 'binary
XML' - http://en.wikipedia.org/wiki/Binary_XML.)

So, what I'm looking for is something like an extension of struct that
allows dictionaries to be stored. Does anyone know of any related work?
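
For illustration only, here is a minimal sketch of the kind of format described above: each key and each value is written as a 4-byte big-endian length followed by the raw bytes. The names `pack_dict` and `unpack_dict` and the exact layout are assumptions made for this example; this is not the 80-line code mentioned in the post.

```python
import struct

# 4-byte big-endian unsigned length prefix (an assumed layout, not the poster's)
_LEN = struct.Struct(">I")


def pack_dict(d):
    """Pack a dict mapping bytes to bytes into a single byte string."""
    parts = []
    for key, val in d.items():
        for item in (key, val):
            parts.append(_LEN.pack(len(item)))
            parts.append(item)
    return b"".join(parts)


def _read_item(data, ptr):
    """Return (item, new_ptr) for the length-prefixed item starting at ptr."""
    (n,) = _LEN.unpack_from(data, ptr)
    start = ptr + _LEN.size
    end = start + n
    if end > len(data):
        raise ValueError("truncated input")
    return data[start:end], end


def unpack_dict(data):
    """Inverse of pack_dict; empty input gives an empty dict."""
    result = {}
    ptr = 0
    while ptr < len(data):
        key, ptr = _read_item(data, ptr)
        val, ptr = _read_item(data, ptr)
        result[key] = val
    return result


original = {b"k1": b"v1", b"\x00\xff": b"binary \x80 value"}
packed = pack_dict(original)
restored = unpack_dict(packed)
```

With this layout the overhead is a fixed 8 bytes per entry, and a reader in another language only needs to decode fixed-width integers and copy byte ranges.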

--
Jonathan Fine
Jun 20 '07 #1
In <f5**********@south.jnrs.ja.net>, Jonathan Fine wrote:
> I want to serialise a dictionary, whose keys and values are ordinary strings
> (i.e. a sequence of bytes).

Maybe you can use ConfigObj_ or JSON_ to store that data. Another format
mentioned in the binary XML article you've linked in your post is
`ASN.1`_. And there's a secure alternative to `pickle` called cerealizer_.

.. _`ASN.1`: http://pyasn1.sourceforge.net/
.. _cerealizer: http://home.gna.org/oomadness/en/cerealizer/
.. _ConfigObj: http://www.voidspace.org.uk/python/configobj.html
.. _JSON: http://www.json.org/

Ciao,
Marc 'BlackJack' Rintsch
Jun 20 '07 #2
On 6/20/07, Jonathan Fine <J.****@open.ac.uk> wrote:
> So, what I'm looking for is something like an extension of struct that
> allows dictionaries to be stored. Does anyone know of any related work?

What about JSON? You can serialize your dictionary, for example, in
JSON format and then unserialize it in any language that has a JSON
parser (unless it is Javascript).

--
http://srid.nearfar.org/
Jun 20 '07 #3
> What about JSON? You can serialize your dictionary, for example, in
> JSON format and then unserialize it in any language that has a JSON
> parser (unless it is Javascript).

There is an implementation available for Python called simplejson, available
through easy_install.
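
As a rough sketch of the JSON route for a dictionary of text strings: the example below uses the standard-library `json` module (which grew out of simplejson and offers the same `dumps`/`loads` calls) rather than simplejson itself, so it runs without installing anything. The sample dictionary is made up for illustration.

```python
import json

settings = {"k1": "v1", "k2": "v2", "name": "caf\u00e9"}

# Serialise to a plain ASCII string; non-ASCII characters become \uXXXX escapes.
text = json.dumps(settings, sort_keys=True)

# Any JSON parser, in any language, should be able to read this back.
restored = json.loads(text)
```

Note that this only works directly for text; byte strings with arbitrary values need an extra encoding step before they can go into JSON.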

Diez
Jun 20 '07 #4
On Jun 20, 9:19 pm, "Jonathan Fine" <J.F...@open.ac.uk> wrote:
> So, what I'm looking for is something like an extension of struct that
> allows dictionaries to be stored. Does anyone know of any related work?

C:\junk>copy con adict.csv
k1,v1
k2,v2
k3,v3
^Z
1 file(s) copied.

C:\junk>\python25\python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> adict = dict(csv.reader(open('adict.csv', 'rb')))
>>> adict
{'k3': 'v3', 'k2': 'v2', 'k1': 'v1'}
>>> csv.writer(open('bdict.csv', 'wb')).writerows(adict.iteritems())
>>> ^Z

C:\junk>type bdict.csv
k3,v3
k2,v2
k1,v1

C:\junk>

Easy enough?
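
The session above is Python 2.5. A rough Python 3 equivalent is sketched below; it uses in-memory `io.StringIO` buffers instead of files on disk, which is a choice made for this example only. CSV handles text, so keys or values containing arbitrary bytes would still need an encoding step first.

```python
import csv
import io

adict = {"k1": "v1", "k2": "v2", "k3": "v3"}

# Write one "key,value" row per entry.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(adict.items())
csv_text = buf.getvalue()

# Read the rows back; each row is a [key, value] pair, which dict() accepts.
bdict = dict(csv.reader(io.StringIO(csv_text, newline="")))
```
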
HTH,
John

Jun 20 '07 #5
"Sridhar Ratna" <sr***********@gmail.com> wrote in message
> What about JSON? You can serialize your dictionary, for example, in
> JSON format and then unserialize it in any language that has a JSON
> parser (unless it is Javascript).

Thank you for this suggestion. The growing adoption of JSON in Ajax
programming is a strong argument for my using it in my application, although
I think I'd prefer something a little more binary.

So it looks like I'll be using JSON.

Thanks.
Jonathan
Jun 20 '07 #6
On Jun 20, 12:19 pm, "Jonathan Fine" <J.F...@open.ac.uk> wrote:
> So, what I'm looking for is something like an extension of struct that
> allows dictionaries to be stored. Does anyone know of any related work?
>
> --
> Jonathan Fine

You could use YAML or JSON, then compress the output if size is an
issue.
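
A minimal sketch of the "serialise as text, then compress" idea, using JSON with the standard-library `json` and `zlib` modules. The sample data is invented, and how much compression helps depends heavily on the data; very small or high-entropy payloads can even grow.

```python
import json
import zlib

# Made-up sample data with plenty of repetition.
data = {"key%03d" % i: "value %d" % i for i in range(200)}

# Serialise to JSON text, encode to bytes, then compress.
raw = json.dumps(data).encode("utf-8")
compressed = zlib.compress(raw, 9)

# Reverse the steps to get the dictionary back.
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))

print(len(raw), len(compressed))
```

The price is that every reader now needs both a JSON parser and a zlib (DEFLATE) implementation.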

- Paddy.

Jun 20 '07 #7
Jonathan Fine wrote:
> Thank you for this suggestion. The growing adoption of JSON in Ajax
> programming is a strong argument for my using it in my application, although
> I think I'd prefer something a little more binary.
>
> So it looks like I'll be using JSON.

Well, I tried. But I came across two problems (see below).

First, there's bloat. For binary byte data, on average one
character becomes just over 4.

Second, there's the inconvenience. I can't simply take a
sequence of bytes and encode them using JSON. I have to
turn them into Unicode first. And I guess there's a similar
problem at the other end.

So I'm going with my own solution:
http://mathtran.cvs.sourceforge.net/....1&view=markup

It seems to be related to cerealizer:
http://home.gna.org/oomadness/en/cerealizer/index.html

It seems to me that JSON works well for Unicode text, but not
with binary data. Indeed, Unicode hides the binary form of
the stored data, presenting only the code points. But I don't
have Unicode strings!

Here's my test script, which is why I'm not using JSON:
===
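import simplejson

# Build a Unicode string holding code points 0..255 (Python 2).
x = u''
for i in range(256):
    x += unichr(i)

# Length of the JSON encoding of those 256 characters.
print len(simplejson.dumps(x)), '\n'

# A byte string with a non-ASCII byte: this raises UnicodeDecodeError.
simplejson.dumps(chr(128))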
===

Here's the output
===
1046 # 256 bytes => 256 * 4 + 22 bytes

Traceback (most recent call last):
  <snip>
  File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
===
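
One common workaround (not something proposed in this thread) is to map the bytes to text before handing them to JSON, either via Latin-1, which maps each byte to the code point of the same value, or via Base64. The sketch below uses Python 3 and the standard-library `json` and `base64` modules; the helper names are made up for illustration. It also prints the size of each encoding for all 256 byte values, so the bloat can be compared.

```python
import base64
import json

payload = bytes(range(256))  # every possible byte value once


def dumps_latin1(b):
    """Bytes -> JSON string, one code point per byte (escaped as needed)."""
    return json.dumps(b.decode("latin-1"))


def loads_latin1(s):
    """Inverse of dumps_latin1."""
    return json.loads(s).encode("latin-1")


def dumps_b64(b):
    """Bytes -> JSON string holding Base64 text (about 4/3 the input size)."""
    return json.dumps(base64.b64encode(b).decode("ascii"))


def loads_b64(s):
    """Inverse of dumps_b64."""
    return base64.b64decode(json.loads(s))


as_latin1 = dumps_latin1(payload)
as_b64 = dumps_b64(payload)
print(len(payload), len(as_latin1), len(as_b64))
```

The Latin-1 route shows roughly the fourfold growth reported above, because most non-printable and non-ASCII code points become six-character `\uXXXX` escapes. Base64 keeps the growth near one third, at the cost of the stored text no longer being readable.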

--
Jonathan

Jun 22 '07 #8
On Jun 22, 5:08 pm, Jonathan Fine <j...@pytex.org> wrote:
> So I'm going with my own solution:
> http://mathtran.cvs.sourceforge.net/...t.py?revision=...

> def unpack(bytes, unpack_entry=unpack_entry):
>     '''Return dictionary gotten by unpacking supplied bytes.
>     Both keys and values in the returned dictionary are byte-strings.
>     '''
>     bytedict = {}
>     ptr = 0
>     while 1:
>         key, val, ptr = unpack_entry(bytes, ptr)
>         bytedict[key] = val
>         if ptr == len(bytes):
>             break

# That's beautiful code -- as pretty as a cane-toad.
# Well-behaved too, a very elegant response to unpack(pack({}))
# Try this:
    blen = len(bytes)
    while ptr < blen:
        key, val, ptr = unpack_entry(bytes, ptr)
        bytedict[key] = val

    return bytedict

HTH,
John

Jun 22 '07 #9
John Machin wrote:
> def unpack(bytes, unpack_entry=unpack_entry):
>     '''Return dictionary gotten by unpacking supplied bytes.
>     Both keys and values in the returned dictionary are byte-strings.
>     '''
>     bytedict = {}
>     ptr = 0
>     while 1:
>         key, val, ptr = unpack_entry(bytes, ptr)
>         bytedict[key] = val
>         if ptr == len(bytes):
>             break
> # That's beautiful code -- as pretty as a cane-toad.

Well, it's nearly right. It has a transposition error.

> # Well-behaved too, a very elegant response to unpack(pack({}))

Yes, you're right. An attempt to read bytes that aren't there.

> # Try this:
>     blen = len(bytes)
>     while ptr < blen:
>         key, val, ptr = unpack_entry(bytes, ptr)
>         bytedict[key] = val
>
>     return bytedict

I've committed such a change. Thank you.
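
To make the difference between the two loops concrete, here is a self-contained sketch. The real `unpack_entry` is not shown in this thread, so the version below (a 1-byte length before each key and value) is a hypothetical stand-in, as are the function names; only the two loop shapes are taken from the posts above.

```python
import struct


def unpack_entry(data, ptr):
    """Hypothetical stand-in: read a length-prefixed key and value at ptr."""
    items = []
    for _ in range(2):
        # Raises struct.error if ptr is at or past the end of the data.
        (n,) = struct.unpack_from("B", data, ptr)
        items.append(data[ptr + 1:ptr + 1 + n])
        ptr += 1 + n
    return items[0], items[1], ptr


def unpack_test_after(data):
    """Original loop shape: always reads one entry before checking for the end."""
    result, ptr = {}, 0
    while True:
        key, val, ptr = unpack_entry(data, ptr)
        result[key] = val
        if ptr == len(data):
            break
    return result


def unpack_test_first(data):
    """Suggested loop shape: checks for the end before reading anything."""
    result, ptr = {}, 0
    while ptr < len(data):
        key, val, ptr = unpack_entry(data, ptr)
        result[key] = val
    return result


one_entry = b"\x02k1\x02v1"
empty = b""  # what packing an empty dictionary would produce in this format

# The original shape tries to read an entry from empty input and fails.
try:
    unpack_test_after(empty)
    empty_ok = True
except struct.error:
    empty_ok = False
```

Both shapes agree on non-empty input; only the test-first loop returns an empty dictionary for empty input.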

--
Jonathan

Jun 22 '07 #10
