compressed serialization module

I used pickle and found the file was saved in text format. I wonder
whether anyone is familiar with a good compact off-the-shelf module
available that will save in compressed format... or maybe an opinion
on a smart approach for making a custom one? Appreciate it! I'm a
bit of a n00b but have been looking around. I found a serialize.py
but it seems like overkill.

Mark
Nov 17 '08 #1
10 Replies


On Nov 17, 2008, at 10:47 AM, Mark wrote:
> I used pickle and found the file was saved in text format. I wonder
> whether anyone is familiar with a good compact off-the-shelf module
> available that will save in compressed format... or maybe an opinion
> on a smart approach for making a custom one?

Well, here's a thought: create a zip file (using the standard zipfile
module), and pickle your data into that.

HTH,
- Joe
Nov 17 '08 #2
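
Joe's zipfile suggestion might look roughly like the sketch below. It is written for Python 3 (the thread itself uses Python 2.5), and the function names and the archive member name "data.pkl" are made up for illustration.

```python
import pickle
import zipfile


def save_to_zip(obj, zip_path, member="data.pkl"):
    """Pickle obj and store it as one compressed member of a zip archive."""
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    # ZIP_DEFLATED turns compression on; the default (ZIP_STORED) does not compress.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, payload)


def load_from_zip(zip_path, member="data.pkl"):
    """Read the named member back out of the archive and unpickle it."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        return pickle.loads(zf.read(member))
```

Note that this still builds the whole pickle as an in-memory bytes object before writing it to the archive.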

> I used pickle and found the file was saved in text format. I wonder
> whether anyone is familiar with a good compact off-the-shelf module
> available that will save in compressed format... or maybe an opinion
> on a smart approach for making a custom one?

Joe> Well, here's a thought: create a zip file (using the standard
Joe> zipfile module), and pickle your data into that.

Also, specify a pickle binary protocol. Here's a silly example:

    >>> len(pickle.dumps([1,2,3], pickle.HIGHEST_PROTOCOL))
    14
    >>> len(pickle.dumps([1,2,3], 0))
    18

Skip
Nov 17 '08 #3

sk**@pobox.com <sk**@pobox.com> wrote:
> > I used pickle and found the file was saved in text format. I wonder
> > whether anyone is familiar with a good compact off-the-shelf module
> > available that will save in compressed format... or maybe an opinion
> > on a smart approach for making a custom one?
>
> Joe> Well, here's a thought: create a zip file (using the standard
> Joe> zipfile module), and pickle your data into that.
>
> Also, specify a pickle binary protocol. Here's a silly example:
>
>     >>> len(pickle.dumps([1,2,3], pickle.HIGHEST_PROTOCOL))
>     14
>     >>> len(pickle.dumps([1,2,3], 0))
>     18

Or even

    >>> L = range(100)
    >>> a = pickle.dumps(L)
    >>> len(a)
    496
    >>> b = a.encode("bz2")
    >>> len(b)
    141
    >>> c = b.decode("bz2")
    >>> M = pickle.loads(c)
    >>> M == L
    True

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Nov 17 '08 #4
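
The `encode("bz2")` string codec used above is specific to Python 2. On Python 3 the same round trip can be sketched with `bz2.compress` and `bz2.decompress`. The helper names `pack` and `unpack` are invented for this example.

```python
import bz2
import pickle


def pack(obj):
    """Pickle obj, then bz2-compress the resulting bytes."""
    return bz2.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


def unpack(blob):
    """Reverse of pack(): decompress, then unpickle."""
    return pickle.loads(bz2.decompress(blob))


if __name__ == "__main__":
    data = list(range(100))
    blob = pack(data)
    print(len(blob), unpack(blob) == data)
```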

Thanks guys. This is for serializing to disk. I was hoping to not
have to use too many intermediate steps, but I couldn't figure out how
to pickle data into a zipfile without using either an intermediate
string or file. That's cool; here's what I'll probably settle on
(tested). Now I just need to reverse the steps for the open function.

    def saveOjb(self, dataObj):
        fName = self.version + '_' + self.modname + '.dat'
        f = open(fName, 'w')
        dStr = pickle.dumps(dataObj)
        c = dStr.encode("bz2")
        pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
        f.close()

I'm glad to see that "encode()" is not one of the string ops on the
deprecate list (using Python 2.5).

Thx,
Mark
Nov 17 '08 #5

    Mark> def saveOjb(self, dataObj):
    Mark>     fName = self.version + '_' + self.modname + '.dat'
    Mark>     f = open(fName, 'w')
    Mark>     dStr = pickle.dumps(dataObj)
    Mark>     c = dStr.encode("bz2")
    Mark>     pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
    Mark>     f.close()

Hmmm... Why pickle it twice?

    def saveOjb(self, dataObj):
        fName = self.version + '_' + self.modname + '.dat'
        f = open(fName, 'wb')
        f.write(pickle.dumps(dataObj, pickle.HIGHEST_PROTOCOL).encode("bz2"))
        f.close()

Skip
Nov 17 '08 #6
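
For the "open" direction Mark mentions, a single-pickle save/load pair could look something like this on Python 3. This is only a sketch: `save_obj` and `load_obj` are hypothetical standalone functions rather than the methods from the thread, and the file name is passed in instead of being built from `self.version` and `self.modname`.

```python
import bz2
import pickle


def save_obj(path, obj):
    """Pickle once, compress, and write the bytes in binary mode."""
    with open(path, "wb") as f:
        f.write(bz2.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)))


def load_obj(path):
    """Reverse the save steps: read bytes, decompress, unpickle."""
    with open(path, "rb") as f:
        return pickle.loads(bz2.decompress(f.read()))
```

Opening the file with "wb"/"rb" matters because the compressed data is binary, not text.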

On Nov 17, 3:08 pm, s...@pobox.com wrote:
>     Mark> def saveOjb(self, dataObj):
>     Mark>     fName = self.version + '_' + self.modname + '.dat'
>     Mark>     f = open(fName, 'w')
>     Mark>     dStr = pickle.dumps(dataObj)
>     Mark>     c = dStr.encode("bz2")
>     Mark>     pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
>     Mark>     f.close()
>
> Hmmm... Why pickle it twice?
>
>     def saveOjb(self, dataObj):
>         fName = self.version + '_' + self.modname + '.dat'
>         f = open(fName, 'wb')
>         f.write(pickle.dumps(dataObj, pickle.HIGHEST_PROTOCOL).encode("bz2"))
>         f.close()
>
> Skip

I wasn't sure whether the string object was still a string after
"encode" is called... at least whether it's still an ascii string.
And if not, whether it could be used w/ dumps. I tested your
variation and it works the same. I guess your "write" is doing the
same as my "dump", but may be more efficient. Thanks.
Nov 18 '08 #7

Mark wrote:
> Thanks guys. This is for serializing to disk. I was hoping to not
> have to use too many intermediate steps

You should be able to use a gzip.GzipFile
or bz2.BZ2File and pickle straight into it.

--
Greg
Nov 18 '08 #8
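
A rough Python 3 illustration of Greg's suggestion, using gzip: the pickler writes straight into the compressed file object, so no intermediate string is built. The function names are invented for this sketch, and `bz2.BZ2File` could be substituted for `gzip.open` in the same way.

```python
import gzip
import pickle


def dump_gzip(obj, path):
    """Pickle obj directly into a gzip-compressed file."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load_gzip(path):
    """Unpickle an object directly from a gzip-compressed file."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```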

greg <gr**@cosc.canterbury.ac.nz> wrote:
> Mark wrote:
> > Thanks guys. This is for serializing to disk. I was hoping to not
> > have to use too many intermediate steps
>
> You should be able to use a gzip.GzipFile
> or bz2.BZ2File and pickle straight into it.

Good idea - that will be much more memory efficient. Eg

    >>> import bz2
    >>> import pickle
    >>> L = range(100)
    >>> f = bz2.BZ2File("z.dat", "wb")
    >>> pickle.dump(L, f)
    >>> f.close()
    >>> f = bz2.BZ2File("z.dat", "rb")
    >>> M = pickle.load(f)
    >>> f.close()
    >>> M == L
    True

(Note that basic pickle protocol is likely to be more compressible
than the binary version!)

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Nov 18 '08 #9

Nick Craig-Wood wrote:
> (Note that basic pickle protocol is likely to be more compressible
> than the binary version!)

Although the binary version may be more compact to
start with. It would be interesting to compare the
two and see which one wins.

--
Greg

Nov 19 '08 #10

greg <gr**@cosc.canterbury.ac.nz> wrote:
> Nick Craig-Wood wrote:
> > (Note that basic pickle protocol is likely to be more compressible
> > than the binary version!)
>
> Although the binary version may be more compact to
> start with. It would be interesting to compare the
> two and see which one wins.

It is very data dependent of course, but in this case the binary
version wins...

However there is exactly the same amount of information in the text
pickle and the binary pickle, so in theory a perfect compressor will
compress each to exactly the same size ;-)

    >>> import os
    >>> import bz2
    >>> import pickle
    >>> L = range(1000000)
    >>> f = bz2.BZ2File("z.dat", "wb")
    >>> pickle.dump(L, f)
    >>> f.close()
    >>> os.path.getsize("z.dat")
    1055197L
    >>> f = bz2.BZ2File("z1.dat", "wb")
    >>> pickle.dump(L, f, -1)
    >>> f.close()
    >>> os.path.getsize("z1.dat")
    524741L

Practical considerations might be that bz2 is quite CPU expensive. It
also has quite a large overhead, eg

    >>> len("a".encode("bz2"))
    37

So if you are compressing lots of small things, zip is a better
protocol

    >>> len("a".encode("zip"))
    9

It is also much faster!

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Nov 19 '08 #11
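
Since the outcome is data dependent, a comparison like Nick's is easy to rerun on your own data. Below is a minimal Python 3 sketch. The helper name `compressed_sizes` is made up, `zlib` stands in for the Python 2 "zip" codec, and the numbers will differ from the Python 2 figures quoted in the thread.

```python
import bz2
import pickle
import zlib


def compressed_sizes(obj):
    """Return {(protocol, codec): size in bytes} for text and binary pickles."""
    sizes = {}
    for protocol in (0, pickle.HIGHEST_PROTOCOL):
        raw = pickle.dumps(obj, protocol)
        sizes[(protocol, "raw")] = len(raw)
        sizes[(protocol, "bz2")] = len(bz2.compress(raw))
        sizes[(protocol, "zlib")] = len(zlib.compress(raw))
    return sizes


if __name__ == "__main__":
    for key, size in sorted(compressed_sizes(list(range(100000))).items()):
        print(key, size)
    # Fixed overhead of each compressor on a one-byte input.
    print(len(bz2.compress(b"a")), len(zlib.compress(b"a")))
```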
