compressed serialization module

I used pickle and found the file was saved in text format. I wonder
whether anyone is familiar with a good compact off-the-shelf module
available that will save in compressed format... or maybe an opinion
on a smart approach for making a custom one? Appreciate it! I'm a
bit of a n00b but have been looking around. I found a serialize.py
but it seems like overkill.

Mark
Nov 17 '08 #1
On Nov 17, 2008, at 10:47 AM, Mark wrote:
> I used pickle and found the file was saved in text format. I wonder
> whether anyone is familiar with a good compact off-the-shelf module
> available that will save in compressed format... or maybe an opinion
> on a smart approach for making a custom one?

Well, here's a thought: create a zip file (using the standard zipfile
module), and pickle your data into that.

HTH,
- Joe
Nov 17 '08 #2
> I used pickle and found the file was saved in text format. I wonder
> whether anyone is familiar with a good compact off-the-shelf module
> available that will save in compressed format... or maybe an opinion
> on a smart approach for making a custom one?

Joe> Well, here's a thought: create a zip file (using the standard
Joe> zipfile module), and pickle your data into that.

Also, specify a binary pickle protocol. Here's a silly example:

>>> import pickle
>>> len(pickle.dumps([1,2,3], pickle.HIGHEST_PROTOCOL))
14
>>> len(pickle.dumps([1,2,3], 0))
18

Skip
Nov 17 '08 #3
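
For readers on Python 3, the same comparison can be sketched roughly as below. The variable names are illustrative only, and exact byte counts depend on the Python version and protocol, so none are quoted here. A longer list is used because very short pickles can be dominated by the protocol's own header bytes.

```python
import pickle

data = list(range(1000))

# Protocol 0 is the original ASCII ("text") format.
text_pickle = pickle.dumps(data, protocol=0)
# HIGHEST_PROTOCOL selects the newest binary format this interpreter supports.
binary_pickle = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print("protocol 0:", len(text_pickle), "bytes")
print("protocol", pickle.HIGHEST_PROTOCOL, ":", len(binary_pickle), "bytes")
```

Either form loads back to the same list with pickle.loads; only the on-disk encoding differs.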
sk**@pobox.com <sk**@pobox.com> wrote:
> > I used pickle and found the file was saved in text format. I wonder
> > whether anyone is familiar with a good compact off-the-shelf module
> > available that will save in compressed format... or maybe an opinion
> > on a smart approach for making a custom one?
>
> Joe> Well, here's a thought: create a zip file (using the standard
> Joe> zipfile module), and pickle your data into that.
>
> Also, specify a binary pickle protocol. Here's a silly example:
>
> >>> len(pickle.dumps([1,2,3], pickle.HIGHEST_PROTOCOL))
> 14
> >>> len(pickle.dumps([1,2,3], 0))
> 18

Or even

>>> L = range(100)
>>> a = pickle.dumps(L)
>>> len(a)
496
>>> b = a.encode("bz2")
>>> len(b)
141
>>> c = b.decode("bz2")
>>> M = pickle.loads(c)
>>> M == L
True

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Nov 17 '08 #4
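
The `encode("bz2")` string method used above is a Python 2 idiom. On Python 3, pickles are `bytes` and the `bz2` module's `compress` and `decompress` functions do the same job. A minimal sketch of the round trip, with illustrative variable names:

```python
import bz2
import pickle

original = list(range(100))

raw = pickle.dumps(original)        # uncompressed pickle bytes
packed = bz2.compress(raw)          # compressed in memory
restored = pickle.loads(bz2.decompress(packed))

print(len(raw), len(packed), restored == original)
```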

Thanks guys. This is for serializing to disk. I was hoping to not
have to use too many intermediate steps, but I couldn't figure out how
to pickle data into zipfile without using either intermediate string
or file. That's cool, though; here's what I'll probably settle on (tested).
Now I just need to reverse the steps for the open function.

    def saveOjb(self, dataObj):
        fName = self.version + '_' + self.modname + '.dat'
        f = open(fName, 'w')
        dStr = pickle.dumps(dataObj)
        c = dStr.encode("bz2")
        pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
        f.close()

I'm glad to see that "encode()" is not one of the string ops on the
deprecate list (using Python 2.5).

Thx,
Mark
Nov 17 '08 #5

    Mark> def saveOjb(self, dataObj):
    Mark>     fName = self.version + '_' + self.modname + '.dat'
    Mark>     f = open(fName, 'w')
    Mark>     dStr = pickle.dumps(dataObj)
    Mark>     c = dStr.encode("bz2")
    Mark>     pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
    Mark>     f.close()

Hmmm... Why pickle it twice?

    def saveOjb(self, dataObj):
        fName = self.version + '_' + self.modname + '.dat'
        f = open(fName, 'wb')
        f.write(pickle.dumps(dataObj, pickle.HIGHEST_PROTOCOL).encode("bz2"))
        f.close()

Skip
Nov 17 '08 #6
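
The matching "open" step is simply the reverse: read the bytes, decompress, then unpickle. Below is a minimal sketch of the pair for Python 3, where `bz2.compress` stands in for `encode("bz2")`. The names `save_obj`, `load_obj` and the demo path are hypothetical, not from the posts above.

```python
import bz2
import os
import pickle
import tempfile


def save_obj(path, obj):
    """Pickle obj once, compress the bytes, and write them to path."""
    payload = bz2.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))
    with open(path, "wb") as f:  # binary mode: the payload is not text
        f.write(payload)


def load_obj(path):
    """Reverse of save_obj: read, decompress, unpickle."""
    with open(path, "rb") as f:
        # Only unpickle files you trust; pickle can execute code on load.
        return pickle.loads(bz2.decompress(f.read()))


# Usage with a throwaway file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.dat")
save_obj(demo_path, {"a": [1, 2, 3]})
loaded = load_obj(demo_path)
print(loaded)
```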
On Nov 17, 3:08 pm, s...@pobox.com wrote:
>     Mark> def saveOjb(self, dataObj):
>     Mark>     fName = self.version + '_' + self.modname + '.dat'
>     Mark>     f = open(fName, 'w')
>     Mark>     dStr = pickle.dumps(dataObj)
>     Mark>     c = dStr.encode("bz2")
>     Mark>     pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
>     Mark>     f.close()
>
> Hmmm... Why pickle it twice?
>
>     def saveOjb(self, dataObj):
>         fName = self.version + '_' + self.modname + '.dat'
>         f = open(fName, 'wb')
>         f.write(pickle.dumps(dataObj, pickle.HIGHEST_PROTOCOL).encode("bz2"))
>         f.close()
>
> Skip

I wasn't sure whether the string object was still a string after
"encode" is called... at least whether it's still an ascii string.
And if not, whether it could be used w/ dumps. I tested your
variation and it works the same. I guess your "write" is doing the
same as my "dump", but may be more efficient. Thanks.
Nov 18 '08 #7
Mark wrote:
> Thanks guys. This is for serializing to disk. I was hoping to not
> have to use too many intermediate steps

You should be able to use a gzip.GzipFile
or bz2.BZ2File and pickle straight into it.

--
Greg
Nov 18 '08 #8
greg <gr**@cosc.canterbury.ac.nz> wrote:
> Mark wrote:
> > Thanks guys. This is for serializing to disk. I was hoping to not
> > have to use too many intermediate steps
>
> You should be able to use a gzip.GzipFile
> or bz2.BZ2File and pickle straight into it.

Good idea - that will be much more memory efficient. Eg

>>> import bz2
>>> import pickle
>>> L = range(100)
>>> f = bz2.BZ2File("z.dat", "wb")
>>> pickle.dump(L, f)
>>> f.close()
>>> f = bz2.BZ2File("z.dat", "rb")
>>> M = pickle.load(f)
>>> f.close()
>>> M == L
True

(Note that basic pickle protocol is likely to be more compressible
than the binary version!)

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Nov 18 '08 #9
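
The same streaming approach carries over to Python 3, where `bz2.open` (or `gzip.open`, if gzip is preferred) returns a file object that works as a context manager. A minimal sketch, assuming a throwaway temporary file; the path and variable names are illustrative:

```python
import bz2
import os
import pickle
import tempfile

data = list(range(100))
path = os.path.join(tempfile.mkdtemp(), "z.dat")

# pickle.dump writes straight into the compressing file object,
# so no intermediate pickle string is built in memory.
with bz2.open(path, "wb") as f:
    pickle.dump(data, f)

# Reading reverses it: the file object decompresses as pickle.load reads.
with bz2.open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == data)
```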
Nick Craig-Wood wrote:
> (Note that basic pickle protocol is likely to be more compressible
> than the binary version!)

Although the binary version may be more compact to
start with. It would be interesting to compare the
two and see which one wins.

--
Greg

Nov 19 '08 #10
greg <gr**@cosc.canterbury.ac.nz> wrote:
> Nick Craig-Wood wrote:
> > (Note that basic pickle protocol is likely to be more compressible
> > than the binary version!)
>
> Although the binary version may be more compact to
> start with. It would be interesting to compare the
> two and see which one wins.

It is very data dependent of course, but in this case the binary
version wins...

However there is exactly the same amount of information in the text
pickle and the binary pickle, so in theory a perfect compressor will
compress each to exactly the same size ;-)

>>> import os
>>> import bz2
>>> import pickle
>>> L = range(1000000)
>>> f = bz2.BZ2File("z.dat", "wb")
>>> pickle.dump(L, f)
>>> f.close()
>>> os.path.getsize("z.dat")
1055197L
>>> f = bz2.BZ2File("z1.dat", "wb")
>>> pickle.dump(L, f, -1)
>>> f.close()
>>> os.path.getsize("z1.dat")
524741L

Practical considerations might be that bz2 is quite CPU expensive. It
also has quite a large overhead, eg

>>> len("a".encode("bz2"))
37

So if you are compressing lots of small things, zip is a better
protocol

>>> len("a".encode("zip"))
9

It is also much faster!

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Nov 19 '08 #11
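
Both measurements can be repeated on Python 3 with the sketch below. It assumes `zlib.compress` as the stand-in for the Python 2 "zip" codec, and the helper name `compressed_pickle_size` is hypothetical. No sizes are quoted, since they vary with the data, the Python version and the pickle protocol.

```python
import bz2
import pickle
import zlib

# Fixed overhead: compress a single byte with each codec.
tiny = b"a"
bz2_tiny = len(bz2.compress(tiny))
zlib_tiny = len(zlib.compress(tiny))
print("1 byte -> bz2:", bz2_tiny, "bytes, zlib:", zlib_tiny, "bytes")


def compressed_pickle_size(obj, protocol):
    """Return the bz2-compressed size of obj pickled with the given protocol."""
    return len(bz2.compress(pickle.dumps(obj, protocol=protocol)))


# Text (protocol 0) versus newest binary protocol, after compression.
numbers = list(range(100000))
for protocol in (0, pickle.HIGHEST_PROTOCOL):
    size = compressed_pickle_size(numbers, protocol)
    print("protocol", protocol, "->", size, "bytes compressed")
```

Swapping in your own data for `numbers` is the only reliable way to see which protocol wins for a given workload.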
