I used pickle and found the file was saved in text format. I wonder
whether anyone is familiar with a good compact off-the-shelf module
available that will save in compressed format... or maybe an opinion
on a smart approach for making a custom one? Appreciate it! I'm a
bit of a n00b but have been looking around. I found a serialize.py
but it seems like overkill.
Mark
On Nov 17, 2008, at 10:47 AM, Mark wrote:
> I used pickle and found the file was saved in text format. I wonder
> whether anyone is familiar with a good compact off-the-shelf module
> available that will save in compressed format... or maybe an opinion
> on a smart approach for making a custom one?
Well, here's a thought: create a zip file (using the standard zipfile
module), and pickle your data into that.
HTH,
- Joe
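Joe's suggestion could look roughly like the following in Python 3 (the thread itself predates it). This is only a minimal sketch: the function names save_to_zip and load_from_zip and the member name "data.pkl" are made up for illustration. Note that ZipFile.writestr takes the pickled bytes, so an intermediate bytes object is still created.

```python
import pickle
import zipfile


def save_to_zip(obj, zip_path, member="data.pkl"):
    """Pickle obj and store it as one deflate-compressed member of a zip file."""
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, payload)


def load_from_zip(zip_path, member="data.pkl"):
    """Read the named member back out of the zip file and unpickle it."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        return pickle.loads(zf.read(member))
```

A zip archive is mostly worth it when several pickles should live in one file; for a single object, the plain compressed-file approaches discussed later in the thread are simpler.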
> I used pickle and found the file was saved in text format. I wonder
> whether anyone is familiar with a good compact off-the-shelf module
> available that will save in compressed format... or maybe an opinion
> on a smart approach for making a custom one?
Joe> Well, here's a thought: create a zip file (using the standard
Joe> zipfile module), and pickle your data into that.
Also, specify a binary pickle protocol. Here's a silly example:
>>> len(pickle.dumps([1,2,3], pickle.HIGHEST_PROTOCOL))
14
>>> len(pickle.dumps([1,2,3], 0))
18
Skip
sk**@pobox.com <sk**@pobox.com> wrote:
> > I used pickle and found the file was saved in text format. I wonder
> > whether anyone is familiar with a good compact off-the-shelf module
> > available that will save in compressed format... or maybe an opinion
> > on a smart approach for making a custom one?
> Joe> Well, here's a thought: create a zip file (using the standard
> Joe> zipfile module), and pickle your data into that.
> Also, specify a binary pickle protocol. Here's a silly example:
> >>> len(pickle.dumps([1,2,3], pickle.HIGHEST_PROTOCOL))
> 14
> >>> len(pickle.dumps([1,2,3], 0))
> 18
Or even
>>> L = range(100)
>>> a = pickle.dumps(L)
>>> len(a)
496
>>> b = a.encode("bz2")
>>> len(b)
141
>>> c = b.decode("bz2")
>>> M = pickle.loads(c)
>>> M == L
True
--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
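For readers on Python 3: the "bz2" codec trick above relies on Python 2 strings, and str.encode("bz2") is not available there. A rough equivalent, assuming only the standard bz2 module, is to call bz2.compress and bz2.decompress on the pickled bytes. The variable names mirror Nick's session; list(range(100)) is used because range no longer returns a list.

```python
import bz2
import pickle

L = list(range(100))

a = pickle.dumps(L)      # pickled bytes
b = bz2.compress(a)      # compressed bytes
c = bz2.decompress(b)    # back to the original pickle
M = pickle.loads(c)

print(len(a), len(b), M == L)
```

The sizes printed will differ from the Python 2.5 numbers above, since the default pickle protocol has changed since then.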
Thanks guys. This is for serializing to disk. I was hoping to not
have to use too many intermediate steps, but I couldn't figure out how
to pickle data into a zipfile without using either an intermediate
string or file. That's cool; here's what I'll probably settle on
(tested). Now I just need to reverse the steps for the open function.
def saveOjb(self, dataObj):
    fName = self.version + '_' + self.modname + '.dat'
    f = open(fName, 'w')
    dStr = pickle.dumps(dataObj)
    c = dStr.encode("bz2")
    pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
    f.close()
I'm glad to see that "encode()" is not one of the string ops on the
deprecate list (using Python 2.5).
Thx,
Mark
Mark> def saveOjb(self, dataObj):
Mark>     fName = self.version + '_' + self.modname + '.dat'
Mark>     f = open(fName, 'w')
Mark>     dStr = pickle.dumps(dataObj)
Mark>     c = dStr.encode("bz2")
Mark>     pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
Mark>     f.close()
Hmmm... Why pickle it twice?
def saveOjb(self, dataObj):
    fName = self.version + '_' + self.modname + '.dat'
    f = open(fName, 'wb')
    f.write(pickle.dumps(dataObj, pickle.HIGHEST_PROTOCOL).encode("bz2"))
    f.close()
Skip
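Mark mentions still needing to reverse the steps for the open function. A possible save/load pair in the spirit of Skip's single-pickle version might look like this in Python 3. It is a sketch under assumptions: save_obj and load_obj are hypothetical free functions that take a file name directly (rather than building it from self.version and self.modname), and bz2.compress stands in for the Python 2 "bz2" codec.

```python
import bz2
import pickle


def save_obj(data_obj, file_name):
    """Pickle once, compress once, and write the raw bytes in binary mode."""
    with open(file_name, "wb") as f:
        f.write(bz2.compress(pickle.dumps(data_obj, pickle.HIGHEST_PROTOCOL)))


def load_obj(file_name):
    """Reverse the steps of save_obj: read, decompress, unpickle."""
    with open(file_name, "rb") as f:
        return pickle.loads(bz2.decompress(f.read()))
```

Opening the file in binary mode matters here, because the compressed data is arbitrary bytes rather than text.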
On Nov 17, 3:08 pm, s...@pobox.com wrote:
>     Mark> def saveOjb(self, dataObj):
>     Mark>     fName = self.version + '_' + self.modname + '.dat'
>     Mark>     f = open(fName, 'w')
>     Mark>     dStr = pickle.dumps(dataObj)
>     Mark>     c = dStr.encode("bz2")
>     Mark>     pickle.dump(c, f, pickle.HIGHEST_PROTOCOL)
>     Mark>     f.close()
> Hmmm... Why pickle it twice?
>     def saveOjb(self, dataObj):
>         fName = self.version + '_' + self.modname + '.dat'
>         f = open(fName, 'wb')
>         f.write(pickle.dumps(dataObj, pickle.HIGHEST_PROTOCOL).encode("bz2"))
>         f.close()
> Skip
I wasn't sure whether the string object was still a string after
"encode" is called... at least whether it's still an ASCII string.
And if not, whether it could be used with dumps. I tested your
variation and it works the same. I guess your "write" is doing the
same as my "dump", but may be more efficient. Thanks.
Mark wrote:
> Thanks guys. This is for serializing to disk. I was hoping to not
> have to use too many intermediate steps
You should be able to use a gzip.GzipFile
or bz2.BZ2File and pickle straight into it.
--
Greg
greg <gr**@cosc.canterbury.ac.nz> wrote:
> Mark wrote:
> > Thanks guys. This is for serializing to disk. I was hoping to not
> > have to use too many intermediate steps
> You should be able to use a gzip.GzipFile
> or bz2.BZ2File and pickle straight into it.
Good idea - that will be much more memory efficient. E.g.
>>> import bz2
>>> import pickle
>>> L = range(100)
>>> f = bz2.BZ2File("z.dat", "wb")
>>> pickle.dump(L, f)
>>> f.close()
>>> f = bz2.BZ2File("z.dat", "rb")
>>> M = pickle.load(f)
>>> f.close()
>>> M == L
True
(Note that basic pickle protocol is likely to be more compressible
than the binary version!)
--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
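Greg also mentions gzip.GzipFile. A comparable sketch for Python 3, using gzip.open (which returns a GzipFile) and with-blocks so the file is closed even if pickling fails, could be written as below. The helper names dump_gz and load_gz are invented for this example.

```python
import gzip
import pickle


def dump_gz(obj, path):
    """Stream the pickle straight into a gzip-compressed file."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load_gz(path):
    """Unpickle directly from the gzip-compressed file."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

bz2.open can be swapped in for gzip.open in the same way if bz2 compression is preferred.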
Nick Craig-Wood wrote:
> (Note that basic pickle protocol is likely to be more compressible
> than the binary version!)
Although the binary version may be more compact to
start with. It would be interesting to compare the
two and see which one wins.
--
Greg
greg <gr**@cosc.canterbury.ac.nz> wrote:
> Nick Craig-Wood wrote:
> > (Note that basic pickle protocol is likely to be more compressible
> > than the binary version!)
> Although the binary version may be more compact to
> start with. It would be interesting to compare the
> two and see which one wins.
It is very data dependent of course, but in this case the binary
version wins...
However there is exactly the same amount of information in the text
pickle and the binary pickle, so in theory a perfect compressor will
compress each to exactly the same size ;-)
>>> import os
>>> import bz2
>>> import pickle
>>> L = range(1000000)
>>> f = bz2.BZ2File("z.dat", "wb")
>>> pickle.dump(L, f)
>>> f.close()
>>> os.path.getsize("z.dat")
1055197L
>>> f = bz2.BZ2File("z1.dat", "wb")
>>> pickle.dump(L, f, -1)
>>> f.close()
>>> os.path.getsize("z1.dat")
524741L
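The same comparison can be tried without writing files. The sketch below is a Python 3 adaptation that measures the raw and bz2-compressed size of the text protocol (0) and the highest binary protocol in memory. The list is shortened to 100000 items so it runs quickly, and the exact numbers will depend on the Python version and on the data.

```python
import bz2
import pickle

data = list(range(100000))  # smaller than the thread's example, to run quickly

sizes = {}
for protocol in (0, pickle.HIGHEST_PROTOCOL):
    raw = pickle.dumps(data, protocol)
    # Store (uncompressed size, bz2-compressed size) for this protocol.
    sizes[protocol] = (len(raw), len(bz2.compress(raw)))

for protocol, (raw_size, packed_size) in sizes.items():
    print("protocol %d: %d bytes raw, %d bytes after bz2"
          % (protocol, raw_size, packed_size))
```

Running it on a sample of the real data is probably the most reliable way to pick a protocol, since the result is so data dependent.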
A practical consideration is that bz2 is quite CPU expensive. It
also has quite a large fixed overhead, e.g.
>>> len("a".encode("bz2"))
37
So if you are compressing lots of small things, zip is a better
choice
>>> len("a".encode("zip"))
9
It is also much faster!
--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
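The overhead comparison for tiny inputs can be reproduced in Python 3 with the bz2 and zlib modules directly (the Python 2 "zip" codec used above is an alias for the zlib codec). This is just an illustrative sketch; the variable names are arbitrary.

```python
import bz2
import zlib

tiny = b"a"

# Compress the same one-byte input with both libraries.
bz2_size = len(bz2.compress(tiny))
zlib_size = len(zlib.compress(tiny))

print("bz2:", bz2_size, "bytes; zlib:", zlib_size, "bytes")
```

For many small pickles, this fixed per-stream cost can outweigh any gain from bz2's better compression ratio on large inputs.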