Copying zlib compression objects

I'm writing a program in Python that creates tar files of a certain
maximum size (to fit onto CD/DVD). One of the problems I'm running
into is that, when using compression, it's pretty much impossible to
determine in advance whether adding a file to an archive will push the
archive size past the maximum.

I believe that to do this properly, you need to copy the state of the tar
file (basically the current file offset as well as the state of the
compression object), then add the file. If the new size of the archive
exceeds the maximum, you need to restore the original state.

The critical part is being able to copy the compression object.
Without compression it is trivial to determine if a given file will
"fit" inside the archive. When using compression, the compression
ratio of a file depends partially on all the data that has been
compressed prior to it.
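
To illustrate the dependence on earlier data, here is a minimal sketch (not part of the patch) that measures how many bytes a chunk adds to a zlib stream, once on its own and once after an identical chunk. The helper name added_size and the sample payload are made up for illustration.

```python
import zlib


def added_size(data, prior=b""):
    """Bytes that `data` adds to a zlib stream that already contains `prior`.

    Hypothetical helper: it simply compares the one-shot compressed sizes
    of `prior` and of `prior + data`.
    """
    return len(zlib.compress(prior + data)) - len(zlib.compress(prior))


# 256 distinct byte values, repeated: the first pass has to be stored as
# literals, later passes can be encoded as back-references.
payload = bytes(range(256)) * 16

alone = added_size(payload)
after_same = added_size(payload, prior=payload)

print("added when compressed alone:        ", alone)
print("added after an identical chunk:     ", after_same)
```

The second figure comes out smaller because the compressor can refer back to data it has already seen, which is why the size of the next file cannot be predicted from the file alone.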

The current implementation in the standard library does not allow you
to copy these compression objects in a useful way, so I've made some
minor modifications (patch follows) to the standard 2.4.2 library:
- Add a copy() method to zlib compression objects. This returns a new
  compression object with the same internal state. I named it copy() to
  keep it consistent with things like sha.copy().
- Add snapshot() / restore() methods to GzipFile and TarFile. These
  work only in write mode. snapshot() returns a state object. Passing
  this state object to restore() will restore the GzipFile / TarFile
  to the state represented by the object.
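
A rough sketch of the snapshot / restore idea on a plain zlib stream, assuming a Python whose zlib compression objects provide copy() (current releases do). The SnapshotWriter class and its add_if_fits() method are hypothetical names for illustration, not the patched GzipFile / TarFile API. It projects the final size by finishing a throwaway copy of the compressor, so data still buffered inside zlib is counted too.

```python
import io
import zlib


class SnapshotWriter:
    """Hypothetical helper: one zlib stream written to a seekable file object.

    Assumes the stream starts at offset 0 of `fileobj`.
    """

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.compress = zlib.compressobj()

    def write(self, data):
        self.fileobj.write(self.compress.compress(data))

    def snapshot(self):
        # File offset plus an independent copy of the compressor state.
        return self.fileobj.tell(), self.compress.copy()

    def restore(self, state):
        offset, comp = state
        # Copy again so the same snapshot can be restored more than once.
        self.compress = comp.copy()
        self.fileobj.seek(offset)
        self.fileobj.truncate()

    def add_if_fits(self, data, max_size):
        state = self.snapshot()
        self.write(data)
        # Finish a throwaway copy to learn the size if the stream ended here.
        projected = self.fileobj.tell() + len(self.compress.copy().flush())
        if projected > max_size:
            self.restore(state)
            return False
        return True

    def close(self):
        self.fileobj.write(self.compress.flush())


buf = io.BytesIO()
writer = SnapshotWriter(buf)
print(writer.add_if_fits(b"a" * 1000, max_size=100))             # highly compressible
print(writer.add_if_fits(bytes(range(256)) * 4, max_size=100))   # rolled back if too big
writer.close()
print(len(buf.getvalue()), "bytes written")
```

Finishing a copy avoids forcing a flush on the real stream just to measure it, at the cost of compressing the tail of the data twice.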

Future work:
- Decompression objects could use a copy() method too
- Add support for copying bzip2 compression objects

Does this seem like a good approach?

Cheers,
Chris

diff -ur Python-2.4.2.orig/Lib/gzip.py Python-2.4.2/Lib/gzip.py
--- Python-2.4.2.orig/Lib/gzip.py 2005-06-09 10:22:07.000000000 -0400
+++ Python-2.4.2/Lib/gzip.py 2006-02-14 13:12:29.000000000 -0500
@@ -433,6 +433,17 @@
         else:
             raise StopIteration

+    def snapshot(self):
+        if self.mode == READ:
+            raise IOError("Can't create a snapshot in READ mode")
+        return (self.size, self.crc, self.fileobj.tell(), self.offset, self.compress.copy())
+
+    def restore(self, s):
+        if self.mode == READ:
+            raise IOError("Can't restore a snapshot in READ mode")
+        self.size, self.crc, offset, self.offset, self.compress = s
+        self.fileobj.seek(offset)
+        self.fileobj.truncate()

 def _test():
     # Act like gzip; with -d, act like gunzip.
diff -ur Python-2.4.2.orig/Lib/tarfile.py Python-2.4.2/Lib/tarfile.py
--- Python-2.4.2.orig/Lib/tarfile.py 2005-08-27 06:08:21.000000000 -0400
+++ Python-2.4.2/Lib/tarfile.py 2006-02-14 16:50:41.000000000 -0500
@@ -1825,6 +1825,28 @@
         """
         if level <= self.debug:
             print >> sys.stderr, msg
+
+    def snapshot(self):
+        """Save the current state of the tarfile
+        """
+        self._check("_aw")
+        if hasattr(self.fileobj, "snapshot"):
+            return self.fileobj.snapshot(), self.offset, self.members[:]
+        else:
+            return self.fileobj.tell(), self.offset, self.members[:]
+
+    def restore(self, s):
+        """Restore the state of the tarfile from a previous snapshot
+        """
+        self._check("_aw")
+        if hasattr(self.fileobj, "restore"):
+            snapshot, self.offset, self.members = s
+            self.fileobj.restore(snapshot)
+        else:
+            offset, self.offset, self.members = s
+            self.fileobj.seek(offset)
+            self.fileobj.truncate()
+
 # class TarFile

 class TarIter:
diff -ur Python-2.4.2.orig/Modules/zlibmodule.c Python-2.4.2/Modules/zlibmodule.c
--- Python-2.4.2.orig/Modules/zlibmodule.c 2004-12-28 15:12:31.000000000 -0500
+++ Python-2.4.2/Modules/zlibmodule.c 2006-02-14 14:05:35.000000000 -0500
@@ -653,6 +653,36 @@
     return RetVal;
 }

+PyDoc_STRVAR(comp_copy__doc__,
+"copy() -- Return a copy of the compression object.");
+
+static PyObject *
+PyZlib_copy(compobject *self, PyObject *args)
+{
+    compobject *retval;
+
+    retval = newcompobject(&Comptype);
+
+    /* Copy the zstream state */
+    /* TODO: Are the ENTER / LEAVE needed? */
+    ENTER_ZLIB
+    deflateCopy(&retval->zst, &self->zst);
+    LEAVE_ZLIB
+
+    /* Make references to the original unused_data and unconsumed_tail
+     * They're not used by compression objects so we don't have to do
+     * anything special here */
+    retval->unused_data = self->unused_data;
+    retval->unconsumed_tail = self->unconsumed_tail;
+    Py_INCREF(retval->unused_data);
+    Py_INCREF(retval->unconsumed_tail);
+
+    /* Mark it as being initialized */
+    retval->is_initialised = 1;
+
+    return (PyObject*)retval;
+}
+
 PyDoc_STRVAR(decomp_flush__doc__,
 "flush() -- Return a string containing any remaining decompressed data.\n"
 "\n"
@@ -723,6 +753,8 @@
                  comp_compress__doc__},
     {"flush", (binaryfunc)PyZlib_flush, METH_VARARGS,
               comp_flush__doc__},
+    {"copy", (binaryfunc)PyZlib_copy, METH_VARARGS,
+             comp_copy__doc__},
     {NULL, NULL}
 };

Feb 14 '06 #1
No comments?

I found a small bug in TarFile.snapshot() / restore() - they need to
save and restore self.inodes as well.

Feb 16 '06 #2
