je****@unpythonic.net wrote:
As far as I know, there is not a prefabbed solution for this problem.
Two issues you must solve are buffering (when must the data you've
written to the compressor really go out to the other side?) and what to
do when a read() or recv() reads gzipped bytes that don't produce any
additional unzipped bytes. The latter is a problem because normally a
read() that returns '' indicates end-of-file.
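The empty-read case can be reproduced in a few lines. This is only a sketch, written for Python 3 (where the data are bytes) rather than the Python 2 of this thread; the two-byte split is an arbitrary choice that happens to hand the decompressor nothing but the zlib header.

```python
import zlib

compressed = zlib.compress(b"abc")
decompressor = zlib.decompressobj()

# Feed only the 2-byte zlib header: valid input, but no output yet.
first = decompressor.decompress(compressed[:2])
eof_after_first = decompressor.eof  # still False: the stream is not finished

# Feed the remainder: now the original bytes come out.
rest = decompressor.decompress(compressed[2:])
eof_after_rest = decompressor.eof
```

Here `first` is empty even though the stream has not ended, so a wrapper's read() cannot simply pass that empty result along as if it meant end-of-file.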
If you only work with whole files at a time, then one easy thing to
do is use the 'zlib' encoding:
>>> "abc".encode("zlib")
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> _.decode("zlib")
'abc'
... but because zlib isn't self-delimiting, this won't work if you
want to write() multiple times, or if you want to read() less than the full file.
That's basically a solved problem; zlib does have a kind of
self-delimiting mechanism. The key is the 'flush' method of the
compression object:
    some_send_function(compressor.flush(zlib.Z_SYNC_FLUSH))
The Python module doc is unclear/wrong on this, but zlib.h
explains:
If the parameter flush is set to Z_SYNC_FLUSH, all pending
output is flushed to the output buffer and the output is
aligned on a byte boundary, so that the decompressor can get
all input data available so far.
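A minimal sketch of how that might look for a sequence of messages, assuming Python 3 and one long-lived compressor/decompressor pair per connection. The send_message helper is a hypothetical name for illustration, and no real socket is involved; the bytes are handed straight to the decompressor.

```python
import zlib

compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

def send_message(data):
    """Compress one message and flush so the receiver can decode all of it now."""
    return compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH)

received = []
for message in [b"hello ", b"hello again ", b"goodbye"]:
    wire_bytes = send_message(message)  # what would go out over the socket
    received.append(decompressor.decompress(wire_bytes))
```

Each entry in `received` matches the message that was sent, without waiting for the stream to be closed.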
There's also Z_FULL_FLUSH, which also re-sets the compression
dictionary. For a stream socket, we'd usually want to keep the
dictionary, since that's what gives us the compression. The
Python doc states:
Z_SYNC_FLUSH and Z_FULL_FLUSH allow compressing further
strings of data and are used to allow partial error recovery
on decompression
That's not correct. Z_FULL_FLUSH allows recovery after errors,
but Z_SYNC_FLUSH is just to allow pushing all the compressor's
input to the decompressor's output.
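The cost of resetting the dictionary can be seen by sending the same message twice under each flush mode. This is a rough sketch in Python 3; MESSAGE and second_message_size are made-up names for the demonstration, and the exact byte counts will vary with the data.

```python
import zlib

MESSAGE = b"The quick brown fox jumps over the lazy dog while the stream stays open. "

def second_message_size(flush_mode):
    """Send MESSAGE twice with flush_mode; return the wire size of the second copy."""
    compressor = zlib.compressobj()
    # First copy: its output is discarded, it only primes the dictionary.
    compressor.compress(MESSAGE)
    compressor.flush(flush_mode)
    # Second copy: identical data, so it can reuse the dictionary if one survives.
    second = compressor.compress(MESSAGE) + compressor.flush(flush_mode)
    return len(second)

sync_size = second_message_size(zlib.Z_SYNC_FLUSH)  # dictionary kept
full_size = second_message_size(zlib.Z_FULL_FLUSH)  # dictionary reset
```

With Z_SYNC_FLUSH the second copy can refer back to the first, so sync_size comes out smaller than full_size.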
--
--Bryan