Sean Davis wrote:

I have a set of numpy arrays which I would like to save to a gzip
file. Here is an example without gzip:

import numpy
b=numpy.ones(1000000,dtype=numpy.uint8)
a=numpy.zeros(1000000,dtype=numpy.uint8)
fd = open('test.dat','wb')
a.tofile(fd)
b.tofile(fd)
fd.close()

This works fine. However, this does not:

fd = gzip.open('test.dat','wb')
a.tofile(fd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: first argument must be a string or open file

As drobinow says, the .tofile() method needs an actual file object with a real
FILE* pointer underneath it. You will need to call fd.write() on strings (or
buffers) made from the arrays instead. If your arrays are large (as they must be
if compression helps), then you will probably want to split them up into chunks.
Use numpy.array_split() to do this. For example:

In [13]: import numpy

In [14]: a=numpy.zeros(1000000,dtype=numpy.uint8)

In [15]: chunk_size = 256*1024

In [17]: import gzip

In [18]: fd = gzip.open('foo.gz', 'wb')

In [19]: for chunk in numpy.array_split(a, max(1, len(a) // chunk_size)):
   ....:     fd.write(buffer(chunk))
   ....:

In [20]: fd.close()
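
The session above relies on the buffer() builtin, which does not exist on
Python 3. A rough equivalent, assuming memoryview() as the stand-in, might look
like the sketch below. The helper name write_arrays_gz, its chunk_size default,
and the temporary file path are made up for illustration; this is one way to do
it, not the only one.

```python
import gzip
import os
import tempfile

import numpy


def write_arrays_gz(path, arrays, chunk_size=256 * 1024):
    """Write the raw bytes of each array to a gzip file, chunk by chunk.

    Hypothetical helper; chunk_size is counted in array elements.
    """
    with gzip.open(path, 'wb') as fd:
        for arr in arrays:
            flat = numpy.ascontiguousarray(arr).reshape(-1)
            # Ceiling division so no chunk exceeds chunk_size elements;
            # max(1, ...) avoids asking array_split for zero sections.
            n_chunks = max(1, -(-len(flat) // chunk_size))
            for chunk in numpy.array_split(flat, n_chunks):
                # memoryview() exposes the chunk's bytes to fd.write().
                fd.write(memoryview(chunk))


a = numpy.zeros(1000000, dtype=numpy.uint8)
b = numpy.ones(1000000, dtype=numpy.uint8)
path = os.path.join(tempfile.mkdtemp(), 'foo.gz')
write_arrays_gz(path, [a, b])

# Read everything back to check the round trip. Note that the file holds
# only raw bytes: dtype and array boundaries have to be known separately.
with gzip.open(path, 'rb') as fd:
    data = numpy.frombuffer(fd.read(), dtype=numpy.uint8)
```

Nothing in the file records where one array ends and the next begins, so any
metadata (dtype, lengths) would have to be stored by some convention of your
own.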

In the bigger picture, I want to be able to write multiple numpy
arrays with some metadata to a binary file for very fast reading, and
these arrays are pretty compressible (strings of small integers), so I
can probably benefit in speed and file size by gzipping.

File size perhaps, but I suspect the speed gains you get will be swamped by the
Python-level manipulation you will have to do to reconstruct the array. You will
have to read in (partial!) strings and then put the data into an array. If you
think compression will really help, look into PyTables. It uses the HDF5
library, which includes the ability to compress arrays with gzip and other
compression schemes. All of the decompression happens in C, so you don't have to
do all of the manipulations at the Python level. If you stand to gain anything
from compression, this is the best way to find out and probably the best way to
implement it, too.

http://www.pytables.org
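
As a rough idea of what that could look like, here is a minimal sketch using
the PyTables 3.x names (open_file, create_carray, Filters). The file name, node
names, compression level, and the "description" attribute are all arbitrary
choices for illustration, not anything prescribed by the library.

```python
import os
import tempfile

import numpy
import tables  # PyTables

a = numpy.zeros(1000000, dtype=numpy.uint8)
b = numpy.ones(1000000, dtype=numpy.uint8)
path = os.path.join(tempfile.mkdtemp(), 'arrays.h5')

# 'zlib' is the gzip-style (deflate) filter; complevel=5 is an arbitrary pick.
filters = tables.Filters(complevel=5, complib='zlib')

with tables.open_file(path, mode='w') as h5:
    for name, arr in (('a', a), ('b', b)):
        # A CArray is a chunked array, so it can be stored compressed.
        node = h5.create_carray(h5.root, name, obj=arr, filters=filters)
        # Metadata can ride along as node attributes (illustrative key).
        node.attrs.description = 'example array ' + name

with tables.open_file(path, mode='r') as h5:
    # read() decompresses in C and hands back a numpy array.
    a_back = h5.root.a.read()
    description = h5.root.b.attrs.description
```

Whether this beats plain uncompressed files on read speed depends on the data
and the disk, so it is worth timing both on your own arrays.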

If you have more numpy questions, you will probably want to ask on the numpy
mailing list:

http://www.scipy.org/Mailing_Lists

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco