Numpy array to gzip file

I have a set of numpy arrays which I would like to save to a gzip
file. Here is an example without gzip:

import numpy

b = numpy.ones(1000000, dtype=numpy.uint8)
a = numpy.zeros(1000000, dtype=numpy.uint8)
fd = open('test.dat', 'wb')
a.tofile(fd)
b.tofile(fd)
fd.close()

This works fine. However, this does not:

import gzip

fd = gzip.open('test.dat', 'wb')
a.tofile(fd)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: first argument must be a string or open file

In the bigger picture, I want to be able to write multiple numpy
arrays with some metadata to a binary file for very fast reading, and
these arrays are pretty compressible (strings of small integers), so I
can probably benefit in speed and file size by gzipping.

Thanks,
Sean
Jun 27 '08 #1
On Jun 11, 9:17 am, Sean Davis <seand...@gmail.com> wrote:
> I have a set of numpy arrays which I would like to save to a gzip
> file. Here is an example without gzip:
>
> b=numpy.ones(1000000,dtype=numpy.uint8)
> a=numpy.zeros(1000000,dtype=numpy.uint8)
> fd = file('test.dat','wb')
> a.tofile(fd)
> b.tofile(fd)
> fd.close()
>
> This works fine. However, this does not:
>
> fd = gzip.open('test.dat','wb')
> a.tofile(fd)
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: first argument must be a string or open file
>
> In the bigger picture, I want to be able to write multiple numpy
> arrays with some metadata to a binary file for very fast reading, and
> these arrays are pretty compressible (strings of small integers), so I
> can probably benefit in speed and file size by gzipping.
>
> Thanks,
> Sean

Use
fd.write(a)

The documentation says that gzip simulates most of the methods of a
file object. Apparently that means it does not subclass it, and
numpy's tofile wants a real file object. Or something like that.
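
A minimal sketch of this suggestion, assuming a current Python 3 and NumPy, where gzip's write() accepts any bytes-like object. The temporary path and the compressed_size variable are illustrative only and do not come from the thread.

```python
import gzip
import os
import tempfile

import numpy

a = numpy.zeros(1000000, dtype=numpy.uint8)
b = numpy.ones(1000000, dtype=numpy.uint8)

# Illustrative output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'test.dat.gz')

# A GzipFile is not a real OS-level file, so tofile() cannot use it;
# write() simply compresses whatever bytes it is handed.
with gzip.open(path, 'wb') as fd:
    fd.write(a.tobytes())  # explicit copy of the raw bytes
    fd.write(b)            # a contiguous array also works via the buffer protocol

compressed_size = os.path.getsize(path)
```

Both calls append the raw array bytes to the same compressed stream, so the file holds the two arrays back to back with no dtype or shape information.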
Jun 27 '08 #2
On Jun 11, 12:42 pm, "drobi...@gmail.com" <drobi...@gmail.com> wrote:
> Use
> fd.write(a)

That seems to work fine. Just to add to the answer a bit, after
reopening the file for reading, one can then use:

b=numpy.frombuffer(fd.read(),dtype=numpy.uint8)

to get the array back as a numpy uint8 array.
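
A sketch of the full round trip, assuming Python 3 and a current NumPy. The temporary path and the names data, a2 and b2 are illustrative. Note that the reader has to know the dtype and the length of each array, since the file stores only raw bytes.

```python
import gzip
import os
import tempfile

import numpy

path = os.path.join(tempfile.mkdtemp(), 'test.dat.gz')  # illustrative path
a = numpy.zeros(1000000, dtype=numpy.uint8)
b = numpy.ones(1000000, dtype=numpy.uint8)

with gzip.open(path, 'wb') as fd:
    fd.write(a.tobytes())
    fd.write(b.tobytes())

# Reopen in read mode; the whole decompressed stream comes back as bytes.
with gzip.open(path, 'rb') as fd:
    data = numpy.frombuffer(fd.read(), dtype=numpy.uint8)

# frombuffer() on a bytes object gives a read-only view, so copy the
# slices if writable arrays are needed.
a2 = data[:a.size].copy()
b2 = data[a.size:].copy()
```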

Thanks for the help.

Sean
Jun 27 '08 #3
As drobinow says, the .tofile() method needs an actual file object with a real
FILE* pointer underneath it. You will need to call fd.write() on strings (or
buffers) made from the arrays instead. If your arrays are large (as they must be
if compression helps), then you will probably want to split them up. Use
numpy.array_split() to do this. For example:

In [13]: import numpy

In [14]: a = numpy.zeros(1000000, dtype=numpy.uint8)

In [15]: chunk_size = 256*1024

In [17]: import gzip

In [18]: fd = gzip.open('foo.gz', 'wb')

In [19]: for chunk in numpy.array_split(a, len(a) // chunk_size):
   ....:     fd.write(memoryview(chunk))  # buffer() exists only in Python 2
   ....:

In [20]: fd.close()
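
The same idea as a plain Python 3 script, offered as a sketch only. The helper name write_chunked and the temporary path are hypothetical. It uses fixed-size slices instead of numpy.array_split(), which is a swapped-in variation that also copes with arrays shorter than one chunk.

```python
import gzip
import os
import tempfile

import numpy


def write_chunked(fd, arr, chunk_size=256 * 1024):
    """Write arr to the file-like fd in pieces of at most chunk_size elements."""
    flat = numpy.ascontiguousarray(arr).reshape(-1)
    for start in range(0, flat.size, chunk_size):
        # A slice of a 1-D contiguous array is a contiguous view, not a copy.
        fd.write(memoryview(flat[start:start + chunk_size]))


a = numpy.zeros(1000000, dtype=numpy.uint8)
path = os.path.join(tempfile.mkdtemp(), 'foo.gz')  # illustrative path
with gzip.open(path, 'wb') as fd:
    write_chunked(fd, a)
```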

> In the bigger picture, I want to be able to write multiple numpy
> arrays with some metadata to a binary file for very fast reading, and
> these arrays are pretty compressible (strings of small integers), so I
> can probably benefit in speed and file size by gzipping.

File size perhaps, but I suspect the speed gains you get will be swamped by the
Python-level manipulation you will have to do to reconstruct the array. You will
have to read in (partial!) strings and then put the data into an array. If you
think compression will really help, look into PyTables. It uses the HDF5 library,
which includes the ability to compress arrays with gzip and other compression
schemes. All of the decompression happens in C, so you don't have to do all of
the manipulations at the Python level. If you stand to gain anything from
compression, this is the best way to find out and probably the best way to
implement it, too.
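
For comparison, here is a sketch of what the hand-rolled route might look like in Python 3. The file layout (a 4-byte length, a JSON header listing each array's name, dtype and shape, then the raw bytes) and the function names save_arrays and load_arrays are assumptions made for illustration; nobody in this thread proposed this format, and it is not how PyTables stores data.

```python
import gzip
import json
import os
import struct
import tempfile

import numpy


def save_arrays(path, arrays, meta=None):
    """Write a dict of arrays plus JSON-serializable metadata to a gzip file."""
    header = {
        'meta': meta or {},
        'arrays': [
            {'name': name, 'dtype': arr.dtype.str, 'shape': list(arr.shape)}
            for name, arr in arrays.items()
        ],
    }
    blob = json.dumps(header).encode('utf-8')
    with gzip.open(path, 'wb') as fd:
        fd.write(struct.pack('<I', len(blob)))  # header length, little-endian
        fd.write(blob)
        for arr in arrays.values():
            fd.write(numpy.ascontiguousarray(arr).tobytes())


def load_arrays(path):
    """Read back what save_arrays() wrote; returns (meta, dict of arrays)."""
    with gzip.open(path, 'rb') as fd:
        (header_len,) = struct.unpack('<I', fd.read(4))
        header = json.loads(fd.read(header_len).decode('utf-8'))
        out = {}
        for spec in header['arrays']:
            dtype = numpy.dtype(spec['dtype'])
            count = int(numpy.prod(spec['shape'], dtype=numpy.int64))
            raw = fd.read(count * dtype.itemsize)
            # Copy so the result is writable and independent of the bytes object.
            out[spec['name']] = (
                numpy.frombuffer(raw, dtype=dtype).reshape(spec['shape']).copy()
            )
    return header['meta'], out


a = numpy.zeros(1000000, dtype=numpy.uint8)
b = numpy.ones((100, 50), dtype=numpy.int32)
path = os.path.join(tempfile.mkdtemp(), 'arrays.bin.gz')  # illustrative path
save_arrays(path, {'a': a, 'b': b}, meta={'note': 'demo'})
meta, loaded = load_arrays(path)
```

Every array passes through a Python bytes object on the way in and out, which is the Python-level overhead described above.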

http://www.pytables.org

If you have more numpy questions, you will probably want to ask on the numpy
mailing list:

http://www.scipy.org/Mailing_Lists

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Jun 27 '08 #4
