memory use with regard to large pickle files

I'm writing a Python program that reads in a very large
"pickled" file (consisting of one large dictionary and one
small one), and parses the results out to several binary and HDF
files.

The program works fine, but the memory load is huge. The size of
the pickle file on disk is about 900 Meg so I would theoretically
expect my program to consume about twice that (the dictionary
contained in the pickle file plus its repackaging into other formats),
but instead my program needs almost 5 Gig of memory to run.
Am I being unrealistic in my memory expectations?

I'm running Python 2.5 on a Linux box (Fedora release 7).

Is there a way to see how much memory is being consumed
by a single data structure or variable? How can I go about
debugging this problem?

Catherine
Oct 15 '08 #1
> The program works fine, but the memory load is huge. The size of
> the pickle file on disk is about 900 Meg so I would theoretically
> expect my program to consume about twice that (the dictionary
> contained in the pickle file plus its repackaging into other formats),
> but instead my program needs almost 5 Gig of memory to run.
> Am I being unrealistic in my memory expectations?

I would say so, yes. Since you are using 5 GiB of memory, it seems you
are running a 64-bit system.

On such a system, each pointer takes 8 bytes. In addition,
each object takes at least 16 bytes; if it's variable-sized,
it takes at least 24 bytes, plus the actual data in the object.

OTOH, in a pickle, a pointer takes no space, unless it's a
shared pointer (i.e. a backwards reference), which takes
as many digits as are needed to encode the "object number"
in the pickle. Each primitive object takes only a single byte
of overhead (as opposed to 24), which gives quite drastic space
reductions. Of course, non-primitive objects take more, as
they also need to encode the class they are instances of.
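You can see the difference for yourself with something along these
lines (a rough sketch; it assumes Python 2.6 or later for
sys.getsizeof, and the exact numbers are only illustrative):

import pickle
import sys

# A dict of 100,000 float entries: compact when pickled, costly in RAM.
d = dict((i, float(i)) for i in range(100000))

print("pickle size:      %d bytes" % len(pickle.dumps(d, 2)))
print("dict object only: %d bytes" % sys.getsizeof(d))

# sys.getsizeof does not follow references, so add the keys and values too.
total = sys.getsizeof(d) + sum(sys.getsizeof(k) + sys.getsizeof(v)
                               for k, v in d.items())
print("dict + contents:  %d bytes" % total)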
> Is there a way to see how much memory is being consumed
> by a single data structure or variable? How can I go about
> debugging this problem?

In Python 2.6, there is the sys.getsizeof function. For
earlier versions, the asizeof package gives similar results.
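For example, a minimal sketch assuming the asizeof module is
installed (it later shipped as part of the Pympler package):

from pympler import asizeof   # or the standalone asizeof module

big = {'a': list(range(1000)), 'b': 'x' * 10000}
# asizeof follows references recursively, unlike sys.getsizeof,
# so this reports the dict plus all of its keys and values.
print(asizeof.asizeof(big))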

Regards,
Martin
Oct 15 '08 #2
In message <gd**********@news.jpl.nasa.gov>, Catherine Moroney wrote:
> I'm writing a Python program that reads in a very large
> "pickled" file (consisting of one large dictionary and one
> small one), and parses the results out to several binary and HDF
> files.
Job for a database?
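One way that could look, as a minimal sketch using the standard
sqlite3 module (table, column and file names here are made up):

import sqlite3

conn = sqlite3.connect('bigdata.sqlite')
conn.execute('CREATE TABLE IF NOT EXISTS items (key TEXT PRIMARY KEY, value REAL)')

# Insert rows incrementally; nothing requires the whole data set in RAM.
conn.executemany('INSERT OR REPLACE INTO items VALUES (?, ?)',
                 [('alpha', 1.5), ('beta', 2.5)])
conn.commit()

# Look up a single value without loading everything into memory.
row = conn.execute('SELECT value FROM items WHERE key = ?', ('alpha',)).fetchone()
print(row[0])
conn.close()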
Oct 19 '08 #3
Catherine Moroney wrote:
> I'm writing a Python program that reads in a very large
> "pickled" file (consisting of one large dictionary and one
> small one), and parses the results out to several binary and HDF
> files.
>
> The program works fine, but the memory load is huge. The size of
> the pickle file on disk is about 900 Meg so I would theoretically
> expect my program to consume about twice that (the dictionary
> contained in the pickle file plus its repackaging into other formats),
> but instead my program needs almost 5 Gig of memory to run.
> Am I being unrealistic in my memory expectations?
>
> I'm running Python 2.5 on a Linux box (Fedora release 7).
>
> Is there a way to see how much memory is being consumed
> by a single data structure or variable? How can I go about
> debugging this problem?
>
> Catherine
There's always the 'shelve' module.
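A shelf behaves like a dictionary but keeps its values pickled on
disk, so only the entries you actually touch are loaded into memory.
A minimal sketch (the file name is made up):

import shelve

db = shelve.open('bigdata.shelf')    # keys must be strings
db['alpha'] = list(range(1000))
db['beta'] = {'nested': 'ok'}
db.close()

db = shelve.open('bigdata.shelf')
print(db['alpha'][:5])               # loads only this one value from disk
db.close()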
Oct 19 '08 #4
