memory use with regard to large pickle files

I'm writing a python program that reads in a very large
"pickled" file (consisting of one large dictionary and one
small one), and parses the results out to several binary and hdf
files.

The program works fine, but the memory load is huge. The size of
the pickle file on disk is about 900 Meg so I would theoretically
expect my program to consume about twice that (the dictionary
contained in the pickle file plus its repackaging into other formats),
but instead my program needs almost 5 Gig of memory to run.
Am I being unrealistic in my memory expectations?

I'm running Python 2.5 on a Linux box (Fedora release 7).

Is there a way to see how much memory is being consumed
by a single data structure or variable? How can I go about
debugging this problem?

Catherine
Oct 15 '08 #1
> The program works fine, but the memory load is huge. The size of
> the pickle file on disk is about 900 Meg so I would theoretically
> expect my program to consume about twice that (the dictionary
> contained in the pickle file plus its repackaging into other formats),
> but instead my program needs almost 5 Gig of memory to run.
> Am I being unrealistic in my memory expectations?

I would say so, yes. Since you are using 5 GiB of memory, you are
presumably running a 64-bit system.

On such a system, each pointer takes 8 bytes. In addition,
each object takes at least 16 bytes; if it's variable-sized,
it takes at least 24 bytes, plus the actual data in the object.

OTOH, in a pickle, a pointer takes no space, unless it's a
shared pointer (i.e. backwards reference), which takes
as many digits as you need to encode the "object number"
in the pickle. Each primitive object takes only a single byte
overhead (as opposed to 24), causing quite drastic space
reductions. Of course, non-primitive objects take more, as
they need to encode the class they are instances of.
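
To make the difference concrete, here is a rough sketch that compares the pickled size of a dictionary with an estimate of its in-memory footprint. It assumes a current CPython 3 interpreter (not the Python 2.5 from the question), and the sample dictionary is made up for illustration. The in-memory figure is only an approximation: it adds the shallow size of the dict to the sizes of its keys and values.

```python
import pickle
import sys

# Hypothetical sample data: 100,000 integer keys mapped to integer values.
data = {i: i * 2 for i in range(100000)}

# Size of the serialized form, as it would be written to disk.
pickled_size = len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))

# Rough in-memory estimate: the dict's own hash table plus every key
# and value object. sys.getsizeof is shallow, so the parts are summed.
in_memory = sys.getsizeof(data)
in_memory += sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in data.items())

print("pickled bytes:  ", pickled_size)
print("in-memory bytes:", in_memory)
print("ratio: %.1fx" % (in_memory / pickled_size))
```

The exact numbers depend on the interpreter version and platform, but the in-memory estimate should come out several times larger than the pickle.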

> Is there a way to see how much memory is being consumed
> by a single data structure or variable? How can I go about
> debugging this problem?

In Python 2.6, there is the sys.getsizeof function. For
earlier versions, the asizeof package gives similar results.
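
Because sys.getsizeof counts only the object itself and not what it refers to, measuring a whole dictionary needs a walk over its contents. The following is a minimal sketch of such a walk, assuming Python 3; the helper name total_size is hypothetical, and it only descends into the built-in container types, so instances of custom classes are counted shallowly.

```python
import sys

def total_size(obj, seen=None):
    """Rough size in bytes of obj plus the built-in containers it holds.

    Objects reachable more than once are counted only once.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    return size

# Made-up example data, just to show the call.
sample = {"a": [1.5, 2.5, 3.5], "b": (10, 20), "c": "text"}
print("shallow:", sys.getsizeof(sample))
print("deep:   ", total_size(sample))
```

Calling it on each top-level structure after unpickling should show which one accounts for most of the memory.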

Regards,
Martin
Oct 15 '08 #2
In message <gd**********@news.jpl.nasa.gov>, Catherine Moroney wrote:
> I'm writing a python program that reads in a very large
> "pickled" file (consisting of one large dictionary and one
> small one), and parses the results out to several binary and hdf
> files.

Job for a database?
Oct 19 '08 #3
Catherine Moroney wrote:
> I'm writing a python program that reads in a very large
> "pickled" file (consisting of one large dictionary and one
> small one), and parses the results out to several binary and hdf
> files.
>
> The program works fine, but the memory load is huge. The size of
> the pickle file on disk is about 900 Meg so I would theoretically
> expect my program to consume about twice that (the dictionary
> contained in the pickle file plus its repackaging into other formats),
> but instead my program needs almost 5 Gig of memory to run.
> Am I being unrealistic in my memory expectations?
>
> I'm running Python 2.5 on a Linux box (Fedora release 7).
>
> Is there a way to see how much memory is being consumed
> by a single data structure or variable? How can I go about
> debugging this problem?
>
> Catherine

There's always the 'shelve' module.
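
For reference, a minimal sketch of what that could look like, assuming Python 3 (where shelve.open works as a context manager). The file path, keys, and record layout are invented for illustration. A shelf behaves like a dictionary stored on disk, so entries are unpickled one at a time as they are accessed instead of all at once.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "records")

# Write entries one by one; each value is pickled separately.
with shelve.open(path) as db:
    db["station_1"] = {"lat": 34.2, "values": [1.0, 2.0, 3.0]}
    db["station_2"] = {"lat": 35.1, "values": [4.0, 5.0]}

# Read back a single entry without loading the others into memory.
with shelve.open(path) as db:
    record = db["station_1"]
    keys = sorted(db.keys())

print(keys)
print(record["values"])
```

Note that shelf keys must be strings, so a dictionary with other key types would need its keys converted first.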
Oct 19 '08 #4