472,373 Members | 1,849 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,373 software developers and data experts.

memory use with regard to large pickle files

I'm writing a python program that reads in a very large
"pickled" file (consisting of one large dictionary and one
small one), and parses the results out to several binary and hdf
files.

The program works fine, but the memory load is huge. The size of
the pickle file on disk is about 900 Meg so I would theoretically
expect my program to consume about twice that (the dictionary
contained in the pickle file plus its repackaging into other formats),
but instead my program needs almost 5 Gig of memory to run.
Am I being unrealistic in my memory expectations?

I'm running Python 2.5 on a Linux box (Fedora release 7).

Is there a way to see how much memory is being consumed
by a single data structure or variable? How can I go about
debugging this problem?

Catherine
Oct 15 '08 #1
3 6145
The program works fine, but the memory load is huge. The size of
the pickle file on disk is about 900 Meg so I would theoretically
expect my program to consume about twice that (the dictionary
contained in the pickle file plus its repackaging into other formats),
but instead my program needs almost 5 Gig of memory to run.
Am I being unrealistic in my memory expectations?
I would say so, yes. As you use 5GiB of memory, it seems you are
running a 64-bit system.

On such a system, each pointer takes 8 bytes. In addition,
each object takes at least 16 bytes; if it's variable-sized,
it takes at least 24 bytes, plus the actual data in the object.

OTOH, in a pickle, a pointer takes no space, unless it's a
shared pointer (i.e. backwards reference), which takes
as many digits as you need to encode the "object number"
in the pickle. Each primitive object takes only a single byte
overhead (as opposed to 24), causing quite drastic space
reductions. Of course, non-primitive objects take more, as
they need to encode the class they are instances of.
Is there a way to see how much memory is being consumed
by a single data structure or variable? How can I go about
debugging this problem?
In Python 2.6, there is the sys.getsizeof function. For
earlier versions, the asizeof package gives similar results.

Regards,
Martin
Oct 15 '08 #2
In message <gd**********@news.jpl.nasa.gov>, Catherine Moroney wrote:
I'm writing a python program that reads in a very large
"pickled" file (consisting of one large dictionary and one
small one), and parses the results out to several binary and hdf
files.
Job for a database?
Oct 19 '08 #3
Catherine Moroney wrote:
I'm writing a python program that reads in a very large
"pickled" file (consisting of one large dictionary and one
small one), and parses the results out to several binary and hdf
files.

The program works fine, but the memory load is huge. The size of
the pickle file on disk is about 900 Meg so I would theoretically
expect my program to consume about twice that (the dictionary
contained in the pickle file plus its repackaging into other formats),
but instead my program needs almost 5 Gig of memory to run.
Am I being unrealistic in my memory expectations?

I'm running Python 2.5 on a Linux box (Fedora release 7).

Is there a way to see how much memory is being consumed
by a single data structure or variable? How can I go about
debugging this problem?

Catherine
There's always the 'shelve' module.
Oct 19 '08 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
by: Hans Georg Krauthaeuser | last post by:
Dear all, I have a long running application (electromagnetic compatibility measurements in mode-stirred chambers over GPIB) that use pickle (cPickle) to autosave a class instance with all the...
1
by: A.B., Khalid | last post by:
I wonder if someone can explain what is wrong here. I am pickling a list of dictionaries (see code attached) and unpickling it back using the HIGHEST_PROTOCOL of pickle and cPickle. I am getting an...
0
by: Mike P. | last post by:
Hi all, I'm working on a simulation (can be considered a game) in Python where I want to be able to dump the simulation state to a file and be able to load it up later. I have used the standard...
4
by: AN | last post by:
Greetings, We make an ASP.NET web application and we host it for our customers. We have provisioned hardware and hope to be able to service around 200 customers on this hardware. The web...
6
by: Jim Lewis | last post by:
Pickling an instance of a class, gives "can't pickle instancemethod objects". What does this mean? How do I find the class method creating the problem?
5
by: Chris | last post by:
Why can pickle serialize references to functions, but not methods? Pickling a function serializes the function name, but pickling a staticmethod, classmethod, or instancemethod generates an...
2
by: Nagu | last post by:
I am trying to save a dictionary of size 65000X50 to a local file and I get the memory error problem. How do I go about resolving this? Is there way to partition the pickle object and combine...
0
by: Nagu | last post by:
I am trying to save a dictionary of size 65000X50 to a local file and I get the memory error problem. How do I go about resolving this? Is there way to partition the pickle object and combine...
1
by: Nagu | last post by:
I didn't have the problem with dumping as a string. When I tried to save this object to a file, memory error pops up. I am sorry for the mention of size for a dictionary. What I meant by...
0
by: antdb | last post by:
Ⅰ. Advantage of AntDB: hyper-convergence + streaming processing engine In the overall architecture, a new "hyper-convergence" concept was proposed, which integrated multiple engines and...
0
hi
by: WisdomUfot | last post by:
It's an interesting question you've got about how Gmail hides the HTTP referrer when a link in an email is clicked. While I don't have the specific technical details, Gmail likely implements measures...
1
by: Matthew3360 | last post by:
Hi, I have been trying to connect to a local host using php curl. But I am finding it hard to do this. I am doing the curl get request from my web server and have made sure to enable curl. I get a...
0
Oralloy
by: Oralloy | last post by:
Hello Folks, I am trying to hook up a CPU which I designed using SystemC to I/O pins on an FPGA. My problem (spelled failure) is with the synthesis of my design into a bitstream, not the C++...
0
BLUEPANDA
by: BLUEPANDA | last post by:
At BluePanda Dev, we're passionate about building high-quality software and sharing our knowledge with the community. That's why we've created a SaaS starter kit that's not only easy to use but also...
2
by: Ricardo de Mila | last post by:
Dear people, good afternoon... I have a form in msAccess with lots of controls and a specific routine must be triggered if the mouse_down event happens in any control. Than I need to discover what...
1
by: ezappsrUS | last post by:
Hi, I wonder if someone knows where I am going wrong below. I have a continuous form and two labels where only one would be visible depending on the checkbox being checked or not. Below is the...
0
by: jack2019x | last post by:
hello, Is there code or static lib for hook swapchain present? I wanna hook dxgi swapchain present for dx11 and dx9.
0
DizelArs
by: DizelArs | last post by:
Hi all) Faced with a problem, element.click() event doesn't work in Safari browser. Tried various tricks like emulating touch event through a function: let clickEvent = new Event('click', {...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.