Unpickling crashing my machine


No response, so I'm reposting this as it seems an "interesting" problem...

I have a huge dataset which contains a lot of individual records
represented by class instances.

I pickle this to a file :

way #1 :
for object in objects :
    cPickle.dump( object, myfile, -1 )

way #2 :
p = cPickle.Pickler( myfile, -1 )
for object in objects :
    p.dump( object )

When I try to unpickle this big file :

p = cPickle.Unpickler( open( ... ))
many times p.load()... display a progress counter...

Loading the file generated by #1 works fine, with linear speed.
Loading the file generated by #2 :
- the progress counter runs as fast as #1
- eats all memory, then swap
- when eating swap, the progress counter slows down a lot (of course)
- and the process must be killed to save the machine.

I'm talking lots of memory here. The pickled file is about 80 MB; when
loaded it fits into RAM with no problem.
However, I killed process #2 when it had already hogged about 700 MB of RAM
and showed no sign of stopping.

What's the problem ?
Jul 18 '05 #1
Pierre-Frédéric Caillaud wrote:

No response, so I'm reposting this as it seems an "interesting" problem...

way #1 :
for object in objects :
    cPickle.dump( object, myfile, -1 )

way #2 :
p = cPickle.Pickler( myfile, -1 )
for object in objects :
    p.dump( object )

Loading the file generated by #1 works fine, with linear speed.
Loading the file generated by #2 eats all memory, then swap, and the
process must be killed to save the machine.

What's the problem ?


I have just tried to pickle the same object twice using both methods you
describe. The file created using a single Pickler is shorter than the one
written by repeated dump() calls, which I suppose create a new Pickler for
every call. That means the Pickler keeps a cache (the memo) of objects
already written, and the Unpickler must therefore keep a corresponding
cache. I believe that what you see is the excessive growth of that cache.
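
Here is a minimal sketch of that size difference (the record dict and buffer
names are just illustrative), dumping the same object twice with each method:

import cPickle
from StringIO import StringIO

record = {"name": "example", "values": range(100)}   # illustrative record

# Way #1: a fresh Pickler (and a fresh memo) for every dump() call.
buf1 = StringIO()
for obj in (record, record):
    cPickle.dump(obj, buf1, -1)

# Way #2: one Pickler reused, so its memo persists across dump() calls and
# the second occurrence is written as a short back-reference into the memo.
buf2 = StringIO()
p = cPickle.Pickler(buf2, -1)
for obj in (record, record):
    p.dump(obj)

print len(buf1.getvalue()), len(buf2.getvalue())   # way #2 is shorter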

Peter

Jul 18 '05 #2

Peter> Pierre-Frédéric Caillaud wrote:
way #1 :
for object in objects :
    cPickle.dump( object, myfile, -1 )

way #2 :
p = cPickle.Pickler( myfile, -1 )
for object in objects :
    p.dump( object )

Loading the file generated by #2 :
- the progress counter runs as fast as #1
- eats all memory, then swap

...

Peter> I believe that what you see is the excessive growth of that
Peter> cache.

Correct. If each object dumped is independent of all the other objects
being dumped, you should clear the memo after each dump() call in the second
case:

p = cPickle.Pickler( myfile, -1 )
for object in objects :
    p.dump( object )
    p.clear_memo()

The first case doesn't suffer from this problem because each call to
cPickle.dump() creates a new Pickler, and thus a new memo.
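
For completeness, a sketch of the matching loading loop (the file name is
illustrative); cPickle.Unpickler raises EOFError once the stream is
exhausted:

import cPickle

myfile = open("records.pickle", "rb")   # illustrative file name
u = cPickle.Unpickler(myfile)
objects = []
while True:
    try:
        objects.append(u.load())   # one record per load(), as in the original loop
    except EOFError:
        break
myfile.close()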

Skip
Jul 18 '05 #3
