Bytes | Software Development & Data Engineering Community
Unpickling crashing my machine


No response, so I'm reposting this, as it seems an "interesting" problem...

I have a huge dataset which contains a lot of individual records
represented by class instances.

I pickle this to a file :

way #1 :
    for obj in objects :
        cPickle.dump( obj, myfile, -1 )

way #2 :
    p = cPickle.Pickler( myfile, -1 )
    for obj in objects :
        p.dump( obj )

When I try to unpickle this big file :

p = cPickle.Unpickler( open( ... ))
many times p.load()... display a progress counter...

Loading the file generated by #1 works fine, with linear speed.
Loading the file generated by #2 :
- the progress counter runs as fast as #1
- eats all memory, then swap
- when eating swap, the progress counter slows down a lot (of course)
- and the process must be killed to save the machine.

I'm talking lots of memory here. The pickled file is about 80 MB; when
loaded, it fits into RAM with no problem.
However, I killed the #2 process when it had already hogged about 700 MB
of RAM and showed no sign of stopping.

What's the problem ?
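(For reference, here is the load loop written out in full. This is a minimal sketch using Python 3's pickle module, into which cPickle was folded; the in-memory stream and the integer records stand in for the real file and class instances.)

```python
import io
import pickle

# Build a small stream the same way as "way #1": one dump() call per record.
myfile = io.BytesIO()
for obj in range(5):
    pickle.dump(obj, myfile, -1)
myfile.seek(0)

# Load records until the stream runs out; pickle raises EOFError at the end.
p = pickle.Unpickler(myfile)
objects = []
while True:
    try:
        objects.append(p.load())
    except EOFError:
        break

print(objects)  # -> [0, 1, 2, 3, 4]
```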
Jul 18 '05 #1
Pierre-Frédéric Caillaud wrote:

[original post quoted in full; snipped]


I have just tried to pickle the same object twice using both methods you
describe. The file created using the shared Pickler is shorter than the one
written by dump(), which I suppose creates a new pickler for every call.
That means that the shared pickler keeps a cache of objects already written,
and therefore the Unpickler must do the same. I believe that what you see is
the excessive growth of that cache.
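(The size difference is easy to reproduce. A minimal sketch with Python 3's pickle module, which absorbed cPickle; the Record class and the shared payload string are made up for illustration. The shared pickler writes short back-references into its cache, the memo, instead of re-serializing the class and the shared string, so its output is smaller, and the memo grows with every object dumped.)

```python
import io
import pickle

class Record:
    def __init__(self, payload):
        self.payload = payload

shared = "x" * 1000                     # content referenced by every record
records = [Record(shared) for _ in range(100)]

# Way #1: a fresh pickler per dump() -- the payload is re-serialized each time.
buf1 = io.BytesIO()
for rec in records:
    pickle.dump(rec, buf1, -1)

# Way #2: one shared pickler -- after the first dump, the memo lets later
# dumps emit a short back-reference to the already-written payload.
buf2 = io.BytesIO()
p = pickle.Pickler(buf2, -1)
for rec in records:
    p.dump(rec)

print(len(buf1.getvalue()), ">", len(buf2.getvalue()))
```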

Peter

Jul 18 '05 #2

Peter> Pierre-Frédéric Caillaud wrote:
    way #1 :
        for obj in objects :
            cPickle.dump( obj, myfile, -1 )

    way #2 :
        p = cPickle.Pickler( myfile, -1 )
        for obj in objects :
            p.dump( obj )

Loading the file generated by #2 :
- the progress counter runs as fast as #1
- eats all memory, then swap

...

Peter> I believe that what you see is the excessive growth of that
Peter> cache.

Correct. If each object dumped is independent of all the other objects
being dumped, you should clear the memo after each dump() call in the
second case:

    p = cPickle.Pickler( myfile, -1 )
    for obj in objects :
        p.dump( obj )
        p.clear_memo()

The first case doesn't suffer from this problem because each call to
cPickle.dump() creates a new Pickler, and thus a new memo.
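(A sketch of the fix, using Python 3's pickle module, where cPickle now lives; Pickler.clear_memo() exists there as well. The dummy Record class and the record count are made up. Clearing the memo after each dump keeps it from growing, and the resulting stream loads back exactly like a way-#1 file.)

```python
import io
import pickle

class Record:
    def __init__(self, n):
        self.n = n

buf = io.BytesIO()
p = pickle.Pickler(buf, -1)
for i in range(1000):
    p.dump(Record(i))
    p.clear_memo()   # forget already-written objects so the memo stays small

# The stream reads back one record at a time, like a way-#1 file.
buf.seek(0)
loaded = []
while True:
    try:
        loaded.append(pickle.load(buf))
    except EOFError:
        break

print(len(loaded))  # -> 1000
```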

Skip
Jul 18 '05 #3


