Bytes | Software Development & Data Engineering Community
Unpickling crashing my machine


No response, so I'm reposting this, as it seems an "interesting" problem...

I have a huge dataset which contains a lot of individual records
represented by class instances.

I pickle this to a file :

way #1 :
    for obj in objects :
        cPickle.dump( obj, myfile, -1 )

way #2 :
    p = cPickle.Pickler( myfile, -1 )
    for obj in objects :
        p.dump( obj )

When I try to unpickle this big file :

p = cPickle.Unpickler( open( ... ))
many times p.load()... display a progress counter...

Loading the file generated by #1 works fine, with linear speed.
Loading the file generated by #2 :
- the progress counter runs as fast as #1
- eats all memory, then swap
- when eating swap, the progress counter slows down a lot (of course)
- and the process must be killed to save the machine.

I'm talking lots of memory here. The pickled file is about 80 MB; when
loaded, it fits into RAM with no problem.
However, I killed the #2 process when it had already hogged about 700 MB
of RAM and showed no sign of stopping.

What's the problem ?
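(For reference, here is the load loop written out in full. This is a minimal sketch using Python 3's pickle module, into which cPickle was folded; the in-memory stream and the integer records stand in for the real file and class instances.)

```python
import io
import pickle

# Build a small stream the same way as "way #1": one dump() call per record.
myfile = io.BytesIO()
for obj in range(5):
    pickle.dump(obj, myfile, -1)
myfile.seek(0)

# Load records until the stream runs out; pickle raises EOFError at the end.
p = pickle.Unpickler(myfile)
objects = []
while True:
    try:
        objects.append(p.load())
    except EOFError:
        break

print(objects)  # -> [0, 1, 2, 3, 4]
```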
Jul 18 '05 #1
Pierre-Frédéric Caillaud wrote:

[original post quoted in full; snipped]


I have just tried to pickle the same object twice using both methods you
describe. The file created using the shared Pickler is shorter than the one
written by dump(), which I suppose creates a new pickler for every call.
That means that the shared pickler keeps a cache of objects already written,
and therefore the Unpickler must do the same. I believe that what you see is
the excessive growth of that cache.
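(The size difference is easy to reproduce. A minimal sketch with Python 3's pickle module, which absorbed cPickle; the Record class and the shared payload string are made up for illustration. The shared pickler writes short back-references into its cache, the memo, instead of re-serializing the class and the shared string, so its output is smaller, and the memo grows with every object dumped.)

```python
import io
import pickle

class Record:
    def __init__(self, payload):
        self.payload = payload

shared = "x" * 1000                     # content referenced by every record
records = [Record(shared) for _ in range(100)]

# Way #1: a fresh pickler per dump() -- the payload is re-serialized each time.
buf1 = io.BytesIO()
for rec in records:
    pickle.dump(rec, buf1, -1)

# Way #2: one shared pickler -- after the first dump, the memo lets later
# dumps emit a short back-reference to the already-written payload.
buf2 = io.BytesIO()
p = pickle.Pickler(buf2, -1)
for rec in records:
    p.dump(rec)

print(len(buf1.getvalue()), ">", len(buf2.getvalue()))
```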

Peter

Jul 18 '05 #2

Peter> Pierre-Frédéric Caillaud wrote:
    way #1 :
        for obj in objects :
            cPickle.dump( obj, myfile, -1 )

    way #2 :
        p = cPickle.Pickler( myfile, -1 )
        for obj in objects :
            p.dump( obj )

Loading the file generated by #2 :
- the progress counter runs as fast as #1
- eats all memory, then swap

...

Peter> I believe that what you see is the excessive growth of that
Peter> cache.

Correct. If each object dumped is independent of all the other objects
being dumped, you should clear the memo after each dump() call in the
second case:

    p = cPickle.Pickler( myfile, -1 )
    for obj in objects :
        p.dump( obj )
        p.clear_memo()

The first case doesn't suffer from this problem because each call to
cPickle.dump() creates a new Pickler, and thus a new memo.
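(A sketch of the fix, using Python 3's pickle module, where cPickle now lives; Pickler.clear_memo() exists there as well. The dummy Record class and the record count are made up. Clearing the memo after each dump keeps it from growing, and the resulting stream loads back exactly like a way-#1 file.)

```python
import io
import pickle

class Record:
    def __init__(self, n):
        self.n = n

buf = io.BytesIO()
p = pickle.Pickler(buf, -1)
for i in range(1000):
    p.dump(Record(i))
    p.clear_memo()   # forget already-written objects so the memo stays small

# The stream reads back one record at a time, like a way-#1 file.
buf.seek(0)
loaded = []
while True:
    try:
        loaded.append(pickle.load(buf))
    except EOFError:
        break

print(len(loaded))  # -> 1000
```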

Skip
Jul 18 '05 #3


