
Cache a large list to disk

I have a set of routines, the first of which reads lots and lots of
data from disparate regions of disk. This read routine takes 40
minutes on a P3-866 (with IDE drives). This routine populates an
array with a number of dictionaries, e.g.,

[{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
{'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
{'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
{'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
(not actually the data i'm reading)

This information is acted upon by subsequent routines. These routines
change very often, but the data changes very infrequently (the
opposite pattern of what I'm used to). This data changes once per
week, so I can safely cache this data to a big file on disk, and read
out of this big file -- rather than having to read about 10,000 files
-- when the program is loaded.

Now, if this were C I'd know how to do this in a pretty
straightforward manner. But being new to Python, I don't know how I
can (hopefully easily) write this data to a file, and then read it out
into memory on subsequent launches.

If anyone can provide some pointers, or even some sample code on how
to accomplish this, it would be greatly appreciated.

Thanks in advance for any help.
-cjl
Jul 18 '05 #1
6 Replies


Chris wrote:
> week, so I can safely cache this data to a big file on disk, and read
> out of this big file -- rather than having to read about 10,000 files
> -- when the program is loaded.
>
> Now, if this were C I'd know how to do this in a pretty
> straightforward manner. But being new to Python, I don't know how I
> can (hopefully easily) write this data to a file, and then read it out
> into memory on subsequent launches.


Have a look at pickle:

>>> data = [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
... {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
... {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
... {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
>>> import cPickle as pickle # cPickle is pickle implemented in C
>>> pickle.dump(data, file("tmp.pickle", "w"))
>>> data_reloaded = pickle.load(file("tmp.pickle"))
>>> data_reloaded == data, data_reloaded is data
(True, False)
>>> data_reloaded
[{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
 {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
 {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
 {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
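
For anyone reading this on Python 3, where cPickle and file() no longer exist, the same idea might look roughly like the sketch below. The names load_or_build and slow_read, and the cache file name, are made up for illustration; slow_read merely stands in for the 40-minute read.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build):
    """Return the cached data if cache_path exists, else build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:      # pickles must be opened in binary mode
            return pickle.load(f)
    data = build()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


def slow_read():
    # Hypothetical stand-in for reading ~10,000 files.
    return [{'el1': 9 * i, 'el2': 15 * i} for i in range(4)]


cache_path = os.path.join(tempfile.mkdtemp(), "data.pickle")
first = load_or_build(cache_path, slow_read)    # builds the list and writes the cache
second = load_or_build(cache_path, slow_read)   # reads the list back from disk
```

Deleting the cache file once a week, when the underlying data changes, forces a rebuild on the next launch. Only unpickle files you wrote yourself, since loading a pickle can execute arbitrary code.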

Peter

Jul 18 '05 #2


Use the pickle or shelve modules.

http://www.python.org/doc/current/li...le-pickle.html

http://www.python.org/doc/current/li...le-shelve.html
> "Chris" == Chris <ia*******@hotmail.com> writes:

Chris> straightforward manner. But being new to Python, I
Chris> don't know how I can (hopefully easily) write this data
Chris> to a file, and then read it out into memory on
Chris> subsequent launches.
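
A minimal sketch of the shelve route in Python 3, assuming one record per key; the shelf path and the string keys are invented for the example. A shelf behaves like a dictionary stored on disk, so a single record can be fetched without loading the rest.

```python
import os
import shelve
import tempfile

records = [
    {'el1': 0, 'el2': 0},
    {'el1': 9, 'el2': 15},
    {'el1': 21, 'el2': 35},
]

path = os.path.join(tempfile.mkdtemp(), "cache_shelf")

# Write: each record is pickled and stored under its own string key.
with shelve.open(path) as shelf:
    for i, record in enumerate(records):
        shelf[str(i)] = record

# Read: only the requested record is unpickled.
with shelve.open(path) as shelf:
    second_record = shelf["1"]
    keys = sorted(shelf.keys())
```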

--
Karl 2004-05-17 13:44
Jul 18 '05 #3

Peter Otten wrote:
> Chris wrote:
>> week, so I can safely cache this data to a big file on disk, and read
>> out of this big file -- rather than having to read about 10,000 files
>> -- when the program is loaded.
>>
>> Now, if this were C I'd know how to do this in a pretty
>> straightforward manner. But being new to Python, I don't know how I
>> can (hopefully easily) write this data to a file, and then read it out
>> into memory on subsequent launches.
>
> Have a look at pickle:
>
>   >>> data = [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
>   ... {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
>   ... {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
>   ... {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
>   >>> import cPickle as pickle # cPickle is pickle implemented in C


And, yes, cPickle is faster. A lot faster.

There are switches you can throw to have it use binary instead of sticking
to readable characters for some savings, too.
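
The "switch" is the protocol argument. A rough sketch in Python 3 terms, where plain pickle already uses the C implementation when it is available; the sample data is invented for illustration:

```python
import pickle

data = [{'el1': 9 * i, 'el2': 15 * i, 'el3': 21 * i} for i in range(1000)]

# Protocol 0 is the original, printable-ASCII format.
text_blob = pickle.dumps(data, protocol=0)

# Higher protocols are binary and more compact.
binary_blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print(len(text_blob), len(binary_blob))
```

Both blobs load back to the same list; the binary one is simply smaller on disk.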
Jul 18 '05 #4

ia*******@hotmail.com (Chris) writes:
> I have a set of routines, the first of which reads lots and lots of
> data from disparate regions of disk. This read routine takes 40
> minutes on a P3-866 (with IDE drives). This routine populates an
> array with a number of dictionaries, e.g.,
>
> [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
> {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
> {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
> {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
> (not actually the data i'm reading)

The dict entries are the same for each list item?

> Now, if this were C I'd know how to do this in a pretty
> straightforward manner. But being new to Python, I don't know how I
> can (hopefully easily) write this data to a file, and then read it out
> into memory on subsequent launches.
>
> If anyone can provide some pointers, or even some sample code on how
> to accomplish this, it would be greatly appreciated.


I dunno what the question is. You can open files, seek on them, etc.
in Python just like in C. You can use the mmap module to map a file
into memory. If you want to lose some efficiency, you can write out
the Python objects (dicts, lists, etc) with the pickle or cPickle modules.
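
A sketch of the C-style route, assuming every dict has the same five keys and every value fits in a 32-bit signed integer (neither is confirmed by the original post). The helpers write_records and read_record and the file name are hypothetical.

```python
import os
import struct
import tempfile

FIELDS = ('el1', 'el2', 'el3', 'el4', 'el5')
RECORD = struct.Struct('<5i')   # five little-endian 32-bit ints per record


def write_records(path, rows):
    """Write each dict as one fixed-size binary record."""
    with open(path, 'wb') as f:
        for row in rows:
            f.write(RECORD.pack(*(row[name] for name in FIELDS)))


def read_record(path, index):
    """Seek straight to record number `index` and decode it."""
    with open(path, 'rb') as f:
        f.seek(index * RECORD.size)
        return dict(zip(FIELDS, RECORD.unpack(f.read(RECORD.size))))


rows = [
    {'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
    {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
]

path = os.path.join(tempfile.mkdtemp(), "records.bin")
write_records(path, rows)
record = read_record(path, 1)
```

Because every record has the same size, any one of them can be read with a single seek, just as with an array of structs in C.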
Jul 18 '05 #5

Ah, poifect =) This'll do just fine. Thanks, folks.
Karl Chen <qu***@nospam.quarl.org> wrote in message news:<ma************************************@python.org>...
Use the pickle or shelve modules.

http://www.python.org/doc/current/li...le-pickle.html

http://www.python.org/doc/current/li...le-shelve.html
>> "Chris" == Chris <ia*******@hotmail.com> writes:

Chris> straightforward manner. But being new to Python, I
Chris> don't know how I can (hopefully easily) write this data
Chris> to a file, and then read it out into memory on
Chris> subsequent launches.

Jul 18 '05 #6

Chris <ia*******@hotmail.com> wrote:
> I have a set of routines, the first of which reads lots and lots of
> data from disparate regions of disk. This read routine takes 40
> minutes on a P3-866 (with IDE drives). This routine populates an
> array with a number of dictionaries, e.g.,
>
> [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
> {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
> {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
> {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
> (not actually the data i'm reading)
>
> This information is acted upon by subsequent routines. These routines
> change very often, but the data changes very infrequently (the
> opposite pattern of what I'm used to). This data changes once per
> week, so I can safely cache this data to a big file on disk, and read
> out of this big file -- rather than having to read about 10,000 files
> -- when the program is loaded.
>
> Now, if this were C I'd know how to do this in a pretty
> straightforward manner. But being new to Python, I don't know how I
> can (hopefully easily) write this data to a file, and then read it out
> into memory on subsequent launches.
>
> If anyone can provide some pointers, or even some sample code on how
> to accomplish this, it would be greatly appreciated.


As already mentioned, use cPickle or shelve.
However, depending on how big and how numerous your dictionaries are,
you can use *dbm databases instead of dictionaries, with the numbers
packed using the struct module (I have found this is sometimes much
more efficient than using shelve).
Looking at your sample, you could even reorganize the data as:
{'el2': [0, 15, 35, 45],
'el3': [0, 21, 49, 63],
...
}
and use one big dbm database, with the lists represented as array objects;
that is going to give you a major boost in memory efficiency.
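
A rough sketch of that column layout in Python 3, using the standard dbm and array modules. The database path is invented, and the 'i' typecode assumes the values fit in a C int.

```python
import dbm
import os
import tempfile
from array import array

rows = [
    {'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
    {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
    {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153},
]

path = os.path.join(tempfile.mkdtemp(), "columns")

# Write: one dbm entry per key, holding that key's whole column as raw bytes.
with dbm.open(path, 'c') as db:
    for name in rows[0]:
        column = array('i', (row[name] for row in rows))
        db[name.encode()] = column.tobytes()

# Read: fetch a single column without touching the others.
with dbm.open(path, 'r') as db:
    el2 = array('i')
    el2.frombytes(db[b'el2'])
```

The raw bytes use the machine's native int size and byte order, so the file is meant to be read on the same kind of machine that wrote it.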

If the arrays are going to be big (really BIG, some tens
of megabytes), you can store them one per file and use mmap
to access them. I am doing something similar now.
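
One way the one-array-per-file idea could look in Python 3; this is only a sketch, not the setup described above. The file name and the generated values are placeholders.

```python
import mmap
import os
import tempfile
from array import array

path = os.path.join(tempfile.mkdtemp(), "el2.bin")

# Write one column to its own file as raw C ints.
with open(path, 'wb') as f:
    array('i', range(0, 1_000_000, 5)).tofile(f)

# Map the file read-only and view it as ints without copying it into memory.
with open(path, 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        with memoryview(mapped).cast('i') as values:
            count = len(values)
            sample = values[3]
```

The operating system pages data in only as it is touched, so opening the file is fast even when the array is large. The view has to be released before the map is closed, which the nested with blocks take care of.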
--
-----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
Jul 18 '05 #7
