
Is shelve/dbm supposed to be this inefficient?

I am using shelve to store some data since it is probably the best solution
to my "data formats, number of columns, etc. can change at any time"
problem. However, I seem to be dealing with bloat.

My original data is 33MB. When each row is converted to a python list and
inserted into a shelve DB, it balloons to 69MB. Now, there is some
additional data in there, namely a list of all the keys containing data (vs.
the keys that contain version/file/config information), BUT if I copy all
the data over to a dict and dump that dict to a file using cPickle, the
file is only 49MB. I'm using pickle protocol 2 in both cases.
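
For reference, here is roughly the comparison I'm doing. This is just a
sketch -- the file names and the sample data are made up, but the shapes
match my rows (timestamp-keyed lists), and both writes use pickle protocol 2:

import os
import shelve
import cPickle as pickle

# Hypothetical sample data: timestamp-keyed rows whose length may vary.
data = dict(('ts%08d' % i, [i, i * 2.0, 'field%d' % i])
            for i in xrange(100000))

# Store every row individually via shelve, pickle protocol 2.
db = shelve.open('data.shelve', protocol=2)
for key, row in data.iteritems():
    db[key] = row
db.close()

# Store the same data as one big cPickle dump, also protocol 2.
f = open('data.pickle', 'wb')
pickle.dump(data, f, 2)
f.close()

# Compare on-disk sizes.  Note: depending on the dbm backend, shelve
# may create data.shelve.db or data.shelve.dir/.dat instead.
print 'shelve:', os.path.getsize('data.shelve'), 'bytes'
print 'pickle:', os.path.getsize('data.pickle'), 'bytes'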

Is this expected? Is there really that much overhead to using shelve and dbm
files? Are there any similar solutions that are more space efficient? I'd
use a straight pickle.dump, but loading then requires pulling the entire
thing into memory, and I don't want to have to do that every time.
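
The per-key access is the whole point for me: with shelve, only the value I
ask for gets unpickled, while pickle.load has to rebuild the entire dict in
memory first. A sketch, reusing the hypothetical files and keys from above:

import shelve
import cPickle as pickle

# shelve: only the requested row is read and unpickled.
db = shelve.open('data.shelve')
row = db['ts00000042']  # hypothetical timestamp key
db.close()

# pickle: the whole 49MB dict is deserialized before any lookup.
f = open('data.pickle', 'rb')
data = pickle.load(f)
f.close()
row = data['ts00000042']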

[Note, for those who might suggest a standard DB: yes, I'd like to use a
regular DB, but I have a domain where the number of data points in a sample
may change at any time, so a timestamp-keyed dict is arguably the best
solution; hence my use of shelve.]

Thanks for any pointers.

j

--
Joshua Kugler
Lead System Admin -- Senior Programmer
http://www.eeinternet.com
PGP Key: http://pgp.mit.edu/ ID 0xDB26D7CE

Aug 1 '07 #1
On Wed, 01 Aug 2007 15:47:21 -0800, Joshua J. Kugler wrote:
> My original data is 33MB. When each row is converted to python lists, and
> inserted into a shelve DB, it balloons to 69MB. Now, there is some
> additional data in there namely a list of all the keys containing data (vs.
> the keys that contain version/file/config information), BUT if I copy all
> the data over to a dict and dump the dict to a file using cPickle, that
> file is only 49MB. I'm using pickle protocol 2 in both cases.
>
> Is this expected? Is there really that much overhead to using shelve and dbm
> files? Are there any similar solutions that are more space efficient? I'd
> use straight pickle.dump, but loading requires pulling the entire thing
> into memory, and I don't want to have to do that every time.

You did not say how many records you store. If the underlying DB used by
`shelve` works with a hash table, that kind of "bloat" is to be expected:
hash-based dbm files typically reserve bucket and page space up front, so
it's a space vs. speed trade-off.
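
You can check which dbm module actually backs the file, and if it turns out
to be a hash-based one, try a BTree-backed shelf instead. A sketch -- the
file names are just placeholders:

import whichdb
import shelve
import bsddb

# Which dbm implementation backs the existing shelve file?
# Returns e.g. 'dbhash', 'gdbm', 'dbm', 'dumbdbm', '' or None.
print whichdb.whichdb('data.shelve')

# Alternative: a BTree-backed shelf via bsddb, which avoids the
# pre-allocated bucket overhead of a hash table (a different trade-off).
db = shelve.BsdDbShelf(bsddb.btopen('data.btree'), protocol=2)
db['ts00000042'] = [42, 84.0, 'example']
db.close()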

Ciao,
Marc 'BlackJack' Rintsch
Aug 2 '07 #2
