Is shelve/dbm supposed to be this inefficient?

I am using shelve to store some data, since it is probably the best solution
to my "data formats, number of columns, etc. can change at any time"
problem. However, I seem to be dealing with bloat.

My original data is 33MB. When each row is converted to a Python list and
inserted into a shelve DB, it balloons to 69MB. Now, there is some
additional data in there, namely a list of all the keys containing data (vs.
the keys that contain version/file/config information), BUT if I copy all
the data over to a dict and dump the dict to a file using cPickle, that
file is only 49MB. I'm using pickle protocol 2 in both cases.
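
A minimal sketch for reproducing this comparison, assuming Python 3 (where
`pickle` automatically uses the C accelerator that `cPickle` provided in
Python 2). It sums every file the shelf creates, because some dbm backends
split one database over several files. The sample rows are made up, and the
sizes you get depend on which dbm backend your Python build picks.

```python
import glob
import os
import pickle
import shelve
import tempfile


def files_size(prefix):
    """Sum the sizes of all files whose names start with prefix.

    Depending on the dbm backend, shelve may create one file (e.g. 'data_shelf'
    or 'data_shelf.db') or several (dbm.dumb writes .dat, .dir and .bak files).
    """
    return sum(os.path.getsize(p) for p in glob.glob(glob.escape(prefix) + "*"))


def compare_sizes(rows, workdir):
    """Store rows via shelve and via a single pickle.dump; return both sizes."""
    shelf_path = os.path.join(workdir, "data_shelf")
    pickle_path = os.path.join(workdir, "data.pkl")

    # One shelve entry per row (shelve keys must be str).
    with shelve.open(shelf_path, flag="n", protocol=2) as db:
        for key, row in rows.items():
            db[key] = row

    # The whole dict pickled as one object.
    with open(pickle_path, "wb") as f:
        pickle.dump(rows, f, protocol=2)

    return files_size(shelf_path), os.path.getsize(pickle_path)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample data: numeric string keys mapped to short float lists.
    rows = {str(i): [i * 0.5, i * 1.5, i * 2.5] for i in range(2000)}
    shelf_bytes, pickle_bytes = compare_sizes(rows, tmp)
    print("shelve:", shelf_bytes, "bytes; single pickle:", pickle_bytes, "bytes")
```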

Is this expected? Is there really that much overhead to using shelve and dbm
files? Are there any similar solutions that are more space efficient? I'd
use straight pickle.dump, but loading requires pulling the entire thing
into memory, and I don't want to have to do that every time.
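
A small sketch of the difference being described: a shelf reads and unpickles
only the record you ask for, while `pickle.load` has to rebuild the whole dict
before any single row is available. The paths and sample data are
hypothetical.

```python
import os
import pickle
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    shelf_path = os.path.join(tmp, "samples")
    pickle_path = os.path.join(tmp, "samples.pkl")
    data = {str(i): [i, i + 1, i + 2] for i in range(1000)}

    # Write the same data both ways.
    with shelve.open(shelf_path, flag="n", protocol=2) as db:
        db.update(data)  # Shelf is a MutableMapping, so update() works
    with open(pickle_path, "wb") as f:
        pickle.dump(data, f, protocol=2)

    # shelve: only the requested record is read from disk and unpickled.
    with shelve.open(shelf_path, flag="r") as db:
        one_row = db["500"]

    # pickle: the entire dict is loaded into memory first.
    with open(pickle_path, "rb") as f:
        everything = pickle.load(f)
    same_row = everything["500"]
```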

[Note for those who might suggest a standard DB: yes, I'd like to use a
regular DB, but I have a domain where the number of data points in a sample
may change at any time, so a timestamp-keyed dict is arguably the best
solution, thus my use of shelve.]
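
A sketch of the timestamp-keyed layout described above, assuming ISO 8601
strings as keys (shelve only accepts str keys) and a hypothetical
`__data_keys__` entry that holds the list of data keys, kept apart from any
version/config entries. Rows with different numbers of data points need no
schema change.

```python
import datetime
import os
import shelve
import tempfile


def sample_key(ts):
    """Encode a timestamp as an ISO 8601 string (hypothetical key format)."""
    return ts.isoformat()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples")
    t0 = datetime.datetime(2007, 8, 1, 12, 0, 0)

    with shelve.open(path, flag="n", protocol=2) as db:
        # Two samples with different numbers of data points.
        db[sample_key(t0)] = [1.0, 2.0, 3.0]
        db[sample_key(t0 + datetime.timedelta(seconds=60))] = [1.0, 2.0, 3.0, 4.0, 5.0]
        # Bookkeeping entry: the list of data keys (hypothetical name).
        db["__data_keys__"] = sorted(k for k in db.keys() if not k.startswith("__"))

    # Reopen read-only and look up each sample via the stored key list.
    with shelve.open(path, flag="r") as db:
        lengths = {k: len(db[k]) for k in db["__data_keys__"]}
```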

Thanks for any pointers.

j

--
Joshua Kugler
Lead System Admin -- Senior Programmer
http://www.eeinternet.com
PGP Key: http://pgp.mit.edu/ *ID 0xDB26D7CE

Aug 1 '07 #1
On Wed, 01 Aug 2007 15:47:21 -0800, Joshua J. Kugler wrote:
> My original data is 33MB. When each row is converted to python lists, and
> inserted into a shelve DB, it balloons to 69MB. Now, there is some
> additional data in there namely a list of all the keys containing data (vs.
> the keys that contain version/file/config information), BUT if I copy all
> the data over to a dict and dump the dict to a file using cPickle, that
> file is only 49MB. I'm using pickle protocol 2 in both cases.
>
> Is this expected? Is there really that much overhead to using shelve and dbm
> files? Are there any similar solutions that are more space efficient? I'd
> use straight pickle.dump, but loading requires pulling the entire thing
> into memory, and I don't want to have to do that every time.

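You did not say how many records you store. If the underlying DB used by
`shelve` is based on a hash table, some of that "bloat" is to be expected:
hash-based files keep free space in their buckets so that lookups and
inserts stay fast, and every record carries some bookkeeping overhead of
its own. It's a space vs. speed trade-off, then.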
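
One way to check which backend `shelve` actually used, and how the file size
grows with the number of records, is `dbm.whichdb`, which names the dbm
module that created a file. A sketch, assuming Python 3; the row contents are
made up, and bytes-per-record figures will differ between backends
(`dbm.gnu`, `dbm.ndbm`, `dbm.dumb`, and `dbm.sqlite3` on 3.13+), so no
particular numbers are implied.

```python
import dbm
import os
import shelve
import tempfile


def shelf_stats(n_records, workdir):
    """Write n_records small rows to a fresh shelf; return backend name and size.

    Each shelf lives in its own subdirectory, so every file in that directory
    belongs to it, whether the backend creates one file or several.
    """
    subdir = os.path.join(workdir, "n%d" % n_records)
    os.mkdir(subdir)
    path = os.path.join(subdir, "shelf")
    with shelve.open(path, flag="n", protocol=2) as db:
        for i in range(n_records):
            db[str(i)] = [i, float(i), str(i)]
    # Name of the dbm module that created the file, e.g. 'dbm.gnu' or 'dbm.dumb'.
    backend = dbm.whichdb(path)
    size = sum(os.path.getsize(os.path.join(subdir, name))
               for name in os.listdir(subdir))
    return backend, size


results = []
with tempfile.TemporaryDirectory() as tmp:
    for n in (10, 100, 1000):
        backend, size = shelf_stats(n, tmp)
        results.append((n, backend, size, size / n))
        print(n, backend, size, "bytes,", round(size / n, 1), "bytes/record")
```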

Ciao,
Marc 'BlackJack' Rintsch
Aug 2 '07 #2
