Bytes IT Community

Best dbm to use?

I'm creating a persistent index of a large 63GB file
containing millions of pieces of data. For this I would
naturally use one of Python's dbm modules. But which is
the best one to use?

The index would be created with something like this:
import dbhash

fh = open('file_to_index')
db = dbhash.open('file_to_index.idx', 'c')  # 'c' creates the index if it doesn't exist
for obj in fh:                              # pseudocode: obj stands for a parsed record
    db[obj.name] = str(fh.tell())           # dbm values must be strings

The index should serve two purposes: random access and
sequential stepped access. Random access is handled by the
hash table directly, for example:
fh.seek(int(db[name]))  # offsets come back as strings
obj = fh.GetObj()       # GetObj: my own record-parsing method (pseudocode)

However, I may want to access the i'th element in the file.
Something like this:
fh.seek(int(db.GetElement(i)))  # GetElement: hypothetical lookup of the i'th key
obj = fh.GetObj()

This is where the hash table breaks down, and a b-tree
would serve my purpose better. Is there a unified data
structure that I could use, or am I doomed to maintaining
two separate indexes?
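
(For concreteness, here is roughly how I picture the btree
side looking with bsddb.btopen, which keeps its keys sorted
on disk. A sketch only, assuming the index above has already
been built; 'some_name' and i are placeholders, and reaching
the i'th element this way still costs i steps:)

import bsddb

db = bsddb.btopen('file_to_index.idx', 'c')  # btree-backed, keys kept sorted

# random access by key works just like the hash case
offset = int(db['some_name'])

# sequential stepped access: walk the sorted keys to the i'th one
i = 1000
key, value = db.first()
for _ in xrange(i):
    key, value = db.next()
offset = int(value)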

Thanks in advance for any help.

-Brian
Sep 7 '05 #1
1 Reply


br****@temple.edu wrote:
I'm creating a persistent index of a large 63GB file
containing millions of pieces of data. For this I would
naturally use one of Python's dbm modules. But which is
the best one to use?


BDB4, but consider using sqlite: it's really simple, holds
all the data in a single file, and it's better supported (in
the sense that there are sqlite bindings for almost any
language/environment out there, and the file format is
stable). It's also very fast, and you can later add more
information you want to store (by adding more fields to the
table).
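
For example, a sketch assuming the sqlite3 module bundled
with Python 2.5 (pysqlite provides the same API on older
versions); the table and column names are placeholders, and
the insert step is only sketched in the comments:

import sqlite3

conn = sqlite3.connect('file_to_index.idx')
conn.execute('CREATE TABLE idx (name TEXT PRIMARY KEY, offset INTEGER)')

# while building the index, for each record:
#     conn.execute('INSERT INTO idx VALUES (?, ?)', (obj.name, fh.tell()))
# then conn.commit()

# random access by name (the PRIMARY KEY is backed by a btree index)
offset, = conn.execute('SELECT offset FROM idx WHERE name = ?',
                       ('some_name',)).fetchone()

# the i'th element: rowid follows insertion order (assuming no deletes),
# so the same table covers both access patterns
i = 1000
offset, = conn.execute('SELECT offset FROM idx WHERE rowid = ?',
                       (i + 1,)).fetchone()  # rowids start at 1

One table, two access patterns, and no second index to keep
in sync.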

Sep 7 '05 #2
