
Berkeley DB: how to iterate over a large number of keys "quickly"

I have a Berkeley DB and I'm using the bsddb module to access it. The DB
is quite huge (anywhere from 2 to 30 GB). I want to iterate over the keys
serially.

I tried using something basic like

for key in db.keys()

but this takes a lot of time. I guess Python is trying to get the list
of all keys first and probably keeps it in memory. Is there a way to
avoid this, since I just want to access the keys serially? I mean, is
there a way I can tell Python not to load all the keys, but to access
them as the loop progresses (like in a linked list)? I couldn't find any
accessor methods on bsddb to do this with my initial search.

I am guessing a BTree might be a good choice here, but since the DBs
were opened with hashopen when they were written, I'm not able to use
btopen when I want to iterate over them.
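
For reference, here is roughly what I am doing now (the path and the
per-record work are placeholders):

import bsddb

db = bsddb.hashopen('/path/to/big.db', 'r')  # read-only, hash format

# The slow part: keys() builds the complete key list in memory before
# the loop even starts, which hurts on a 2-30GB database.
for key in db.keys():
    process(db[key])  # process() stands in for whatever I do per record

db.close()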

Aug 2 '07 #1
3 Replies


Sorry, just a small correction: where I said I couldn't find any
accessor methods on bsddb to do this, by "this" I meant accessing the
keys one at a time as the loop progresses, like in a linked list.

Aug 2 '07 #2

On Thu, 2007-08-02 at 19:43 +0000, lazy wrote:
> Is there a way I can tell Python not to load all the keys, but to
> access them as the loop progresses (like in a linked list)?
Try instead (the dict-like handles returned by hashopen expose first()
and next(), which walk an internal cursor one record at a time):

key, value = db.first()      # positions the cursor at the first record
while True:
    # do something with key and value
    try:
        key, value = db.next()   # advance the cursor one record
    except KeyError:  # bsddb raises DBNotFoundError (a KeyError) at the end
        break
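
If you want more control (for example, fetching each key and value in
one call instead of a separate db[key] lookup), the lower-level
bsddb.db API exposes the cursor directly. A rough sketch, assuming the
file was written via hashopen (the path is a placeholder):

from bsddb import db as bdb

d = bdb.DB()
d.set_get_returns_none(2)  # cursor calls return None at the end instead of raising
d.open('/path/to/big.db', dbtype=bdb.DB_HASH, flags=bdb.DB_RDONLY)

cursor = d.cursor()
rec = cursor.first()       # rec is a (key, value) tuple
while rec is not None:
    key, value = rec
    # ... process one record at a time; only the current record is in memory ...
    rec = cursor.next()

cursor.close()
d.close()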
Aug 2 '07 #3

On Thu, Aug 02, 2007 at 07:43:58PM -0000, lazy wrote:
> I tried using something basic like
>
> for key in db.keys()
>
> but this takes a lot of time. [...] Is there a way to avoid this,
> since I just want to access the keys serially?
Does db.iterkeys() work better?
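
If iterkeys() isn't available (iterator support on the dict-like handle
depends on the bsddb version), iterating over the handle itself should
behave the same way; an untested sketch, with process() as a placeholder:

# Newer bsddb versions give the handle a cursor-backed __iter__, so
# keys are produced one at a time instead of as one big in-memory list.
for key in db:
    process(db[key])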

Christoph

Aug 2 '07 #4
