Berkeley DB: how to iterate over a large number of keys "quickly"

I have a Berkeley DB and I'm using the bsddb module to access it. The DB
is quite large (anywhere from 2 to 30 GB). I want to iterate over the keys
serially.
I tried something basic like

for key in db.keys():

but this takes a lot of time. I guess Python is trying to get the list
of all keys first and probably keeps it in memory. Is there a way to
avoid this, since I just want to access the keys serially? I mean, is there
a way I can tell Python not to load all keys, but to access them as
the loop progresses (like in a linked list)? I could find any accessor
methods on bsddb to this with my initial search.
I am guessing BTree might be a good choice here, but since the DBs
were opened with hashopen when they were written, I'm not able to use
btopen when I want to iterate over the DB.

Aug 2 '07 #1
Sorry, just a small correction:
> a way I can tell Python not to load all keys, but to access them as
> the loop progresses (like in a linked list)? I could find any accessor
> methods on bsddb to this with my initial search.
I meant, "I couldn't find any accessor methods on bsddb to do
this (i.e. accessing like in a linked list) with my initial search."
> I am guessing BTree might be a good choice here, but since the DBs
> were opened with hashopen when they were written, I'm not able to use
> btopen when I want to iterate over the DB.

Aug 2 '07 #2
On Thu, 2007-08-02 at 19:43 +0000, lazy wrote:
> I have a Berkeley DB and I'm using the bsddb module to access it. The DB
> is quite large (anywhere from 2 to 30 GB). I want to iterate over the keys
> serially.
> I tried something basic like
>
> for key in db.keys():
>
> but this takes a lot of time. I guess Python is trying to get the list
> of all keys first and probably keeps it in memory. Is there a way to
> avoid this, since I just want to access the keys serially? I mean, is there
> a way I can tell Python not to load all keys, but to access them as
> the loop progresses (like in a linked list)? I could find any accessor
> methods on bsddb to this with my initial search.
> I am guessing BTree might be a good choice here, but since the DBs
> were opened with hashopen when they were written, I'm not able to use
> btopen when I want to iterate over the DB.
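Try instead:

# Note: firstkey()/nextkey() are the method names used by the gdbm module;
# check whether your bsddb object provides them before relying on this.
key = db.firstkey()
while key is not None:
    # do something with db[key]
    key = db.nextkey(key)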
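
For reference, the legacy bsddb objects returned by hashopen document a cursor-style pair of methods, first() and next(), each returning a (key, value) tuple. The sketch below wraps that pattern in a generator so only one record is held at a time. It is an illustration under assumptions: FakeCursorDB is a hypothetical in-memory stand-in so the snippet runs anywhere, and it assumes the end of the database is signalled by a KeyError-like exception, which should be checked against your bsddb version.

```python
def iter_keys_with_cursor(db):
    """Yield keys one at a time from an object with first()/next() methods."""
    try:
        key, _value = db.first()
    except KeyError:
        return  # empty database (assumed to raise a KeyError-like error)
    while True:
        yield key
        try:
            key, _value = db.next()
        except KeyError:
            return  # cursor moved past the last record


class FakeCursorDB(object):
    """Hypothetical stand-in that mimics a first()/next() cursor interface."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = -1

    def first(self):
        if not self._items:
            raise KeyError("empty database")
        self._pos = 0
        return self._items[0]

    def next(self):
        if self._pos + 1 >= len(self._items):
            raise KeyError("no more records")
        self._pos += 1
        return self._items[self._pos]


db = FakeCursorDB([("a", "1"), ("b", "2"), ("c", "3")])
keys = list(iter_keys_with_cursor(db))
```

Note that this pattern reads each value along with its key, and that a hash database returns keys in no particular order.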
Aug 2 '07 #3
On Thu, Aug 02, 2007 at 07:43:58PM -0000, lazy wrote:
> I have a Berkeley DB and I'm using the bsddb module to access it. The DB
> is quite large (anywhere from 2 to 30 GB). I want to iterate over the keys
> serially.
> I tried something basic like
>
> for key in db.keys():
>
> but this takes a lot of time. I guess Python is trying to get the list
> of all keys first and probably keeps it in memory. Is there a way to
> avoid this, since I just want to access the keys serially?
Does db.iterkeys() work better?

Christoph
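
A minimal sketch of the idea behind this suggestion: keys() builds the whole key list before the loop starts, while an iterator hands keys over one at a time. The helper name lazy_keys is made up for illustration, and a plain dict stands in for the database so the snippet is runnable without bsddb. On Python 2 mapping-style objects that provide iterkeys() (bsddb gained it in Python 2.3), the first branch would be taken.

```python
from itertools import islice


def lazy_keys(db):
    """Return an iterator over keys without building a full list first."""
    iterkeys = getattr(db, "iterkeys", None)
    if iterkeys is not None:
        return iterkeys()  # Python 2 mapping-style objects
    return iter(db)        # plain iteration, e.g. a Python 3 dict


# A plain dict stands in for the database here (an assumption for the demo).
sample = dict(("k%d" % i, i) for i in range(1000))

# Only the first three keys are pulled from the iterator.
first_three = list(islice(lazy_keys(sample), 3))
```

Whether this is actually faster on a 2 to 30 GB hash database depends on how the underlying cursor walks the file, so it is worth timing on a small copy first.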

Aug 2 '07 #4
