On May 9, 4:01 pm, sinoo...@yahoo.com wrote:
Thanks for the info Nick. I plan on accessing the data in pretty much
random order, and once the database is built, it will be read-only.
At this point I'm not too concerned about access times, just getting
something to work. I've been messing around with both btree and hash
with limited success, which led me to think that maybe I was going
beyond some internal limit for the data size. It works great on a
limited set of data, but once I turn it loose on the full set, usually
several hours later, it either causes a hard reset of my machine or
the HD grinds on endlessly with no apparent progress. Is there a limit
to the size of data you can place per key?
Thanks for the MySQL suggestion, I'll take a look.
-JM
JM,
If you want, take a look at my PyDBTable on
www.psipy.com.
The description and examples sections are still being finished, but
the source API documentation will help you.
It is a fast Python wrapper around MySQL, PostgreSQL or SQLite that
buffers queries and insertions. You just set up the database and then
pass the connection parameters to the initializer method. After that
you can use the pydb object as a dictionary of
{ primary_key : list_of_values }. You can even create indices on
individual fields and run queries like:
---------------------------------------------------------------
pydb.query( ['id','data_field1'],
            ('id','<',10),
            ('data_field1','LIKE','Hello%') )
---------------------------------------------------------------
which translates into an SQL query like:
---------------------------------------------------------------
SELECT id, data_field1 FROM ... WHERE id<10 AND data_field1 LIKE 'Hello%'
---------------------------------------------------------------
and returns an __iterator__.
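Putting it together, usage might look roughly like this. Note that
the import, the constructor arguments and the table/field names below
are my guesses for illustration only; check the API docs on
www.psipy.com for the real signatures:
---------------------------------------------------------------
# Hypothetical sketch -- module name, constructor arguments and
# field names are assumptions, not PyDBTable's documented API.
from pydbtable import PyDBTable

pydb = PyDBTable(host='localhost', user='jm', passwd='secret',
                 db='mydb', table='mytable')

# Dictionary-style access: { primary_key : list_of_values }.
pydb[1] = ['Hello world', 42]   # insert/update the row with key 1
values = pydb[1]                # fetch the row back by key

# Queries return an iterator, so results stream in lazily.
for row in pydb.query(['id', 'data_field1'],
                      ('id', '<', 10),
                      ('data_field1', 'LIKE', 'Hello%')):
    print(row)
---------------------------------------------------------------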
Getting an iterator as the result is excellent because you can
iterate over result sets much larger than your virtual memory. In the
background, PyDBTable retrieves rows from the database in large
batches and caches them to optimise I/O.
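The batch-and-yield idea is roughly the following. This is only a
sketch of the general technique using the standard sqlite3 module,
not PyDBTable's actual code, and the table/column names are made up:
---------------------------------------------------------------
import sqlite3

def iter_rows(conn, sql, params=(), batch_size=1000):
    # Fetch rows from the cursor in large chunks but yield them
    # one at a time, so memory use is bounded by the batch size
    # rather than by the size of the whole result set.
    cur = conn.cursor()
    cur.execute(sql, params)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row

conn = sqlite3.connect('example.db')
query = "SELECT id, data_field1 FROM mytable WHERE id < ?"
for row in iter_rows(conn, query, (10,)):
    print(row)
---------------------------------------------------------------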
Anyway, on my machine PyDBTable saturates the disk I/O (it runs as
fast as a pure MySQL query).
Take care,
-Nick Vatamaniuc