best small database?

David Isaac

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Suggestions?

Thank you,
Alan Isaac

Sep 11 '06 #1


Thorsten Kampe

* David Isaac (2006-09-11 14:23 +0100)

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Gadfly?

Sep 11 '06 #2

Larry Bates

David Isaac wrote:

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Suggestions?

Thank you,
Alan Isaac

If they are files, just use the filesystem; you don't need a
database. If you need multiple indexes into these files, then
use a database, but only for the indexes that point to the
files on the filesystem. The filesystem is almost always the
most efficient place to store files, not as blobs in a
database.

The answer about which database depends on your target
platform but you could consider gadfly.

-Larry Bates

Sep 11 '06 #3

Paul Watson

David Isaac wrote:

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Suggestions?

Thank you,
Alan Isaac

If you want really simple, look at the anydbm module. If nothing better
is available, anydbm will use dumbdbm. All of these are in the Python
build, so you do not need to fetch/read/install anything additional.

Doing the DB-API would be much stronger, but might be overkill in your
situation.
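
A minimal sketch of the anydbm approach, assuming a current Python, where anydbm has been renamed to dbm (dumbdbm is now dbm.dumb). The file location and the keys are made up for illustration:

```python
import dbm
import os
import tempfile

# Hypothetical location for the database file(s); any writable path works.
db_path = os.path.join(tempfile.mkdtemp(), "items")

# "c" opens the database for reading and writing, creating it if needed.
with dbm.open(db_path, "c") as db:
    db["notes.txt"] = "some text content".encode("utf-8")   # a text item
    db["logo.png"] = b"\x89PNG\r\n\x1a\n"                    # a binary item

# Reopen read-only; keys and values always come back as bytes.
with dbm.open(db_path, "r") as db:
    text = db["notes.txt"].decode("utf-8")
    blob = db["logo.png"]
    keys = sorted(k.decode("utf-8") for k in db.keys())
```

This is only a key/value store: there are no queries beyond lookup by key, which may well be enough for a few hundred items.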

Sep 11 '06 #4

Laurent Pointal

David Isaac wrote:

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Suggestions?

You may want to take a look at buzhug (a very Pythonic way to manipulate
data in the database).

http://buzhug.sourceforge.net/

Thank you,
Alan Isaac

Sep 11 '06 #5

Aahz

In article <45**************@redlinepy.com>,
Paul Watson <pw*****@redlinepy.com> wrote:

David Isaac wrote:

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Suggestions?

If you want really simple, look at the anydbm module. If nothing better
is available, anydbm will use dumbdbm. All of these are in the Python
build, so you do not need to fetch/read/install anything additional.

Doing the DB-API would be much stronger, but might be overkill in your
situation.

Once Python 2.5 comes out, I recommend using sqlite because it avoids
the mess that dbm can cause.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"LL YR VWL R BLNG T S" -- www.nancybuttons.com

Sep 11 '06 #6

Paul McGuire

"Aahz" <aa**@pythoncraft.com> wrote in message
news:ee**********@panix3.panix.com...

In article <45**************@redlinepy.com>,
Once Python 2.5 comes out, I recommend using sqlite because it avoids
the mess that dbm can cause.
--
Aahz (aa**@pythoncraft.com) <*>
http://www.pythoncraft.com/

and if you don't want to wait for 2.5, you can install pysqlite without too
much trouble - and it is *very* easy to use!

For SQLite design and data browsing, check out the SQLite Browser at
http://sqlitebrowser.sourceforge.net.

-- Paul

Sep 11 '06 #7

John Salerno

Paul McGuire wrote:

"Aahz" <aa**@pythoncraft.com> wrote in message
news:ee**********@panix3.panix.com...
In article <45**************@redlinepy.com>,
Once Python 2.5 comes out, I recommend using sqlite because it avoids
the mess that dbm can cause.
--
Aahz (aa**@pythoncraft.com) <*>
http://www.pythoncraft.com/

and if you don't want to wait for 2.5, you can install pysqlite without too
much trouble - and it is *very* easy to use!

Yeah, just be sure to do this:

from pysqlite2 import dbapi2 as sqlite3

then you're ready for 2.5! :)
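
A short sketch of what that looks like in use, assuming the standalone package is importable as pysqlite2 when the standard-library sqlite3 module is missing. The table layout and file names are invented for the example:

```python
try:
    import sqlite3                               # standard library since Python 2.5
except ImportError:
    from pysqlite2 import dbapi2 as sqlite3      # assumed fallback: standalone pysqlite

conn = sqlite3.connect(":memory:")               # pass a file path to persist the data
conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, kind TEXT, data BLOB)")

# Text and binary items go into the same table; bytes are stored as BLOBs.
rows = [
    ("readme.txt", "text", b"plain text content"),
    ("icon.bin", "binary", b"\x00\x01\x02\xff"),
]
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
conn.commit()

# Parameterized query; the "?" placeholder keeps values out of the SQL string.
name, data = conn.execute(
    "SELECT name, data FROM items WHERE kind = ?", ("binary",)
).fetchone()
conn.close()
```

Because the rest of the code only refers to the name sqlite3, nothing else has to change when the import source does.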

Sep 11 '06 #8

Thorsten Kampe

* Aahz (2006-09-11 16:34 +0100)

In article <45**************@redlinepy.com>,
Paul Watson <pw*****@redlinepy.com> wrote:
David Isaac wrote:

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Suggestions?

If you want really simple, look at the anydbm module. If nothing better
is available, anydbm will use dumbdbm. All of these are in the Python
build, so you do not need to fetch/read/install anything additional.

Doing the DB-API would be much stronger, but might be overkill in your
situation.

Once Python 2.5 comes out, I recommend using sqlite because it avoids
the mess that dbm can cause.

But sqlite is not "pure Python" because it's just a wrapper around
sqlite (which has to be installed separately)...

Thorsten

Sep 11 '06 #9

John Salerno

Thorsten Kampe wrote:

But sqlite is not "pure Python" because it's just a wrapper around
sqlite (which has to be installed separately)...

But that's the point. Once 2.5 is released, sqlite is built-in. Unless
there's more to it that I don't know, and something must still be
installed? But that makes no sense.

Sep 11 '06 #10

Pierre Quentel

Here are some pure-Python databases :
- gadfly : an SQL engine, mature and well tested, works in memory so
not fit for large data sets
- SnakeSQL : another SQL engine, less mature I think and very slow when
I tested it
- KirbyBase : stores data in a single file ; uses a more Pythonic
syntax (no SQL) ; no size limit but performance decreases very much
with the size. It looked promising but the last version is more than 1
year old and the author seems to focus on the Ruby version now
- buzhug : Pythonic syntax (uses list comprehensions or methods like
create(), select() on the db object), much faster than all the above.
I'm obviously biased : I wrote it...
- for a small set of data you could also try strakell, the recipe I
published on the Python Cookbook :
http://aspn.activestate.com/ASPN/Coo.../Recipe/496770
With less than 200 lines of code, it's a very fast in-memory db
engine that also uses list comprehensions for requests :

SQL : SELECT name FROM persons WHERE age > 20
strakell : [ r["name"] for r in persons if r["age"] > 20 ]

You can also create an index : persons.create_index("age")
and then use it like this : persons.age[20] = list of the records where
age = 20
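
The same idea can be tried with nothing but built-in types. This is not strakell's implementation, only a plain-Python sketch of a list-comprehension query (here with a greater-than comparison) and a dictionary used as an index; the sample records are invented:

```python
from collections import defaultdict

# An in-memory "table": a list of records, each record a dict.
persons = [
    {"name": "Anne", "age": 20},
    {"name": "Bob", "age": 35},
    {"name": "Carl", "age": 20},
    {"name": "Dana", "age": 17},
]

# Query by scanning every record, like a SELECT ... WHERE without an index.
older = [r["name"] for r in persons if r["age"] > 20]

# A simple index: map each age to the list of records having that age.
by_age = defaultdict(list)
for r in persons:
    by_age[r["age"]].append(r)

# Lookup through the index avoids scanning the whole list.
aged_20 = [r["name"] for r in by_age[20]]
```

The index has to be updated whenever a record is added, removed or changed, which is the bookkeeping a database engine does for you.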

Other pure-Python databases : ZODB (probably overkill for a small
database) and Durus (I didn't test it)

As said in other answers, the inclusion of SQLite in the standard
distribution might make pure-Python solutions less attractive

Regards,
Pierre

Sep 11 '06 #11

Paul Rubin

"David Isaac" <ai*****@verizon.net> writes:

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

I usually use anydbm when I want something quick and simple.

Sep 11 '06 #12

Blair P. Houghton

Larry Bates wrote:

The filesystem is almost always the
most efficient place to store files, not as blobs in a
database.

I could get all theoretical about why that's not so in most cases,
but there are plenty of cases where it is so (especially when the
person doing the DB doesn't get the idea behind all filesystems,
which is that they are themselves simplified databases), so
I won't*.

In this case, the filesystem may be the best place to
do the work, because it's the cheapest to implement
and maintain.

--Blair

* - okay, I will

1. Since the filesystem is a database, making accesses
to it after being directed there by a database means you're
using two database systems (and an intervening operating
system) to do one thing. Serious databases work from
disks with no filesystem to get rid of that extra layer entirely.
But there are benefits to having things in files reachable by
ordinary tools, and to having the OS mediating access to
the data, but you need to be sure you need those benefits
and can afford the overhead. Academic in most cases,
including the one that started this thread.

2. When using the filesystem as the database
you only get one kind of native association, and have to
use semantics in the directory and filenames to give you
hints as to the type stored at a particular location. You get a
few pieces of accounting data (mod times, etc.) in the
directory listing, but can't associate anything else with
the file directly, at least not unless you create another
file that has the associated data in it, or stuff the extra
data in the file itself, but then that makes each file
a database...see where it goes? Sometimes it's better
to come up with a schema you can extend rationally to
fit the problem you are trying to solve.

--Blair

Sep 11 '06 #13

Aahz

In article <Hl******************@news.tufts.edu>,
John Salerno <jo******@NOSPAMgmail.com> wrote:

Thorsten Kampe wrote:

But sqlite is not "pure Python" because it's just a wrapper around
sqlite (which has to be installed separately)...

But that's the point. Once 2.5 is released, sqlite is built-in. Unless
there's more to it that I don't know, and something must still be
installed? But that makes no sense.

2.5 will include the sqlite library itself on Windows (and Macs? I
forget) but you need to install the library separately on Linux
boxes, which is generally about as complicated as apt-get install
sqlite-dev.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"LL YR VWL R BLNG T S" -- www.nancybuttons.com

Sep 11 '06 #14

Larry Bates

Blair P. Houghton wrote:

Larry Bates wrote:
The filesystem is almost always the
most efficient place to store files, not as blobs in a
database.

I could get all theoretical about why that's not so in most cases,
but there are plenty of cases where it is so (especially when the
person doing the DB doesn't get the idea behind all filesystems,
which is that they are themselves simplified databases), so
I won't*.

In this case, the filesystem may be the best place to
do the work, because it's the cheapest to implement
and maintain.

--Blair

* - okay, I will

1. Since the filesystem is a database, making accesses
to it after being directed there by a database means you're
using two database systems (and an intervening operating
system) to do one thing. Serious databases work from
disks with no filesystem to get rid of that extra layer entirely.
But there are benefits to having things in files reachable by
ordinary tools, and to having the OS mediating access to
the data, but you need to be sure you need those benefits
and can afford the overhead. Academic in most cases,
including the one that started this thread.

2. When using the filesystem as the database
you only get one kind of native association, and have to
use semantics in the directory and filenames to give you
hints as to the type stored at a particular location. You get a
few pieces of accounting data (mod times, etc.) in the
directory listing, but can't associate anything else with
the file directly, at least not unless you create another
file that has the associated data in it, or stuff the extra
data in the file itself, but then that makes each file
a database...see where it goes? Sometimes it's better
to come up with a schema you can extend rationally to
fit the problem you are trying to solve.

--Blair

Not quite sure why my response "bothered" you so much but it
appears it did. I'll admit that I was doing my best to read
the OP's mind in my answer.

Item 1 - The OP specifically said he wanted to store 100's
of files. You rarely need a database to store 100's of anything
and the overhead of installing and maintaining one isn't typically
worth the effort. Store the info in a text file and read the
entire file into memory and do linear searches. Python can search
100's of items in a list faster than you can even begin an SQL
query.

Item 2 - You will note that I said "If you need multiple indexes
into these files, then use a database, but only for the indexes
that point to the files on the filesystem". You sometimes need
multiple indexes (which databases are GREAT at providing).
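
A rough sketch of that arrangement, using sqlite3 only because it ships with Python; the table, the attribute names (author, topic) and the sample files are all hypothetical. The files stay on the filesystem and the database holds nothing but paths and the attributes to search by:

```python
import os
import sqlite3
import tempfile

# Hypothetical storage directory; the files themselves live here, not in the DB.
store = tempfile.mkdtemp()

conn = sqlite3.connect(":memory:")   # pass a file path to keep the index on disk
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, author TEXT, topic TEXT)")
# Two independent indexes into the same set of files.
conn.execute("CREATE INDEX files_by_author ON files (author)")
conn.execute("CREATE INDEX files_by_topic ON files (topic)")

def add_file(name, payload, author, topic):
    """Write the payload to disk and record only its path and attributes."""
    path = os.path.join(store, name)
    with open(path, "wb") as f:
        f.write(payload)
    conn.execute("INSERT INTO files VALUES (?, ?, ?)", (path, author, topic))
    return path

add_file("scan1.tif", b"II*\x00fake-tiff-1", "larry", "invoices")
add_file("scan2.tif", b"II*\x00fake-tiff-2", "alan", "invoices")
add_file("memo.txt", b"remember the milk", "alan", "notes")
conn.commit()

# Look files up by either attribute, then read the bytes from the filesystem.
alan_paths = [row[0] for row in conn.execute(
    "SELECT path FROM files WHERE author = ? ORDER BY path", ("alan",))]
invoice_count = conn.execute(
    "SELECT COUNT(*) FROM files WHERE topic = ?", ("invoices",)).fetchone()[0]

with open(alan_paths[0], "rb") as f:
    first_payload = f.read()
```

Backups and upgrades of the index stay small because the bulky data never enters the database.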

As far as "rational extension" is concerned, I think I can relate.
As a developer of imaging systems that store multiple-millions of
scanned pieces of paper online for customers, I can promise you
the file system is quite efficient at storing files (and that is
what the OP asked for in the original post) and way better than
storing in Oracle blobs. Can you store them in the database,
absolutely. Is it efficient and manageable. It has been our
experience that it is not. Ever tried to upgrade Oracle 9 to
Oracle 10 with a Tb of blobs?

-Larry

Sep 11 '06 #15

Thorsten Kampe

* John Salerno (2006-09-11 19:58 +0100)

Thorsten Kampe wrote:

But sqlite is not "pure Python" because it's just a wrapper around
sqlite (which has to be installed separately)...

But that's the point. Once 2.5 is released, sqlite is built-in. Unless
there's more to it that I don't know, and something must still be
installed? But that makes no sense.

I was under the impression that you still have to install the sqlite
executable but that's only for compiling from source: "If you're
compiling the Python source yourself, note that the source tree
doesn't include the SQLite code, only the wrapper module."

Thorsten

Sep 11 '06 #16

Paul Rubin

Larry Bates <la*********@websafe.com> writes:

As far as "rational extension" is concerned, I think I can relate.
As a developer of imaging systems that store multiple-millions of
scanned pieces of paper online for customers, I can promise you
the file system is quite efficient at storing files (and that is
what the OP asked for in the original post) and way better than
storing in Oracle blobs. Can you store them in the database,
absolutely. Is it efficient and manageable. It has been our
experience that it is not. Ever tried to upgrade Oracle 9 to
Oracle 10 with a Tb of blobs?

I keep hearing complaints about Oracle's blob handling and I don't
doubt they're true, but that sounds like an Oracle problem. I haven't
had any problems using blobs in MySQL though I've been a fairly
lightweight user.

Sep 11 '06 #17

Alex Martelli

Thorsten Kampe <th******@thorstenkampe.de> wrote:

* John Salerno (2006-09-11 19:58 +0100)
Thorsten Kampe wrote:

But sqlite is not "pure Python" because it's just a wrapper around
sqlite (which has to be installed separately)...
But that's the point. Once 2.5 is released, sqlite is built-in. Unless
there's more to it that I don't know, and something must still be
installed? But that makes no sense.

I was under the impression that you still have to install the sqlite
executable but that's only for compiling from source: "If you're
compiling the Python source yourself, note that the source tree
doesn't include the SQLite code, only the wrapper module."

You don't _need_ to install the SQLite executable[s] -- maybe the
_libraries_, unless they come bundled w/your Python distro (typically
the case on Win and Mac, but some "sumo distros" for other OSs may
choose to do the same).
Alex

Sep 12 '06 #18

Larry Bates

Paul Rubin wrote:

Larry Bates <la*********@websafe.com> writes:
As far as "rational extension" is concerned, I think I can relate.
As a developer of imaging systems that store multiple-millions of
scanned pieces of paper online for customers, I can promise you
the file system is quite efficient at storing files (and that is
what the OP asked for in the original post) and way better than
storing in Oracle blobs. Can you store them in the database,
absolutely. Is it efficient and manageable. It has been our
experience that it is not. Ever tried to upgrade Oracle 9 to
Oracle 10 with a Tb of blobs?

I keep hearing complaints about Oracle's blob handling and I don't
doubt they're true, but that sounds like an Oracle problem. I haven't
had any problems using blobs in MySQL though I've been a fairly
lightweight user.

For small numbers of blobs it works fine. The problem comes about,
more specifically, because Oracle's method for upgrading from one
version to another is Export, create new database, Import. Exporting
of a large number of blobs is slow, requires lots of disk space, etc.
If the blobs are on the filesystem with a pointer in the database,
upgrading is MUCH easier. Granted I'm talking about millions of
pages of scanned .TIF images here. Not a few files.

-Larry

Sep 12 '06 #19

Fredrik Lundh

Larry Bates wrote:

For small numbers of blobs it works fine. The problem comes about,
more specifically, because Oracle's method for upgrading from one
version to another is Export, create new database, Import.

Does "Pray" come before or after the steps you mentioned?

</F>

Sep 12 '06 #20

Cliff Wells

On Mon, 2006-09-11 at 13:23 +0000, David Isaac wrote:

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Since no one's mentioned it:

http://schevo.org/trac/wiki

--

Sep 12 '06 #21

Kay Schluehr

Pierre Quentel wrote:

- SnakeSQL : another SQL engine, less mature I think and very slow when
I tested it

And strange bugs when I used it.

- buzhug : Pythonic syntax (uses list comprehensions or methods like
create(), select() on the db object), much faster than all the above.
I'm obviously biased : I wrote it...

Looks cool! Apparently there are still mavericks who believe in "Python
first" while all others prefer referring to "standards" or what they
personally believe those standards to be [1]

Just one stupid remark since the limits of my language are the limits
of my world: I've not the slightest association with the seemingly
nonsense word "buzhug" and don't even know how to pronounce it
correctly. Would you have the kindness to enlighten me/us ?
[1]
http://groups.google.com/group/comp....10ac0dbbf23931

Sep 12 '06 #22

Cliff Wells

On Tue, 2006-09-12 at 12:29 -0700, Kay Schluehr wrote:

Just one stupid remark since the limits of my language are the limits
of my world: I've not the slightest association with the seemingly
nonsense word "buzhug" and don't even know how to pronounce it
correctly. Would you have the kindness to enlighten me/us ?

I simply assumed it was "guhzub" backwards.

Cliff

--

Sep 12 '06 #23

David Isaac

Thanks to all for the suggestions and much else
to think about.

Summarizing:

Those who were willing to consider a database suggested:
anydbm
Gadfly
SQLite (included with Python 2.5)
Schevo

Some preferred using the file system.
The core suggestion was to choose a directory structure
along with special naming conventions to indicate relationships.
Not all who suggested this said how to store info about the files.
One suggestion was:
Store the info in a text file and read the
entire file into memory and do linear searches. Python can search
100's of items in a list faster than you can even begin an SQL query.
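
For what it's worth, that suggestion might look something like the sketch below. The catalog file name, its tab-separated layout and the field names are assumptions made for the example, not something anyone in the thread specified:

```python
import csv
import os
import tempfile

# Hypothetical catalog file: one tab-separated line per stored item.
catalog = os.path.join(tempfile.mkdtemp(), "catalog.tsv")
fields = ["filename", "kind", "description"]

with open(catalog, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["notes.txt", "text", "meeting notes"])
    writer.writerow(["budget.xls", "binary", "budget spreadsheet"])
    writer.writerow(["plan.txt", "text", "budget plan draft"])

# Read the entire file into memory as a list of dicts.
with open(catalog, newline="") as f:
    records = list(csv.DictReader(f, fieldnames=fields, delimiter="\t"))

# Linear searches: every record is examined, which is fine for a few hundred.
binary_files = [r["filename"] for r in records if r["kind"] == "binary"]
about_budget = [r["filename"] for r in records if "budget" in r["description"]]
```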

Alan Isaac

Sep 13 '06 #24

Blair P. Houghton

Larry Bates wrote:

As far as "rational extension" is concerned, I think I can relate.
As a developer of imaging systems that store multiple-millions of
scanned pieces of paper online for customers, I can promise you
the file system is quite efficient at storing files (and that is
what the OP asked for in the original post) and way better than
storing in Oracle blobs. Can you store them in the database,
absolutely. Is it efficient and manageable. It has been our
experience that it is not. Ever tried to upgrade Oracle 9 to
Oracle 10 with a Tb of blobs?

Can't be any harder than switching between incompatible filesystems,
unless you assume it should "just work...".

--Blair

Sep 13 '06 #25

Fredrik Lundh

Blair P. Houghton wrote:

Can't be any harder than switching between incompatible filesystems,
unless you assume it should "just work...".

so what file systems are you using that don't support file names and
binary data ?

</F>

Sep 13 '06 #26

Pierre Quentel

Buzhug (like Karrigell and Strakell) is a Breton word ; Breton is the
language spoken in Brittany, the westernmost part of France. Less and
less spoken, actually, but I do, like all my ancestors. It is a close
cousin of Welsh, and has common roots with Irish and Gaelic

Buzhug means "earthworm", the big long brown worms that you find when
you dig ; the shape is the same as a python, only smaller and less
dangerous...

You pronounce it "buzuk", with the French "u" or German "ü"

Karrigell means "cart" and strakell, any sort of machine whose name
you don't know. Both rhyme with "hell" ; a and r like in French, g like
in goat

Now you know 3 words of Breton !

Regards,
Pierre

Sep 13 '06 #27

Kay Schluehr

Pierre Quentel wrote:

Buzhug (like Karrigell and Strakell) is a Breton word ; Breton is the
language spoken in Brittany, the westernmost part of France. Less and
less spoken, actually, but I do, like all my ancestors. It is a close
cousin of Welsh, and has common roots with Irish and Gaelic

Buzhug means "earthworm", the big long brown worms that you find when
you dig ; the shape is the same as a python, only smaller and less
dangerous...

You pronounce it "buzuk", with the French "u" or German "ü"

Karrigell means "cart" and strakell, any sort of machine whose name
you don't know. Both rhyme with "hell" ; a and r like in French, g like
in goat

Now you know 3 words of Breton !

Regards,
Pierre

Thanks !!!

Sep 13 '06 #28

Blair P. Houghton

Fredrik Lundh wrote:

Blair P. Houghton wrote:

Can't be any harder than switching between incompatible filesystems,
unless you assume it should "just work...".

so what file systems are you using that don't support file names and
binary data ?

Mmmm, no.

I'm saying that the change from Oracle 9 to Oracle 10 is like changing
from ffs to fat32.

They have different structures related to the location and
identification of every stored object. Sometimes different storage
structures (block sizes, block organization, fragmentation rules, etc.)
for the insides of a file.

A filesystem is a specialized database that stores generalized data.

The value of a database program and its data storage system is that you
can get the filesystem out of the way, and deal only in one layer of
searching and retrieval.

A DB may be only trivially more efficient when the data are a
collection of very large objects with a few externally associated
attributes that can all be found in the average filesystem's directory
structures; but a DB doing raw accesses on a bare disk is a big
improvement in speed when dealing with a huge collection of relatively
small data, each with a relatively large number of inconsistently
associated attributes.

The tradeoff is that you end up giving your DB vendor the option of
making you have to offload and reload that disk if they change their
system between versions.

--Blair

Sep 14 '06 #29

Fredrik Lundh

Blair P. Houghton wrote:

Mmmm, no.

I'm saying that the change from Oracle 9 to Oracle 10 is like changing
from ffs to fat32.

well, I'm quite sure that the people I know who are spending a lot of
their time moving stuff from Oracle N to Oracle N+1 (and sometimes
getting stuck, due to incompatibilities between SQL and SQL and a lack
of infinite resources) would say you're completely and utterly nuts.

</F>

Sep 14 '06 #30

Magnus Lycka

David Isaac wrote:

I have no experience with database applications.
This database will likely hold only a few hundred items,
including both textfiles and binary files.

I would like a pure Python solution to the extent reasonable.

Suggestions?

You haven't provided enough requirements for us
to make any intelligent suggestions. Perhaps you
might learn something from reading through my old
EuroPython presentation.

http://www.thinkware.se/cgi-bin/thin...mingWithPython

Relational databases with SQL syntax provide a convenient
way to store data with an appropriate structure. You can
always force a tool into handling things it wasn't designed
for, but SQL databases work best when you have strict, well
defined structures, such as in accounting systems, booking
systems etc. It gives you a declarative query language,
transaction handling, typically multi user support and
varying degrees of scalability and high availability
features.

For you, it's probably overkill, and if you have files
to start with, keeping them in the file system is the
natural thing to do. That means that you can use a lot
of standard tools to access, manipulate, backup and search
through them. Perhaps you rather need a search engine for
the file system?

Do you intend to store information concerning how these
files relate to each other? Perhaps it's better in that
case to just keep that relationship information in some
small database system, and to keep the actual files in
the file system.

Perhaps it's enough to keep an XML file with the structure,
and to use something like ElementTree to manipulate that
XML structure.

You gain a lot of power, robustness and flexibility by
using some kind of plain text format. Simple files play
well with configuration management systems, backup systems,
editors, standard search tools, etc. If you use XML, it's
also easier to transform your structural information to
some presentable layout through standard techniques such
as XSL.
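
As a sketch of the ElementTree idea, assuming the module as it ships in the standard library today (xml.etree.ElementTree). The element names, attributes and paths below are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Build a small structure describing files and how they relate to each other.
root = ET.Element("collection")
report = ET.SubElement(root, "item", path="docs/report.txt", kind="text")
chart = ET.SubElement(root, "item", path="img/chart.png", kind="binary")
ET.SubElement(report, "related", path="img/chart.png")

# Serialize to plain text (this is what would be written to the XML file) ...
xml_text = ET.tostring(root, encoding="unicode")

# ... and parse it back, as a later run of the program would.
parsed = ET.fromstring(xml_text)

# Simple searches over the structure; the files themselves stay on disk.
binary_paths = [item.get("path") for item in parsed.findall("item")
                if item.get("kind") == "binary"]
related_to_report = [rel.get("path")
                     for item in parsed.findall("item")
                     if item.get("path") == "docs/report.txt"
                     for rel in item.findall("related")]
```

The resulting file is ordinary text, so it can be diffed, versioned and edited by hand like any other.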

Sep 14 '06 #31

metaperl

David Isaac wrote:

Thanks to all for the suggestions and much else
to think about.

Summarizing:

Those who were willing to consider a database suggested:
anydbm
Gadfly
SQLite (included with Python 2.5)
Schevo

You missed buzhug:
http://buzhug.sourceforge.net/

A very thorough pure Python database.

Sep 14 '06 #32

Blair P. Houghton

Fredrik Lundh wrote:

Blair P. Houghton wrote:
I'm saying that the change from Oracle 9 to Oracle 10 is like changing
from ffs to fat32.

well, I'm quite sure that the people I know who are spending a lot of
their time moving stuff from Oracle N to Oracle N+1 (and sometimes
getting stuck, due to incompatibilities between SQL and SQL and a lack
of infinite resources) would say you're completely and utterly nuts.

Maybe they'd just be hyperbolic from the frustration. Filesystems
/are/ databases, and incompatibilities /are/ incompatibilities. And
without ANSI, the SQL problem could be like incompatibilities in C.
Not unheard-of. Not at all.

--Blair

Sep 15 '06 #33
