Bytes IT Community

Python choice of database

Hi,

I am looking for a stand-alone (not client/server) database solution for
Python.

1) speed is not an issue
2) I wish to store less than 5000 records
3) each record should not be larger than 16K
As I start with Python objects, I thought of using shelve, but looking at
the restrictions (record size + potential collisions) I feel I should study
my options a bit further before I get started.
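The shelve approach under consideration looks roughly like this (a minimal sketch; keys and record contents are illustrative):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "records")  # throwaway location

with shelve.open(path) as db:  # keys must be strings
    db["rec-0001"] = {"name": "Philippe", "payload": b"x" * 1024}

with shelve.open(path) as db:  # reopen to show the data persisted
    record = db["rec-0001"]
```

Each value is pickled under its string key by whichever dbm backend happens to be available, which is where the per-record size concerns with some older backends come from.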
Regards,

Philippe

Jul 19 '05 #1
32 Replies


Gadfly
PySQLite ( requires SQLite library )

J

Philippe C. Martin wrote:
I am looking for a stand-alone (not client/server) database solution for
Python. [...]


Jul 19 '05 #2

Philippe C. Martin wrote:
[...] As I start with Python objects, I thought of using shelve, but
looking at the restrictions (record size + potential collisions) I feel I
should study my options a bit further before I get started.


Why not just use native Python data structures and pickle them?
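A pickle-based version of this can be sketched as follows (file layout and record shapes are illustrative):

```python
import os
import pickle
import tempfile

# Keep everything in an ordinary dict and pickle the whole thing to disk.
records = {i: {"name": "rec-%04d" % i, "data": "x" * 100}
           for i in range(5000)}

path = os.path.join(tempfile.mkdtemp(), "records.pkl")
with open(path, "wb") as f:
    pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)

# Loading restores the native Python structures in one call.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

The whole file is rewritten on every save, which is the trade-off against shelve's per-key updates.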

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
I used to walk around / Like nothing could happen to me
-- TLC
Jul 19 '05 #3

Just thought of a couple more:

SnakeSQL
KirbyBase

J

John Abel wrote:
Gadfly
PySQLite ( requires SQLite library ) [...]

Jul 19 '05 #4


On Mon, 20 Jun 2005 15:18:58 GMT, "Philippe C. Martin"
<ph******@philippecmartin.com> said:
I am looking for a stand-alone (not client/server) database solution for
Python. [...]

SQLite might be worth a look. There are Python bindings at:
http://initd.org/tracker/pysqlite

Cheers,
Richard
Jul 19 '05 #5

Philippe C. Martin wrote:
I am looking for a stand-alone (not client/server) database solution for
Python. [...]


You don't say whether you want *pure* Python solutions, so I'll suggest
pysqlite which wraps the SQLite embedded database in a pretty much
totally transparent fashion and is highly effective, fast, compact,
reliable (so far, in my experience), and clean.
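(For reference: pysqlite's interface follows the Python DB-API, and its descendant ships with modern Python as the standard sqlite3 module. A minimal sketch, with illustrative table and column names:)

```python
import sqlite3

# In-memory database for the sketch; pass a filename for a persistent one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload BLOB)")
conn.execute("INSERT INTO records (payload) VALUES (?)", (b"x" * 16384,))
conn.commit()

(size,) = conn.execute("SELECT length(payload) FROM records").fetchone()
print(size)  # 16384 -- a 16K record is no problem
```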

You also don't say whether you want a SQL database, so if you are free
to try anything, you might look at ZODB or Durus (think of it as a
lighter-weight ZODB). I believe Durus is pure Python, but it might have
some C code for performance (like ZODB). It's not SQL, and should
perhaps be thought of (as it describes itself) as an object persistence
solution, rather than a "database".

-Peter
Jul 19 '05 #6

John Abel wrote:
Gadfly
PySQLite ( requires SQLite library )


I want to clarify this parenthetical comment, for the record. When I
first downloaded PySQLite I had already gone and installed SQLite,
thinking it was a prerequisite in that sense.

In fact, the PySQLite install includes a .pyd which contains a
statically linked version of the complete SQLite library. No additional
installation is required, making it an even simpler solution than I
thought at first.

-Peter
Jul 19 '05 #7

Well that would be shelve I guess ... with the restrictions I mentioned.

Regards,

Philippe

Erik Max Francis wrote:
[...] Why not just use native Python data structures and pickle them?


Jul 19 '05 #8

Philippe C. Martin wrote:
Well that would be shelve I guess ... with the restrictions I mentioned.


I was talking about pickle, not shelve.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
I used to walk around / Like nothing could happen to me
-- TLC
Jul 19 '05 #9

Thank you all for your answers.

A pure Python solution would have been my first choice, yet I now feel I
should spend some time looking at PySQLite (I like the fact it's
pre-compiled for Windows).

Thanks.

Philippe

Philippe C. Martin wrote:
I am looking for a stand-alone (not client/server) database solution for
Python. [...]


Jul 19 '05 #10

You mean pickling a dictionary of 5000/16K objects?

Erik Max Francis wrote:
Philippe C. Martin wrote:
Well that would be shelve I guess ... with the restrictions I mentioned.


I was talking about pickle, not shelve.


Jul 19 '05 #11

Philippe C. Martin wrote:
You mean pickling a dictionary of 5000/16K objects?


Yes. You said speed was not an issue; pickling only 5000 objects, each
no more than 16 kB, is easily handled by any remotely modern machine
(and even plenty which are not very modern).

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
I used to walk around / Like nothing could happen to me
-- TLC
Jul 19 '05 #12

OK, I'll try that too.

Regards,

Philippe

Erik Max Francis wrote:
Yes. You said speed was not an issue; pickling only 5000 objects, each
no more than 16 kB, is easily handled by any remotely modern machine
(and even plenty which are not very modern).


Jul 19 '05 #13

Philippe C. Martin wrote:
A pure Python solution would have been my first choice, yet I now feel I
should spend some time looking at PySQLite (I like the fact it's
pre-compiled for Windows). [...]

Out of the suggestions SnakeSQL and KirbyBase are pure Python. Gadfly
is sorta pure, in that it will work without the compiled kjbuckets lib.

J
Jul 19 '05 #14

Philippe C. Martin <ph******@philippecmartin.com> wrote:
I am looking for a stand-alone (not client/server) database solution
for Python. [...]


Possible approach might be:
1. 5000 files -- my personal favourite.
2. GDBM
3. SQLite

--
William Park <op**********@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Full featured Bash shell
http://freshmeat.net/projects/bashdiff/
Jul 19 '05 #15

> 1. 5000 files -- my personal favourite.
You got a point

William Park wrote:
Possible approach might be:
1. 5000 files -- my personal favourite.
2. GDBM
3. SQLite


Jul 19 '05 #16

Thanks, I'm looking at KirbyBase also but wonder if it can handle bitmaps (I
could always pickle it first I guess).

Regards,

Philippe

John Abel wrote:
[...] Out of the suggestions SnakeSQL and KirbyBase are pure Python.
Gadfly is sorta pure, in that it will work without the compiled kjbuckets
lib.


Jul 19 '05 #17

"Philippe C. Martin" <ph******@philippecmartin.com> writes:
1) speed is not an issue
2) I wish to store less than 5000 records
3) each record should not be larger than 16K


You don't mention whether multiple running programs need to use it
concurrently. That's usually done with client/server db's but it can
be done standalone.
Jul 19 '05 #18

Correct, that's not a constraint right now.
Paul Rubin wrote:
You don't mention whether multiple running programs need to use it
concurrently. That's usually done with client/server db's but it can
be done standalone.


Jul 19 '05 #19

Philippe C. Martin wrote:
I am looking for a stand-alone (not client/server) database solution for
Python. [...]


How about using the filesystem as a database? For the number of records
you describe it may work surprisingly well. A bonus is that the
database is easy to manage manually. One tricky point is updating: you
probably want to create a temporary file and then use os.rename to
replace a record in one atomic operation.

For very short keys and records (e.g. email addresses) you can use
symbolic links instead of files. The advantage is that you have a
single system call (readlink) to retrieve the contents of a link. No
need to open, read and close.

This works only on posix systems, of course. The actual performance
depends on your filesystem but on linux and BSDs I find that
performance easily rivals that of berkeleydb and initialization time is
much faster. This "database" also supports reliable concurrent access
by multiple threads or processes.

See http://www.tothink.com/python/linkdb
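The temp-file-plus-rename update described above can be sketched as follows (function and directory names are illustrative):

```python
import os
import tempfile

dbdir = tempfile.mkdtemp()  # one directory, one file per record

def put(key, value):
    # Write to a temporary file first, then rename it into place: on POSIX
    # the rename is atomic, so readers never see a half-written record.
    fd, tmp = tempfile.mkstemp(dir=dbdir)
    with os.fdopen(fd, "wb") as f:
        f.write(value)
    os.rename(tmp, os.path.join(dbdir, key))

def get(key):
    with open(os.path.join(dbdir, key), "rb") as f:
        return f.read()

put("rec-0001", b"first version")
put("rec-0001", b"second version")  # atomically replaces the old record
```

Note that os.rename over an existing file is the POSIX behaviour; fully portable code would use os.replace instead.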

Oren

Jul 19 '05 #20

Yes, I agree, but as most of the customer base I target uses the O/S that
cannot be named ;-), file names could become a problem, just as 'ln -s' is
out of the question.

Yet, this might be the best trade-off.

Regards,

Philippe

Oren Tirosh wrote:
How about using the filesystem as a database? [...] For very short keys
and records (e.g. email addresses) you can use symbolic links instead of
files. [...]


Jul 19 '05 #21

I am really surprised that someone hasn't mentioned Gadfly yet. It is a
quick, free, relational database written directly for Python itself.

http://gadfly.sourceforge.net/

Brian
---
Philippe C. Martin wrote:
I am looking for a stand-alone (not client/server) database solution for
Python. [...]

Jul 19 '05 #22

EP
Oren suggested:
How about using the filesystem as a database? For the number of records
you describe it may work surprisingly well. A bonus is that the
database is easy to manage manually.


I tried this for one application under the Windows OS and it worked fine...

until my records (text - maybe 50KB average) unexpectedly blossomed into
the 10,000-1,000,000 ranges. If I or someone else (who innocently doesn't
know better) opens up one of the directories with ~150,000 files in it,
the machine's personality gets a little ugly (it seems buggy but is just
very busy; no crashing). Under 10,000 files per directory seems to work
just fine, though.

For less expansive (and more structured) data, cPickle is a favorite.

Jul 19 '05 #23

On Mon, 20 Jun 2005 23:42:21 -0800, EP wrote:
until my records (text - maybe 50KB average) unexpectedly blossomed into
the 10,000-1,000,000 ranges. If I or someone else (who innocently doesn't
know better) opens up one of the directories with ~150,000 files in it,
the machine's personality gets a little ugly (it seems buggy but is just
very busy; no crashing). Under 10,000 files per directory seems to work
just fine, though.


Yes. Programs like "squid" use subdirectories to avoid this problem. If
your key is a surname, then you can just use the first letter to divide
the names up, for instance, or part of the hash value.

Many Linux FSs can cope with lots of files, but it doesn't hurt to try to
avoid this.
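The subdirectory trick can be sketched as follows (the bucket width and hash choice are illustrative):

```python
import hashlib
import os
import tempfile

root = tempfile.mkdtemp()

def record_path(key):
    # Two hex digits of the key's hash give 256 buckets, so even hundreds
    # of thousands of records leave each directory comfortably small.
    bucket = hashlib.md5(key.encode()).hexdigest()[:2]
    d = os.path.join(root, bucket)
    os.makedirs(d, exist_ok=True)
    return os.path.join(d, key)

with open(record_path("smith"), "w") as f:
    f.write("data for smith")
```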

Jeremy

Jul 19 '05 #24

On Mon, 20 Jun 2005 23:42:21 -0800, EP <EP@zomething.com> wrote:
[...]


Related question:

What if I need to create/modify MS-Access or SQL Server dbs?

Jul 19 '05 #25

On 6/21/05, Charles Krug <cd****@worldnet.att.net> wrote:

Related question:

What if I need to create/modify MS-Access or SQL Server dbs?


You could use ADO + adodbapi for both.
http://adodbapi.sourceforge.net/

- kv
Jul 19 '05 #26

For my database, I have a table of user information with a unique
identifier, and then I save to the filesystem my bitmap files, placing the
unique identifier, date and time information into the filename. Why stick a
photo into a database?

For instance:

User Table:
uniqueID: 0001
lName: Rose
fName: Dave

Then save the bitmap with filename:
0001_13:00:00_06-21-2005.bmp

To make things faster, I also have a table of filenames saved, so I can know
exactly which files I want to read in.
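A sketch of that filename scheme (note that colons, as in the example above, aren't legal in Windows filenames, so this version encodes the timestamp with dashes; names are illustrative):

```python
from datetime import datetime

def photo_filename(unique_id, when):
    # Colons (as in 13:00:00) aren't legal in Windows filenames, so this
    # version encodes the timestamp with dashes instead.
    return "%s_%s.bmp" % (unique_id, when.strftime("%Y%m%d-%H%M%S"))

name = photo_filename("0001", datetime(2005, 6, 21, 13, 0, 0))
print(name)  # 0001_20050621-130000.bmp
```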

-Dave

Jul 19 '05 #27

GMane Python wrote:
For my database, I have a table of user information with a unique
identifier, and then I save to the filesystem my bitmap files, placing the
unique identifier, date and time information into the filename. Why stick a
photo into a database?
There are various possible reasons, depending on one's specific situation.

A database allows you to store the date and time info, or other
attributes, as separate fields so you can use standard SQL (or whatever
your favourite DB supports) to sort, query, and retrieve.

A database makes it possible to update or remove the photo in the same
manner you use to access all your other data, rather than requiring you
to deal with filesystem idiosyncrasies and exceptional conditions.

A database such as SQLite will store *all* your data in a single file,
making it much easier to copy for archival purposes, to send to someone
else, or to move to another machine.
Then save the bitmap with filename:
0001_13:00:00_06-21-2005.bmp
A database shouldn't make you jump through hoops to create "interesting"
file names just to store your data. :-)
To make things faster, I also have a table of filenames saved, so I can know
exactly which files I want to read in.


Oh yeah, databases can have indexes (or indices, if you will) which let
you get that sort of speedup without having to resort to still more
custom programming. :-)

Not that a database is always the best way to store an image (or to do
anything else, for that matter), but there are definitely times when it
can be a better approach than simple files. (There are disadvantages
too, of course, such as making it harder to "get at" the data from
outside the application which created it. In the case of bitmaps, this
might well be a deciding factor, but each case should be addressed on
its own merits.)
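Storing the image bytes in SQLite along with queryable attributes might look like this (a sketch; table and column names are illustrative, and the bitmap is a stand-in):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a filename would give one portable file
conn.execute("CREATE TABLE photos (user_id TEXT, taken TEXT, image BLOB)")

fake_bmp = b"BM" + b"\x00" * 100  # stand-in for real bitmap bytes
conn.execute("INSERT INTO photos VALUES (?, ?, ?)",
             ("0001", "2005-06-21 13:00:00", fake_bmp))
conn.commit()

# Query by attribute instead of parsing filenames:
row = conn.execute(
    "SELECT image FROM photos WHERE user_id = ? ORDER BY taken DESC",
    ("0001",)).fetchone()
```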

-Peter
Jul 19 '05 #28

I guess I use databases to store .... data ;-) and I do not wish to worry
about the type of data I'm storing. That's why I love to pickle.

I understand that during an optimization phase, decisions might be taken to
handle data otherwise.

Regards,

Philippe


GMane Python wrote:
[...] Why stick a photo into a database?


Jul 19 '05 #29

On 20 Jun 2005 11:43:28 -0700, rumours say that "Oren Tirosh"
<or*********@gmail.com> might have written:
For very short keys and records (e.g. email addresses) you can use
symbolic links instead of files. The advantage is that you have a
single system call (readlink) to retrieve the contents of a link. No
need to open, read and close.
readlink also does open, read and close too. And why go through
indirection? Why not make indexes into subdirectories, say, and
hard-link the records under different filenames?
This works only on posix systems, of course.


There aren't any non-posix-conformant --or, at least, any
non-self-described-as-posix-conformant :-)-- operating systems in wide
use today.

Hint: win32file.CreateHardLink
--
TZOTZIOY, I speak England very best.
"Dear Paul,
please stop spamming us."
The Corinthians
Jul 19 '05 #31

On Tue, 21 Jun 2005 17:00:17 +0300, rumours say that Konstantin
Veretennicov <kv***********@gmail.com> might have written:
On 6/21/05, Charles Krug <cd****@worldnet.att.net> wrote:

Related question:

What if I need to create/modify MS-Access or SQL Server dbs?


You could use ADO + adodbapi for both.
http://adodbapi.sourceforge.net/


Or pywin32/ctypes and COM (btw, I prefer DAO to ADO, but that is a
personal choice).
--
TZOTZIOY, I speak England very best.
"Dear Paul,
please stop spamming us."
The Corinthians
Jul 19 '05 #32

On Mon, 20 Jun 2005 23:42:21 -0800, rumours say that "EP"
<EP@zomething.com> might have written:
I tried this for one application under the Windows OS and it worked
fine... until my records (text - maybe 50KB average) unexpectedly
blossomed into the 10,000-1,000,000 ranges. [...] Under 10,000 files per
directory seems to work just fine, though.


Although I am not a pro-Windows person, I have to say here directories
containing more than 10000 files is not a problem for NTFS (at least
NTFS of Win2000 and WinXP based on my experience) since AFAIK
directories are stored in B-tree format; the problem is if one tries to
*view* the directory contents using Explorer. Command-line dir had no
problem on a directory with >15000 files.
--
TZOTZIOY, I speak England very best.
"Dear Paul,
please stop spamming us."
The Corinthians
Jul 19 '05 #33

This discussion thread is closed
