
low-end persistence strategies?

I've started a few threads before on object persistence in medium to
high end server apps. This one is about low end apps, for example, a
simple cgi on a personal web site that might get a dozen hits a day.
The idea is you just want to keep a few pieces of data around that the
cgi can update.

Immediately, typical strategies like using a MySQL database become too
big a pain. Any kind of compiled and installed 3rd party module (e.g.
Metakit) is also too big a pain. But there still has to be some kind
of concurrency strategy, even if it's something like crude file
locking, or else two people running the cgi simultaneously can wipe
out the data store. But you don't want crashing the app to leave a
lock around if you can help it.

Anyway, something like dbm or shelve coupled with flock-style file
locking and a version of dbmopen that automatically retries after 1
second if the file is locked would do the job nicely, plus there could
be a cleanup mechanism for detecting stale locks.

Is there a standard approach to something like that, or should I just
code it the obvious way?

Thanks.
Jul 18 '05 #1
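A minimal sketch of what the question describes - a dbm store guarded by flock-style locking, with the 1-second retry mentioned above. The function name and the sidecar lock-file convention are illustrative, not from the thread:

```python
import dbm
import fcntl
import time
from contextlib import contextmanager

@contextmanager
def locked_dbm(path, timeout=10.0, retry=1.0):
    # Lock a sidecar file rather than the dbm file itself, since the
    # dbm backend controls the actual database filename(s).
    lockfile = open(path + ".lock", "w")
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
            break
        except OSError:
            if time.monotonic() >= deadline:
                lockfile.close()
                raise
            time.sleep(retry)  # the 1-second retry the post suggests
    try:
        db = dbm.open(path, "c")  # create the store if it doesn't exist
        try:
            yield db
        finally:
            db.close()
    finally:
        fcntl.flock(lockfile, fcntl.LOCK_UN)
        lockfile.close()
```

Note that the stale-lock cleanup the post asks about comes for free with flock: the kernel releases the lock when the holding process exits, even after a crash.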
49 Replies


On Tue, 15 Feb 2005 18:57:47 -0800, Paul Rubin wrote:
I've started a few threads before on object persistence in medium to
high end server apps. This one is about low end apps, for example, a
simple cgi on a personal web site that might get a dozen hits a day.
The idea is you just want to keep a few pieces of data around that the
cgi can update. [cut]

Anyway, something like dbm or shelve coupled with flock-style file
locking and a version of dbmopen that automatically retries after 1
second if the file is locked would do the job nicely, plus there could
be a cleanup mechanism for detecting stale locks.

Is there a standard approach to something like that, or should I just
code it the obvious way?


Hi,

I would use the pickle module, and serialize access to the pickle
files with file locking (only one process at a time is allowed to
read or write).

This means your cgi application can only serve one request at a time.

HTH,
Thomas

--
Thomas Güttler, http://www.thomas-guettler.de/
Jul 18 '05 #2
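Thomas's suggestion - the whole state pickled, with access serialized by file locking - might look roughly like this (the helper names are made up for illustration):

```python
import fcntl
import pickle

def load_state(path, default=None):
    # Shared lock: any number of readers may hold it at once.
    try:
        f = open(path, "rb")
    except FileNotFoundError:
        return default
    with f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return pickle.load(f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

def save_state(path, state):
    # Open in append mode so the file isn't truncated before we hold
    # the lock, then truncate and rewrite under the exclusive lock.
    with open(path, "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            f.truncate()
            pickle.dump(state, f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```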


Maybe ZODB helps.
--
Regards,

Diez B. Roggisch
Jul 18 '05 #3

"Diez B. Roggisch" <de*********@web.de> writes:
Maybe ZODB helps.


I think it's way too heavyweight for what I'm envisioning, but I
haven't used it yet. I'm less concerned about object persistence
(just saving strings is good enough) than finding the simplest
possible approach to dealing with concurrent update attempts.
Jul 18 '05 #4

Paul Rubin wrote:
"Diez B. Roggisch" <de*********@web.de> writes:
Maybe ZODB helps.


I think it's way too heavyweight for what I'm envisioning, but I
haven't used it yet. I'm less concerned about object persistence
(just saving strings is good enough) than finding the simplest
possible approach to dealing with concurrent update attempts.


And that's exactly where ZODB comes into play. It has full ACID support.
Opening a ZODB is a matter of three lines of code - nothing compared to an
RDBMS. And apart from some standard subclassing, you don't have to do
anything to make your objects persistent. Just check the tutorial.
--
Regards,

Diez B. Roggisch
Jul 18 '05 #5

"Diez B. Roggisch" <de*********@web.de> writes:
I think it's way too heavyweight for what I'm envisioning, but I
haven't used it yet. I'm less concerned about object persistence
(just saving strings is good enough) than finding the simplest
possible approach to dealing with concurrent update attempts.


And that's exactly where ZODB comes into play. It has full ACID support.
Opening a ZODB is a matter of three lines of code - nothing compared to an
RDBMS.


The issue with using an rdbms is not with the small amount of code
needed to connect to it and query it, but in the overhead of
installing the huge piece of software (the rdbms) itself, and keeping
the rdbms server running all the time so the infrequently used app can
connect to it. ZODB is also a big piece of software to install. Is
it at least 100% Python with no C modules required? Does it need a
separate server process? If it needs either C modules or a separate
server, it really can't be called a low-end strategy.
Jul 18 '05 #6

> The issue with using an rdbms is not with the small amount of code
needed to connect to it and query it, but in the overhead of
It's not only connecting - it's creating the database (automatically, if
necessary) and "connecting", which is actually just opening a file.
installing the huge piece of software (the rdbms) itself, and keeping
the rdbms server running all the time so the infrequently used app can
connect to it. ZODB is also a big piece of software to install. Is
it at least 100% Python with no C modules required? Does it need a
separate server process? If it needs either C modules or a separate
server, it really can't be called a low-end strategy.


It has to be installed. And it has C modules - but I don't see that as a
problem. Of course this is my personal opinion - but it's certainly easier
to install it than to cook up your own transaction-isolated persistence
layer. I started using it over pickle when my multi-threaded app caused
pickle to crash.

ZODB does not have a server-process, and no external setup beyond the
installation of the module itself.

Even if you consider installing it as too heavy for your current needs, you
should skim over the tutorial to get a grasp of how it works.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #7

Sounds like you want pickle or cPickle.

On Tue, 15 Feb 2005 19:00:31 -0800 (PST), Paul Rubin
<"http://phr.cx"@nospam.invalid> wrote:
I've started a few threads before on object persistence in medium to
high end server apps. This one is about low end apps, for example, a
simple cgi on a personal web site that might get a dozen hits a day.
The idea is you just want to keep a few pieces of data around that the
cgi can update.

Immediately, typical strategies like using a MySQL database become too
big a pain. Any kind of compiled and installed 3rd party module (e.g.
Metakit) is also too big a pain. But there still has to be some kind
of concurrency strategy, even if it's something like crude file
locking, or else two people running the cgi simultaneously can wipe
out the data store. But you don't want crashing the app to leave a
lock around if you can help it.

Anyway, something like dbm or shelve coupled with flock-style file
locking and a version of dbmopen that automatically retries after 1
second if the file is locked would do the job nicely, plus there could
be a cleanup mechanism for detecting stale locks.

Is there a standard approach to something like that, or should I just
code it the obvious way?

Thanks.

--
Thomas G. Willis
http://paperbackmusic.net
Jul 18 '05 #8

"Diez B. Roggisch" <de*********@web.de> writes:

It has to be installed. And it has C modules - but I don't see that
as a problem. Of course this is my personal opinion - but it's
certainly easier to install it than to cook up your own
transaction-isolated persistence layer. I started using it over
pickle when my multi-threaded app caused pickle to crash.

I don't feel that I need ACID since, as mentioned, I'm willing to lock
the entire database for the duration of each transaction. I just want
a simple way to handle locking, retries, and making sure the locks are
cleaned up.

ZODB does not have a server-process, and no external setup beyond the
installation of the module itself.

That helps, thanks.

Even if you consider installing it as too heavy for your current needs, you
should skim over the tutorial to get a grasp of how it works.

Yes, I've been wanting to look at it sometime.
Jul 18 '05 #9

Tom Willis <to********@gmail.com> writes:
Sounds like you want pickle or cPickle.


No, the issue is how to handle multiple clients trying to update the
pickle simultaneously.
Jul 18 '05 #10

I'd like to second this one...ZODB is *extremely* easy to use. I use
it in projects with anything from a couple dozen simple objects all
the way up to a moderately complex system with several hundred
thousand stored custom objects. (I would use it for very complex
systems as well, but I'm not working on any right now...)

There are a few quirks to using ZODB, and the documentation sometimes
feels light, but mostly that's because ZODB is so easy to use.

Chris
On Wed, 16 Feb 2005 15:11:46 +0100, Diez B. Roggisch <de*********@web.de> wrote:
Paul Rubin wrote:
"Diez B. Roggisch" <de*********@web.de> writes:
Maybe ZODB helps.


I think it's way too heavyweight for what I'm envisioning, but I
haven't used it yet. I'm less concerned about object persistence
(just saving strings is good enough) than finding the simplest
possible approach to dealing with concurrent update attempts.


And that's exactly where ZODB comes into play. It has full ACID support.
Opening a ZODB is a matter of three lines of code - nothing compared to an
RDBMS. And apart from some standard subclassing, you don't have to do
anything to make your objects persistent. Just check the tutorial.
--
Regards,

Diez B. Roggisch

--
"It is our responsibilities, not ourselves, that we should take
seriously." -- Peter Ustinov
Jul 18 '05 #11

Chris Cioffi wrote:
I'd like to second this one...ZODB is *extremely* easy to use. I use
it in projects with anything from a couple dozen simple objects all
the way up to a moderately complex system with several hundred
thousand stored custom objects. (I would use it for very complex
systems as well, but I'm not working on any right now...)


Chris (or anyone else), could you comment on ZODB's performance? I've Googled
around a bit and haven't been able to find anything concrete, so I'm really
curious to know how ZODB does with a few hundred thousand objects.

Specifically, what level of complexity do your ZODB queries/searches have? Any
idea on how purely ad hoc searches perform? Obviously it will be affected by the
nature of the objects, but any insight into ZODB's performance on large data
sets would be helpful. What's the general ratio of reads to writes in your
application?

I'm starting on a project in which we'll do completely dynamic (generated on the
fly) queries into the database (mostly of the form of "from the set of all
objects, give me all that have property A AND have property B AND property B's
value is between 10 and 100, ..."). The objects themselves are fairly dynamic as
well, so building it on top of an RDBMS will require many joins across property
and value tables, so in the end there might not be any performance advantage in
an RDBMS (and it would certainly be a lot less work to use an object
database - a huge portion of the work is in the object-relational layer).

Anyway, thanks for any info you can give me,
-Dave
Jul 18 '05 #12

Oops missed that sorry.

Carry on.

On Wed, 16 Feb 2005 07:29:58 -0800 (PST), Paul Rubin
<"http://phr.cx"@nospam.invalid> wrote:
Tom Willis <to********@gmail.com> writes:
Sounds like you want pickle or cpickle.


No, the issue is how to handle multiple clients trying to update the
pickle simultaneously.

--
Thomas G. Willis
http://paperbackmusic.net
Jul 18 '05 #13

What about bsddb? On most Unix systems it should be
already installed and on Windows it comes with the
ActiveState distribution of Python, so it should fulfill
your requirements.

Jul 18 '05 #14

"Michele Simionato" <mi***************@gmail.com> writes:
What about bsddb? On most Unix systems it should be already
installed and on Windows it comes with the ActiveState distribution
of Python, so it should fulfill your requirements.


As I understand it, bsddb doesn't expose the underlying Sleepycat APIs
for concurrent db updates, nor does it appear to make any attempt at
locking, based on looking at the Python lib doc for it. There's an
external module called pybsddb that includes this stuff. Maybe the
stdlib maintainers ought to consider including it, if it's considered
stable enough.
Jul 18 '05 #15

> Chris (or anyone else), could you comment on ZODB's performance? I've
Googled around a bit and haven't been able to find anything concrete, so
I'm really curious to know how ZODB does with a few hundred thousand
objects. Specifically, what level of complexity do your ZODB queries/searches have?
Any idea on how purely ad hoc searches perform? Obviously it will be
affected by the nature of the objects, but any insight into ZODB's
performance on large data sets would be helpful. What's the general ratio
of reads to writes in your application?


This is a somewhat weak point of ZODB. ZODB simply lets you store arbitrary
object graphs. There are no indices created to access these, and no query
language either. You can of course create indices yourself - and store them
as simply as all other objects. But you've got to hand-tailor these to the
objects you use, and create your querying code yourself - no 4GL like SQL
available.

Of course writing queries as simple predicates evaluated against your whole
object graph is straightforward - but unoptimized.

The retrieval of objects themselves is very fast - I didn't compare to a
rdbms, but as there is no networking involved it should be faster. And of
course no joins are needed.

So in the end, if you always have the same kind of queries that you only
parametrize, and you create appropriate indices and hand-written "execution
plans", things are nice.

But I want to stress another point that can cause trouble when using zodb
and that I didn't mention in replies to Paul so far, as he explicitly
didn't want to use an rdbms:
For RDBMSes, a well-defined textual representation of the entities stored
in the db is available. So while you have to put some effort into creating
an OR-mapping (if you want to deal with objects) that will most likely
evolve over time, migrating the underlying data usually is pretty
straightforward, and even tool support is available. Basically, you're only
dealing with CSV data that can be easily manipulated and stored back.

ZODB on the other side is way easier to code for - but the hard times begin
when you have a rolled-out application that has a bunch of objects inside
ZODB that have to be migrated to newer versions and possibly changed object
graph layouts. This made me create elaborate YAML/XML serializations to
allow for imports and exports and use with XSLT, and currently I'm
investigating a switch to postgres.

This point is important, and future developments of mine will take that into
consideration more than they did so far.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #16

People sometimes run to complicated systems when there is a solution
right in front of them. In this case, it is the filesystem itself.

It turns out mkdir is an atomic operation (at least on filesystems I've
encountered). And, from that simple thing, you can build something
reasonable as long as you do not need high performance and space isn't
an issue.

You need a 2 layer lock (make 2 directories) and you need to keep 2
data files around plus a 3rd temporary file.

The reader reads from the newest of the 2 data files.

The writer makes the locks, deletes the oldest data file and renames
its temporary file to be the new data file. You could
have the locks expire after 10 minutes, to take care of failure to
clean up. Ultimately, the writer is responsible for keeping the locks
alive. The writer knows a lock is his because it has his timestamp.
If the writer dies, no big deal, since it only affected a temporary
file and the locks will expire.

Renaming the temporary file takes advantage of the fact that a rename
is essentially immediate, and whatever does the reading only reads
from the newest of the 2 files (if both are available). Once the
rename of the temporary file done by the writer is complete, any future
reads will hit the newest data. And deleting the oldest file
doesn't matter, since the reader never looks at it.
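john's scheme, condensed to a single data file for brevity: os.replace provides the atomic rename, and the 10-minute lock expiry is from the post; the function and file names are illustrative.

```python
import os
import time

LOCK_TTL = 600  # expire locks after 10 minutes, per the post

def write_data(dirpath, payload):
    lock = os.path.join(dirpath, "lock")
    while True:
        try:
            os.mkdir(lock)  # mkdir is atomic: only one writer wins
            break
        except FileExistsError:
            if time.time() - os.path.getmtime(lock) > LOCK_TTL:
                os.rmdir(lock)  # stale lock left by a dead writer
            else:
                time.sleep(0.1)
    try:
        tmp = os.path.join(dirpath, "data.tmp")
        with open(tmp, "w") as f:
            f.write(payload)
        # Atomic rename: readers see either the old or the new data
        # file, never a half-written one.
        os.replace(tmp, os.path.join(dirpath, "data"))
    finally:
        os.rmdir(lock)

def read_data(dirpath):
    # Readers never take the lock; they just read the current file.
    with open(os.path.join(dirpath, "data")) as f:
        return f.read()
```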

If you want more specifics let me know.

john

Jul 18 '05 #17

The documentation hides this fact (I missed it), but Python 2.3+
actually ships with the pybsddb module, which has all the functionality
you allude to. Check the test directory for bsddb.

Michele Simionato

Jul 18 '05 #18

In article <7x***************@ruckus.brouhaha.com>,
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
I've started a few threads before on object persistence in medium to
high end server apps. This one is about low end apps, for example, a
simple cgi on a personal web site that might get a dozen hits a day.
The idea is you just want to keep a few pieces of data around that the
cgi can update.

Immediately, typical strategies like using a MySQL database become too
big a pain. Any kind of compiled and installed 3rd party module (e.g.
Metakit) is also too big a pain. But there still has to be some kind
of concurrency strategy, even if it's something like crude file
locking, or else two people running the cgi simultaneously can wipe
out the data store. But you don't want crashing the app to leave a
lock around if you can help it.

Anyway, something like dbm or shelve coupled with flock-style file
locking and a version of dbmopen that automatically retries after 1
second if the file is locked would do the job nicely, plus there could
be a cleanup mechanism for detecting stale locks.

Is there a standard approach to something like that, or should I just
code it the obvious way?

Thanks.


I have a couple of oblique, barely-helpful reactions; I
wish I knew better solutions.

First: I'm using Metakit and SQLite; they give me more
confidence and fewer surprises than dbm.

Second: Locking indeed is a problem, and I haven't
found a good global solution for it yet. I end up with
local fixes, that is, rather project-specific locking
schemes that exploit knowledge that, for example, there
are no symbolic links to worry about, or NFS mounts, or
....

Good luck.
Jul 18 '05 #19

"Michele Simionato" <mi***************@gmail.com> writes:
The documentation hides this fact (I missed it), but Python 2.3+
actually ships with the pybsddb module, which has all the
functionality you allude to. Check the test directory for bsddb.


Thanks, this is very interesting. It's important functionality that
should be documented, if it works reliably. Have you had any probs
with it?
Jul 18 '05 #20

"Michele Simionato" <mi***************@gmail.com> writes:
The documentation hides this fact (I missed it), but Python 2.3+
actually ships with the pybsddb module, which has all the
functionality you allude to. Check the test directory for bsddb.


Oh yow, it looks pretty complicated. Do you have any example code
around that uses the transaction stuff? If not I can try to figure it
out, but it looks like it would take significant effort.
Jul 18 '05 #21

KirbyBase sounds like something that could fit the bill.
Jul 18 '05 #22

Fred Pacquier <xn****@fredp.lautre.net> writes:
KirbyBase sounds like something that could fit the bill.


Hmm, this looks kind of nice. However, when used in embedded mode,
the overview blurb doesn't say anything about concurrency control.
I don't want to use it in client/server mode, for reasons already stated.
Jul 18 '05 #23

Paul Rubin wrote:
Fred Pacquier <xn****@fredp.lautre.net> writes:
KirbyBase sounds like something that could fit the bill.

Hmm, this looks kind of nice. However, when used in embedded mode,
the overview blurb doesn't say anything about concurrency control.
I don't want to use it in client/server mode, for reasons already stated.


The KirbyBase distribution comes with two small scripts that each
implement a server.

kbsimpleserver.py allows multi-user access to KirbyBase tables. It
takes care of concurrent update issues by being single-threaded and
blocking. Client requests are handled sequentially. It works fine for
small tables that don't have a lot of concurrent access.

kbthreadedserver.py also allows for multi-user access. It creates a
multi-threaded, non-blocking server. Each client gets its own thread.
The only time one thread will block the others is when it is going
to write to a table, and then it only blocks other write requests to
that same table. Reads are never blocked. This server script has
worked ok for me in limited testing.

Either of these server scripts would have to be running as a process
either on your web server or on another server on your network in order
for them to work. I don't know if that would be an issue for you.

HTH,

Jamey Cribbs
Jul 18 '05 #24

Jamey Cribbs <jc*****@twmi.rr.com> writes:
Either of these server scripts would have to be running as a process
either on your web server or on another server on your network in
order for them to work. I don't know if that would be an issue for you.


Yes, that's the whole point. I don't want to run a server process 24/7
just to look up two or three numbers a few times a day.
Jul 18 '05 #25

Paul Rubin wrote:
Jamey Cribbs <jc*****@twmi.rr.com> writes:
Either of these server scripts would have to be running as a process
either on your web server or on another server on your network in
order for them to work. I don't know if that would be an issue for you.

Yes, that's the whole point. I don't want to run a server process 24/7
just to look up two or three numbers a few times a day.


Ok, I see your point now. Well, this is off the top of my head, I
haven't tried it, but I think you could just use KirbyBase embedded in
your cgi script and it should work fine. I'm kind of thinking out loud
about this, but, let's see, if you had two users simultaneously
accessing your web site, that would be two instances of
your cgi script. If they are just looking up data, that would be two
reads that KirbyBase would be doing against the same physical file. It
just opens files in read mode for that, so that should work even
concurrently.

The only time there might be trouble is if two clients try to write to
the same table (physical file) at the same time. When it writes to a
file, KirbyBase opens it in append mode (r+, I think). My guess would
be, whichever client got there first would open the file. The second
client, arriving a split second later, would attempt to open the file in
append mode also and KirbyBase would return an exception. If your cgi
script caught and handled the exception, you would be fine.

Again this is off the top of my head after a long day, so I can't be
held responsible for my ramblings. :)

Jamey
Jul 18 '05 #26

Paul Rubin:

Oh yow, it looks pretty complicated. Do you have any example code
around that uses the transaction stuff? If not I can try to figure it
out, but it looks like it would take significant effort.


This was my impression too :-( The ZODB is much easier to use, so
in the end I just used that. Apparently the bsddb stuff is more
complicated than needed and the documentation sucks. However,
it does satisfy your requirement of being already installed, so I
mentioned it. I too am looking for an easy tutorial on how to do
concurrency/transactions with it.

Michele Simionato

Jul 18 '05 #27

Jamey Cribbs <jc*****@twmi.rr.com> writes:

The only time there might be trouble is if two clients try to write to
the same table (physical file) at the same time.

Yes, that's what I'm concerned about.

When it writes to a file, KirbyBase opens it in append mode (r+, I
think). My guess would be, whichever client got there first would
open the file. The second client, arriving a split second later,
would attempt to open the file in append mode also and KirbyBase
would return an exception. If your cgi script caught and handled
the exception, you would be fine.


I don't think the OS will stop both processes from opening in append
mode (which just means opening the file and seeking to the end) at
once unless you use O_EXCL or something. Then you're left with
handling the exception, which may get messy if you want to have all
the corner cases correct. It would be nice if there was a published
code snippet somewhere that did that. I may try writing one.
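Such a snippet might look like the following sketch, using O_CREAT|O_EXCL so that lock-file creation fails if another process already holds the lock (the helper name and retry policy are made up):

```python
import os
import time

def with_lockfile(path, func, retries=10, delay=1.0):
    # O_CREAT | O_EXCL makes the open fail if the file already exists,
    # so exactly one process can hold the lock at a time.
    lockpath = path + ".lock"
    for _ in range(retries):
        try:
            fd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(delay)  # someone else holds it; retry
            continue
        os.close(fd)
        try:
            return func()  # run the critical section under the lock
        finally:
            os.remove(lockpath)
    raise TimeoutError("could not acquire " + lockpath)
```

Unlike flock, a lock file like this is not cleaned up by the OS if the holder crashes - exactly the stale-lock corner case discussed in this thread; the retry limit at least bounds the wait.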

The best solution, I think, is Michele Simionato's, which is to use
the currently-undocumented bsddb transaction features. Those features
are the right technical approach to this problem. But they really
ought to get documented first.
Jul 18 '05 #28

"Michele Simionato" <mi***************@gmail.com> writes:
This was my impression too :-( The ZODB is much easier to use, so
in the end I just used that. Apparently the bsddb stuff is more
complicated than needed and the documentation sucks. However,
it does satisfy your requirement of being already installed, so I
mentioned it. I too am looking for an easy tutorial on how to do
concurrency/transactions with it.


The sleepycat docs seemed fine back when I looked at them years ago.
I'm just not sure what the Python wrapper is supposed to do in terms
of API.
Jul 18 '05 #29

On Tue, Feb 15, 2005 at 06:57:47PM -0800, Paul Rubin wrote:
I've started a few threads before on object persistence in medium to
high end server apps. This one is about low end apps, for example, a
simple cgi on a personal web site that might get a dozen hits a day.
The idea is you just want to keep a few pieces of data around that the
cgi can update.

Immediately, typical strategies like using a MySQL database become too
big a pain. Any kind of compiled and installed 3rd party module (e.g.
Metakit) is also too big a pain. But there still has to be some kind
of concurrency strategy, even if it's something like crude file
locking, or else two people running the cgi simultaneously can wipe
out the data store. But you don't want crashing the app to leave a
lock around if you can help it.

Anyway, something like dbm or shelve coupled with flock-style file
locking and a version of dbmopen that automatically retries after 1
second if the file is locked would do the job nicely, plus there could
be a cleanup mechanism for detecting stale locks.

Is there a standard approach to something like that, or should I just
code it the obvious way?


one easy way would be something along the lines of

from ConfigParser import ConfigParser
from fcntl import flock, LOCK_SH, LOCK_EX, LOCK_UN

class LockedParser(ConfigParser):

    def _read(self, fp, fpname):
        flock(fp, LOCK_SH)  # block until we can read
        try:
            # direct base-class call: in Python 2, ConfigParser is a
            # classic class, so super() would raise a TypeError here
            rv = ConfigParser._read(self, fp, fpname)
        finally:
            flock(fp, LOCK_UN)
        return rv

    def write(self, fp):
        flock(fp, LOCK_EX)  # block until we can write
        try:
            rv = ConfigParser.write(self, fp)
        finally:
            flock(fp, LOCK_UN)
        return rv

although you could do the same kind of stuff with csv, or even
Pickle. Of course this doesn't work if what you're wanting to
implement is a hit counter, but that is much easier: just grab a
LOCK_EX, read in, write out, LOCK_UN. If you care about (not)
overwriting changes, but fear you'll hold the lock for too long with
the simple 'grab the lock and run' approach, you could save a version
of the original file and compare before writing out. Complexity grows
a lot, and you suddenly would be better off using pybsddb or somesuch.

Of course I'm probably overlooking something, because it really can't
be this easy, can it?

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
BOFH excuse #44:

bank holiday - system operating credits not recharged


Jul 18 '05 #30

John Lenton <jo**@grulic.org.ar> writes:
flock(fp, LOCK_EX) # block until can write ...
Of course I'm probably overlooking something, because it really can't
be this easy, can it?


Yes, maybe so. I'm just way behind the times and didn't realize flock
would block until existing incompatible locks are released. That may
solve the whole timeout/retry problem. I should have checked the docs
more carefully earlier; I was thinking only in terms of opening with
O_EXCL. Thanks!
Jul 18 '05 #31

<snip simple example with flock>

What happens if the application crashes for any reason?
Will locked files stay locked or not? And if so, how do I
unlock them?

Michele Simionato

Jul 18 '05 #32

Paul Rubin <http> wrote:
The issue with using an rdbms is not with the small amount of code
needed to connect to it and query it, but in the overhead of
installing the huge piece of software (the rdbms) itself, and keeping
the rdbms server running all the time so the infrequently used app can
connect to it.
I've found SQLobject to be a really good way of poking objects in an
SQL database with zero hassle.

It can also use SQLite (which I haven't tried) which gets rid of your
large rdbms process but also gives you a migration path should the
problem expand.
ZODB is also a big piece of software to install. Is it at least
100% Python with no C modules required? Does it need a separate
server process? If it needs either C modules or a separate server,
it really can't be called a low-end strategy.


ZODB looks fun. I just wish (being lazy) that there was a separate
Debian package for just it and not the whole of Zope.

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #33
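As an aside on the SQLite option: the embedded library does its own file locking, and sqlite3.connect's timeout parameter gives the retry-while-locked behaviour the original post asked for. A sketch using the sqlite3 module (which entered the stdlib in Python 2.5, after this thread; the table and names below are made up):

```python
import sqlite3

def bump_counter(path, key):
    # timeout=10: if another process holds the database locked,
    # retry for up to 10 seconds instead of failing immediately.
    conn = sqlite3.connect(path, timeout=10)
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v INTEGER)")
            conn.execute(
                "INSERT INTO kv VALUES (?, 1) "
                "ON CONFLICT(k) DO UPDATE SET v = v + 1", (key,))
        return conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()[0]
    finally:
        conn.close()
```

No server process is needed; the database is just a file, which is the low-end property the thread is after.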

On Wed, Feb 16, 2005 at 10:43:42PM -0800, Michele Simionato wrote:
<snip simple example with flock>

What happens if the application crashes for any reason?
Will locked files stay locked or not? And if so, how do I
unlock them?


the operating system cleans up the lock.

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Linux ext2fs has been stable for a long time, now it's time to break it
-- Linuxkongre '95 in Berlin


Jul 18 '05 #34

John Lenton:
the operating system cleans up the lock.


So, are you effectively saying that a custom-made solution based on
flock can be quite reliable, and that it could be a reasonable choice
to use shelve+flock for small/hobbyist sites? I always thought locking
was a bad beast and feared implementing it myself, but maybe I was
wrong after all ...

Michele Simionato

Jul 18 '05 #35

Maybe you'll find this too naive, but why do you want to avoid
concurrent accesses to a database that will be accessed 12 times a day?

Regards,
Pierre
Jul 18 '05 #36

On Thu, Feb 17, 2005 at 09:08:25PM +0100, Pierre Quentel wrote:
Maybe you'll find this too naive, but why do you want to avoid
concurrent accesses to a database that will be accessed 12 times a day?


because every sunday at 3am your boss and his wife will both try to
use the script at the same time, and delete everything.

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
If our behavior is strict, we do not need fun!


Jul 18 '05 #37

On Thu, Feb 17, 2005 at 12:42:55AM -0800, Michele Simionato wrote:
John Lenton:
the operating system cleans up the lock.


So, are you effectively saying that a custom-made solution based on
flock can be quite reliable, and that it could be a reasonable choice
to use shelve+flock for small/hobbyist sites? I always thought locking
was a bad beast and feared implementing it myself, but maybe I was
wrong after all ...


locking works very well, when it works. If you're on Linux, the
manpage for flock has a NOTES section you should read. I don't know
how direct the mapping between Python's flock/lockf and the OS's
flock/lockf is; you might want to look into that as well (but you'd
only really care if you are in one of the corner cases mentioned in
the referred NOTES section).

In some weird corner cases you'd have to revert to some other locking
scheme, but the same pattern applies, however: subclass whatever it is
you want to use, wrapping the appropriate methods in try/finally
lock/unlocks; you just want to change the flock to some other thing.

Also, if you use something where the process doesn't terminate between
calls (such as mod_python, I guess), you have to be sure to write the
try/finallys around your locking code, because the OS only cleans up
the lock when the process exits.
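As a concrete illustration, the subclass-and-wrap pattern described above might be sketched like this in modern Python (the `LockedShelf` class name, the sidecar `.lock` file, and the demo paths are all made up for the example; this is a sketch under those assumptions, not a vetted library):

```python
import fcntl
import os
import shelve
import tempfile

class LockedShelf:
    """Sketch: hold an exclusive flock on a sidecar lock file for
    the lifetime of the shelf, releasing it in a finally block."""

    def __init__(self, filename):
        # Lock a *separate* file, so the dbm file is never opened
        # before the lock is actually held.
        self._lockfile = open(filename + '.lock', 'w')
        fcntl.flock(self._lockfile, fcntl.LOCK_EX)  # blocks until granted
        try:
            self._shelf = shelve.open(filename)
        except Exception:
            fcntl.flock(self._lockfile, fcntl.LOCK_UN)
            self._lockfile.close()
            raise

    def __enter__(self):
        return self._shelf

    def __exit__(self, *exc_info):
        # The try/finally John mentions: always unlock, even on error.
        try:
            self._shelf.close()
        finally:
            fcntl.flock(self._lockfile, fcntl.LOCK_UN)
            self._lockfile.close()

# demo: two sequential updates, each made under the lock
path = os.path.join(tempfile.mkdtemp(), 'counter')
for _ in range(2):
    with LockedShelf(path) as db:
        db['hits'] = db.get('hits', 0) + 1
```

Each CGI hit opens, updates, and closes the shelf; if the process dies while holding the lock, the OS drops the flock at process exit, as discussed above.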

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Keep emotionally active. Cater to your favorite neurosis.


Jul 18 '05 #38

John Lenton <jo**@grulic.org.ar> writes:
Maybe you'll find this too naive, but why do you want to avoid
concurrent accesses to a database that will be accessed 12 times a day ?


because every sunday at 3am your boss and his wife will both try to
use the script at the same time, and delete everything.


Yes, I think that could be pretty typical. For example, say I write a
cgi to maintain a signup list for a party I'm having. I email an
invitation out to some friends with a url to click if they want to
attend. If a dozen people click the url in the next day, several of
them will probably click in the first minute or so after the email
goes out. So two simultaneous clicks isn't implausible.

More generally, I don't like writing code with bugs even if the bugs
have fairly low chance of causing trouble. So I'm looking for the
easiest way to do this kind of thing without bugs.
Jul 18 '05 #39

> John Lenton:
Also, if you use something where the process doesn't terminate between
calls (such as mod_python, I guess), you have to be sure to write the
try/finallys around your locking code, because the OS only cleans up
the lock when the process exits.

This is what I feared. What happens in the case of a power failure?
Am I left with locked files floating around?

Michele Simionato

Jul 18 '05 #40

On Thu, Feb 17, 2005 at 09:02:37PM -0800, Michele Simionato wrote:
John Lenton:

Also, if you use something where the process doesn't terminate between
calls (such as mod_python, I guess), you have to be sure to write the
try/finallys around your locking code, because the OS only cleans up
the lock when the process exits.

This is what I feared. What happens in the case of a power failure?
Am I left with locked files floating around?


no.

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Los hijos de los buenos, capa son de duelo


Jul 18 '05 #41

Ok, I have yet another question: what is the difference
between fcntl.lockf and fcntl.flock? The man page of
my Linux system says that flock is implemented independently
of fcntl, however it does not say if I should use it in preference
over fcntl or not.

Michele Simionato

Jul 18 '05 #42

If a dozen people click the url in the next day, several of them will
probably click in the first minute or so after the email goes out. So
two simultaneous clicks isn't implausible.

More generally, I don't like writing code with bugs even if the bugs
have a fairly low chance of causing trouble. So I'm looking for the
easiest way to do this kind of thing without bugs.


Even if the 12 requests occur in the same 5 minutes, the time needed for
a read or write operation on a small base of any kind (flat file, dbm,
shelve, etc.) is so small that the probability of a collision is very
close to zero.

If you still want to avoid it, you'll have to pay some price. The
simplest and most portable is a client/server mode, as suggested for
KirbyBase for instance. Yes, you have to run the server 24 hours a day,
but you're already running the web server 24/7 anyway.
Jul 18 '05 #43

Pierre Quentel <qu************@wanadoo.fr> writes:
Even if the 12 requests occur in the same 5 minutes, the time needed
for a read or write operation on a small base of any kind (flat file,
dbm, shelve, etc) is so small that the probability of concurrence is
very close to zero

I prefer "equal to zero" over "close to zero". Also, there are times
when the server is heavily loaded, and any file operation can take a
long time.

If you still want to avoid it, you'll have to pay some price. The
simplest and most portable is a client/server mode, as suggested for
KirbyBase for instance. Yes, you have to run the server 24 hours a
day, but you're already running the web server 24/7 anyway


If I have to run a db server 24/7, that's not simple or portable.
There's lots of hosting environments where that's plain not permitted.
Using file locks is much simpler and more portable. The penalty is
that I can only handle one request at a time, but as mentioned, this
is for a low-usage app, so that serialization is ok.
Jul 18 '05 #44

Michele Simionato <mi***************@gmail.com> wrote:
Ok, I have yet another question: what is the difference
between fcntl.lockf and fcntl.flock? The man page of
my Linux system says that flock is implemented independently
of fcntl, however it does not say if I should use it in preference
over fcntl or not.


flock() and lockf() are two different library calls.

With lockf() you can lock parts of a file. I've always used flock().

From man lockf() "On Linux, this call [lockf] is just an interface for
fcntl(2). (In general, the relation between lockf and fcntl is
unspecified.)"

see man lockf and man flock

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #45

On Thu, Feb 17, 2005 at 10:20:58PM -0800, Michele Simionato wrote:
Ok, I have yet another question: what is the difference
between fcntl.lockf and fcntl.flock? The man page of
my Linux system says that flock is implemented independently
of fcntl, however it does not say if I should use it in preference
over fcntl or not.


it depends on what you want to do: as the manpages say, flock is
present on most Unices, but lockf is POSIX; flock is BSD, lockf is
SYSV (although it's in XPG4.2, so you have it on newer Unices of any
flavor); on Linux, lockf works over NFS (if the server supports it),
and gives you access to mandatory locking if you want it. You can't
mix lockf and flock (by this I mean: you can get a LOCK_EX via flock
and via lockf on the same file at the same time).

So: use whichever you feel more comfortable with, although if you are
pretty confident your program will run mostly on Linux there is a bias
towards lockf given its extra capabilities there.

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
QQuuiittaa eell LLooccaall EEcchhoo,, MMaannoolloo !!


Jul 18 '05 #46

Uhm ... I reading /usr/src/linux-2.4.27/Documentation/mandatory.txt
The last section says:

"""
6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes can
wreak havoc if they lock crucial files. The way around it is to change
the file permissions (remove the setgid bit) before trying to read or
write to it. Of course, that might be a bit tricky if the system is
hung :-(
"""

so lockf locks do not look completely harmless ...

Michele Simionato

Jul 18 '05 #47

On Fri, Feb 18, 2005 at 07:57:21AM -0800, Michele Simionato wrote:
Uhm ... I reading /usr/src/linux-2.4.27/Documentation/mandatory.txt
The last section says:

"""
6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes
can wreak havoc if they lock crucial files. The way around it is to
change the file permissions (remove the setgid bit) before trying to
read or write to it. Of course, that might be a bit tricky if the
system is hung :-(
"""

so lockf locks do not look completely harmless ...


if you read the whole file, you will have read that turning on
mandatory locks is not trivial. I never said it was harmless, and in
fact (as that section explains) it's a bad idea for most cases; there
are some (very few) situations where you need it, however, and so you
can get at that functionality. Having to mount your filesystem with
special options and twiddling the permission bits is a pretty strong
hint that the implementors didn't think it was a good idea for most
cases, too.

Hmm, just read that file, and it doesn't mention the "have to mount
with special options" bit. But if you look in mount(8), you see an
entry under the options list,

mand Allow mandatory locks on this filesystem. See fcntl(2)

and if you look in fcntl(2), you see that

Mandatory locking
    (Non-POSIX.) The above record locks may be either advisory
    or mandatory, and are advisory by default. To make use of
    mandatory locks, mandatory locking must be enabled (using
    the "-o mand" option to mount(8)) for the file system
    containing the file to be locked and enabled on the file
    itself (by disabling group execute permission on the file
    and enabling the set-GID permission bit).

    Advisory locks are not enforced and are useful only between
    cooperating processes. Mandatory locks are enforced for all
    processes.

if I have come across as recommending mandatory locks in this thread,
I apologize, as it was never my intention. It is a cool feature for
the .001% of the time when you need it (and the case in discussion in
this thread is not one of those), but other than that it's a very,
very bad idea. In the same league of badness as SysV IPC; I'd still
mention SysV IPC to someone who asked about IPC on Linux, however,
because there are places where it is useful even though most times
it's a stupid way to do things (yes, Oracle, *especially* you).

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
A pencil with no point needs no eraser.


Jul 18 '05 #48

You do not need to use a 24/7 process for low-end persistence, if you
rely on the fact that only one caller can ever succeed in making a
directory. I haven't seen a filesystem where this isn't the case. This
type of locking works across threads, processes, whatever.

An example of that type of locking can be found at:
http://aspn.activestate.com/ASPN/Coo.../Recipe/252495

The only problem with this locking is: if a process dies without
cleaning up the lock, how do you know when to remove it?
If you assume the write to the database is quick (ok for low end),
just have the locks time out after a minute. And if you need to keep
the lock longer, unpeel appropriately and reassert them.

With 2 lock directories, 2 files and 1 temporary file, you end up with
a hard to break system. The cost is disk space, which for low end
should be fine.
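A minimal sketch of the mkdir trick with a stale-lock timeout (the directory name, the 60-second staleness cutoff, and the function names are illustrative rather than taken from the recipe, and the stale-break step is itself racy, which the more elaborate two-directory scheme above addresses):

```python
import errno
import os
import tempfile
import time

LOCKDIR = os.path.join(tempfile.gettempdir(), 'mydata.lockdir')
STALE_AFTER = 60  # seconds before a lock holder is presumed dead

def acquire(timeout=10):
    """Spin until mkdir succeeds; break locks older than STALE_AFTER."""
    deadline = time.time() + timeout
    while True:
        try:
            os.mkdir(LOCKDIR)          # atomic: exactly one caller wins
            return
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        try:
            if time.time() - os.stat(LOCKDIR).st_mtime > STALE_AFTER:
                os.rmdir(LOCKDIR)      # presume the holder died
                continue
        except OSError:
            pass                        # lock vanished between checks
        if time.time() > deadline:
            raise RuntimeError('could not acquire lock')
        time.sleep(0.1)

def release():
    os.rmdir(LOCKDIR)
```

Unlike flock, nothing here is cleaned up automatically when a process dies, which is exactly why the timeout is needed.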

Basically, the interesting question is: how far can one actually go,
cross-platform, in building a persistence system without any
long-running process?

john

Jul 18 '05 #49

"Michele Simionato" <mi***************@gmail.com> writes:
Ok, I have yet another question: what is the difference between
fcntl.lockf and fcntl.flock? The man page of my Linux system says
that flock is implemented independently of fcntl, however it does
not say if I should use it in preference over fcntl or not.


It looks to me like flock is 4.2-BSD-style locking and fcntl.lockf is
Sys V style. I'm not sure exactly what the differences are, but
generally speaking, BSD did this type of thing better than Sys V. On
the other hand, it looks like lockf has more features, like the
ability to ask for a SIGIO notification if someone tries opening a
file that you have locked. In Python, flock is certainly easier to
use, since you pass an integer mode flag instead of building up the
weird fcntl structure.

There's one subtlety, which is that I'm not sure locking your file
before updating it is guaranteed to do the right thing. You may have
to use a second file as a lock. E.g., suppose the data file is small,
like a few dozen bytes. So maybe:
1) process A opens the file. The contents get read into a cache buffer.
2) Process B opens the file, locks it, updates it, releases the lock.
3) Process A locks and updates the file. Is the cached stuff guaranteed
to get invalidated even in some awful RFS environment? Or could
A clobber B's changes?
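One conservative way around the caching worry in point 3 is to never open the data file until after a separate lock file is held, and to close it again before unlocking; then no process carries a stale buffer across someone else's update. A sketch (the counter format, file names, and helper function are invented for the example):

```python
import fcntl
import os
import tempfile

def update_counter(path):
    """Lock first, open second, close before unlocking."""
    lock = open(path + '.lock', 'w')
    fcntl.flock(lock, fcntl.LOCK_EX)
    try:
        try:
            with open(path) as f:
                n = int(f.read() or '0')
        except FileNotFoundError:
            n = 0
        with open(path, 'w') as f:     # data file touched only while locked
            f.write(str(n + 1))
        return n + 1
    finally:
        fcntl.flock(lock, fcntl.LOCK_UN)
        lock.close()

path = os.path.join(tempfile.mkdtemp(), 'count')
update_counter(path)
update_counter(path)
```

Because the data file is opened fresh inside the critical section, step 1 of the scenario above never happens before the lock is granted.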
Jul 18 '05 #50
