Bytes | Developer Community
CGIs and file exclusion

Hi all,

While doing quite a big "set of programs" for a university subject I've
found myself in the middle of a problem someone has surely had before, so
I'm looking for some help.

At one point, I call a python cgi that pickle.load's a file, adds or deletes
a record, and dumps the result again to the file.
I'm worried that the cgi could be called simultaneously from two or more
different computers, thus most probably corrupting the files. I don't think
I can use a mutex as it's two different instances of the program and not
different threads, so I was thinking about locking the files between
programs, but I really don't know how to do that. It's not a problem if
there's no portable way of doing this, it's only going to be run on a linux
based computer.
Another solution I would accept is that the second called cgi detects that
another instance is running and displays a message saying to try again later.
Yes, not quite professional, but it'd do the job, after all this is just a
little detail I want to settle for a quite picky professor and not a "real
life" thing.
I think that's all the background you need. If someone can answer with
references on what I should look for, or even better with example code, that
would be simply great.
Many thanks in advance.
DH
Jul 18 '05 #1
13 Replies
I'd suggest using ZODB instead of directly pickling - then every cgi can
open the database with its own connection. ZODB will manage concurrency
issues.
--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
"darkhorse" <no************@terra.es> writes:
[original question snipped]
You want the fcntl module, and use either flock() or lockf().

import fcntl, cPickle
# open in update mode
fo = open('somefile.pickle', 'r+')
# acquire an exclusive lock; any other process trying to acquire this
# will block
fcntl.lockf(fo.fileno(), fcntl.LOCK_EX)
data = cPickle.load(fo)
# ... do stuff to data ...
# go back to the beginning
fo.seek(0)
# write out the new pickle
cPickle.dump(data, fo)
# throw the rest of the (old) pickle away
fo.truncate()
# and close. This releases the lock.
fo.close()
Another solution I would accept is that the second called cgi detects that
other instance is running and displays a message saying to try again later.
Yes, not quite professional, but it'd do the job, after all this is just a
little detail I want to settle for a quite picky professor and not a "real
life" thing.


If you want to do something like that, use
fcntl.LOCK_EX | fcntl.LOCK_NB as the flags to fcntl.lockf(). If the
lock can't be acquired, an IOError will be raised, and you can print
out your message.
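Putting David's two suggestions together, a minimal sketch of the non-blocking variant might look like this (the filename, the message text, and the CGI header are placeholders, not from the thread):

```python
import fcntl
import sys

# Open (or create) the pickle file in update mode.
fo = open('somefile.pickle', 'a+')
try:
    # Non-blocking exclusive lock: raises immediately if another
    # process already holds the lock, instead of waiting for it.
    fcntl.lockf(fo.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    # Another CGI instance is busy with the file: tell the user.
    print("Content-Type: text/plain\n")
    print("The data file is busy - please try again later.")
    fo.close()
    sys.exit(0)

# ... safe to load, modify and re-dump the pickle here ...

fo.close()  # closing the file releases the lock
```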

Note that lockf() does advisory locking; a process that just opens the
file and does something with it without calling lockf() *won't* be blocked.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
Jul 18 '05 #3
"Diez B. Roggisch" <de*********@web.de> wrote:

I'd suggest using ZODB instead of directly pickling - then every cgi can
open the database with its own connection. ZODB will manage concurrency
issues.


Holy moley, isn't that something like recommending a turbocharged,
12-cylinder, air-conditioned SUV as a cell phone charger?

Sure, Zope is the answer, but only if you phrase the question very, very
carefully.
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Jul 18 '05 #4
On Thu, 04 Nov 2004 21:19:19 -0800, Tim Roberts <ti**@probo.com> wrote:
"Diez B. Roggisch" <de*********@web.de> wrote:

I'd suggest using ZODB instead of directly pickling - then every cgi can
open the database with its own connection. ZODB will manage concurrency
issues.


Holy moley, isn't that something like recommending a turbocharged,
12-cylinder, air-conditioned SUV as a cell phone charger?

Sure, Zope is the answer, but only if you phrase the question very, very
carefully.


Note that he said ZODB, not Zope -- ZODB can be used on its own
without the rest of Zope -- see
http://www.zope.org/Wikis/ZODB/FrontPage
Jul 18 '05 #5

"David M. Cooke" <co**********@physics.mcmaster.ca> escribió en el mensaje
news:qn*************@arbutus.physics.mcmaster.ca.. .
[David wrotes about fcntl.lockf...]

Thank you all for your answers, seems a good solution for what I'm doing.
I'll try it this afternoon.

Thanks again
DH
Jul 18 '05 #6
"darkhorse" <no************@terra.es> wrote in message news:<2u*************@uni-berlin.de>...
[original question snipped]


A simple solution that doesn't scale well is to create a file when the
access starts. You can check if the file exists and pause until the
other process deletes it - with a timeout in case the file gets left
there due to an error.

Obviously not an industrial strength solution, but it does work...

import time
import os

def sleep(thelockfile, sleepcycle=0.01, MAXCOUNT=200):
    """Sleep until the lockfile has been removed or a certain number
    of cycles have gone.
    Defaults to a max 2 second delay.
    """
    counter = 0
    while os.path.exists(thelockfile):
        time.sleep(sleepcycle)
        counter += 1
        if counter > MAXCOUNT: break

def createlock(thelockfile):
    """Creates a lockfile from the path supplied."""
    open(thelockfile, 'w').close()

def releaselock(thelockfile):
    """Deletes the lockfile."""
    if os.path.isfile(thelockfile):
        os.remove(thelockfile)

The sleep function waits until the specified file disappears - or it
times out.
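One caveat with the exists-then-create approach: two processes can both see the file missing and both "acquire" the lock. A variant (my sketch, not part of Fuzzy's module) closes that window with os.O_EXCL, which makes the creation itself fail if the file already exists:

```python
import errno
import os

def trylock(thelockfile):
    """Try to create the lockfile atomically.
    Returns True if we got the lock, False if someone else holds it."""
    try:
        # O_CREAT | O_EXCL makes the existence check and the creation a
        # single atomic step, unlike a separate exists()/open() pair.
        fd = os.open(thelockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise            # some other error: not a lock conflict
        return False         # file exists: another process holds the lock
    os.write(fd, str(os.getpid()).encode())  # record who holds the lock
    os.close(fd)
    return True
```

The releaselock function above can be used unchanged to drop a lock acquired this way.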

Regards,

Fuzzy

http://www.voidspace.org.uk/atlantib...thonutils.html
Jul 18 '05 #7
"Diez B. Roggisch" <de*********@web.de> wrote in message news:<cm*************@news.t-online.com>...
I'd suggest using ZODB instead of directly pickling - then every cgi can
open the database with its own connection. ZODB will manage concurrency
issues.


Ok, but then I guess you need an external long-lived process keeping the
DB open for you. If the cgi script opens and closes the database by itself,
the only thing the ZODB buys you is raising

IOError: [Errno 11] Resource temporarily unavailable

(not unexpectedly). Here is what I tried:

$ cat concurrency.py
import ZODB
from ZODB.FileStorage import FileStorage

def openzodb(dbname):
    db = ZODB.DB(FileStorage(dbname + ".fs"))
    conn = db.open()
    return db, conn, conn.root()

if __name__ == "__main__":
    print "Opening the db ..."
    db, conn, root = openzodb("x")
    print "Storing something ..."
    root["somekey"] = "somedata"
    get_transaction().commit()
    print "Closing the db ..."
    conn.close(); db.close()

$ cat Makefile
default:
	python concurrency.py&
	python concurrency.py

$ make
python concurrency.py&
python concurrency.py
Opening the db ...
Opening the db ...
Traceback (most recent call last):
  File "concurrency.py", line 12, in ?
    db, conn, root = openzodb("x")
  File "concurrency.py", line 6, in openzodb
    db = ZODB.DB(FileStorage(dbname + ".fs"))
  File "/opt/zope/lib/python/ZODB/FileStorage.py", line 232, in __init__
Storing something ...
    self._lock_file = LockFile(file_name + '.lock')
  File "/opt/zope/lib/python/ZODB/lock_file.py", line 62, in __init__
    lock_file(self._fp)
  File "/opt/zope/lib/python/ZODB/lock_file.py", line 42, in lock_file
    fcntl.flock(file.fileno(), _flags)
IOError: [Errno 11] Resource temporarily unavailable
Closing the db ...

BTW, it is clear from the traceback that internally ZODB uses fcntl,
and probably a custom solution based on it would be simpler than
installing the ZODB (unless the OP has some reason to want it).
But I guess you had in mind something different - care to explain?

Michele Simionato
Jul 18 '05 #8
> Ok, but then I guess you need an external long-lived process keeping the
> DB open for you. If the cgi script opens and closes the database by itself,
> the only thing the ZODB buys you is raising

Oops, yes. I'm not sure if anything other than FileStorage will help with
that, but of course my application is a "real" server.

> BTW, it is clear from the traceback that internally ZODB uses fcntl,
> and probably a custom solution based on it would be simpler than
> installing the ZODB (unless the OP has some reason to want it).
> But I guess you had in mind something different - care to explain?


I've run into the same tracebacks when my application died and got stale, so
starting it up again provoked that locking error.

But this is exactly the reason why I personally would refrain from using a
fcntl approach. What happens if your cgi process freezes? As I've never used
mod_python or similar "primitive" http frameworks, I don't know if
malfunctioning scripts are reliably killed after some time - and even if
they _are_, that would most probably be after at least 30 seconds, if not
more.

So to paraphrase my suggestion: instead of trying to reinvent the wheel of
serialized access to a critical resource from different processes, use some
database abstraction layer. As the OP wanted to use pickle, I guess he's
not interested in OR-mapping his stuff to an RDBMS, but wants to deal with
"pure" python objects - that's why I recommended ZODB in the first place.

So maybe a small application server is in order - you could for example use
corba, xmlrpc or pyro to connect to that service from the cgi, keeping the
cgi layer extremely simple.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #9
fu******@gmail.com (Michael Foord) writes:
"darkhorse" <no************@terra.es> wrote in message news:<2u*************@uni-berlin.de>...
[original question snipped]
A simple solution that doesn't scale well is to create a file when the
access starts. You can check if the file exists and pause until the
other process deletes it - with a timeout in case the file gets left
there due to an error.

Obviously not an industrial strength solution, but it does work...


To strengthen the solution, write the process id of the script
(available via os.getpid()) to the file. If the file doesn't vanish before
your timeout, you can check to see if the process is still around, and
kill it.

<mike
import time
import os

def sleep(thelockfile, sleepcycle=0.01, MAXCOUNT=200):
    """Sleep until the lockfile has been removed or a certain number
    of cycles have gone.
    Defaults to a max 2 second delay.
    """
    counter = 0
    while os.path.exists(thelockfile):
        time.sleep(sleepcycle)
        counter += 1
        if counter > MAXCOUNT:
            f = file(thelockfile)
            pid = int(f.read())
            f.close()
            p = os.popen("/bin/ps -p %s" % pid)
            l = p.read().split('\n')
            p.close()
            if len(l) > 2:
                os.kill(pid, 9)

def createlock(thelockfile):
    """Creates a lockfile from the path supplied."""
    f = file(thelockfile, 'w')
    f.write(str(os.getpid()))
    f.close()

def releaselock(thelockfile):
    """Deletes the lockfile."""
    if os.path.isfile(thelockfile):
        os.remove(thelockfile)

The sleep function waits until the specified file disappears - or it
times out.

Regards,

Fuzzy

http://www.voidspace.org.uk/atlantib...thonutils.html


--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jul 18 '05 #10
fu******@gmail.com (Michael Foord) wrote in message news:<6f*************************@posting.google.c om>...
A simple solution that doesn't scale well is to create a file when the
access starts. You can check if the file exists and pause until the
other process deletes it - with a timeout in case the file gets left
there due to an error.

Obviously not an industrial strength solution, but it does work...

[lockfile code snipped]


I tried essentially the same solution in my experiments, but I was unhappy
with it: it seems to work 99% of the time, but occasionally you get strange
things (for instance, once I got "File not found" when trying to remove
the lockfile - evidently it was already removed by another process; other
times I got different strange errors). The issue is that it is very
difficult to reproduce the problems, hence to fix them. Maybe
Diez B. Roggisch is right and a real database server is the simplest
solution. However my first attempt with ZEO didn't work either:
$ cat zeoclient.py
import ZODB, ZEO
from ZEO.ClientStorage import ClientStorage

def openzeo(host, port):
    db = ZODB.DB(ClientStorage((host, port)))
    conn = db.open()
    return db, conn, conn.root()

def store():
    # I have a ZEO instance running on port 9999
    print "Opening the db ..."
    db, conn, root = openzeo("localhost", 9999)
    print "Storing something ..."
    root["somekey"] = "somedata"
    get_transaction().commit()
    print "Closing the db ..."
    conn.close(); db.close()

if __name__ == "__main__":
    store()

$ cat Makefile
default:
	python zeoclient.py&
	python zeoclient.py

$ make
python zeoclient.py&
python zeoclient.py
Opening the db ...
Opening the db ...

Storing something ...
Storing something ...
Closing the db ...
Traceback (most recent call last):
  File "zeoclient.py", line 20, in ?
    store()
  File "zeoclient.py", line 15, in store
    get_transaction().commit()
  File "/usr/share/partecs/zope/lib/python/ZODB/Transaction.py", line 247, in commit
    vote(self)
  File "/usr/share/partecs/zope/lib/python/ZODB/Connection.py", line 699, in tpc_vote
    s = vote(transaction)
  File "/opt/zope/lib/python/ZEO/ClientStorage.py", line 841, in tpc_vote
    return self._check_serials()
  File "/opt/zope/lib/python/ZEO/ClientStorage.py", line 825, in _check_serials
    raise s
ZODB.POSException.ConflictError: database conflict error (oid
0000000000000000, serial was 035900d31b7fedaa, now 035900d2f6cd8799)

(it works with a single process instead).

Maybe I misunderstood how ZEO is intended to be used; as usual it is
difficult to find the relevant documentation :-( Maybe I should ask
on another list ...
Michele Simionato
Jul 18 '05 #11
[Michele Simionato]
...
Maybe Diez B. Roggisch is right and a real database server is the simplest
solution. However my first attempt with ZEO didn't work either:
That's OK, nobody's first attempt with any database server works <0.6 wink>.
$ cat zeoclient.py
import ZODB, ZEO
from ZEO.ClientStorage import ClientStorage

def openzeo(host, port):
    db = ZODB.DB(ClientStorage((host, port)))
    conn = db.open()
    return db, conn, conn.root()

def store():
    # I have a ZEO instance running on port 9999
    print "Opening the db ..."
    db, conn, root = openzeo("localhost", 9999)
    print "Storing something ..."
    root["somekey"] = "somedata"
It's important to note that store() always changes at least the root object.
    get_transaction().commit()
    print "Closing the db ..."
    conn.close(); db.close()

if __name__ == "__main__":
    store()

$ cat Makefile
default:
	python zeoclient.py&
	python zeoclient.py

$ make
python zeoclient.py&
python zeoclient.py
Opening the db ...
Opening the db ...

Storing something ...
Storing something ...
Closing the db ...
[traceback snipped]
ZODB.POSException.ConflictError: database conflict error (oid
0000000000000000, serial was 035900d31b7fedaa, now 035900d2f6cd8799)

(it works with a single process instead).
Yes, that's predictable too <wink>.
Maybe I misunderstood how ZEO is intended to be used; as usual it is
difficult to find the relevant documentation :-( Maybe I should ask
on another list ...


zo******@zope.org is the best place for ZODB/ZEO questions independent
of Zope use. Note that you must subscribe to a zope.org list in order
to post to it (that's a Draconian but very effective anti-spam
policy).

In the case above, ZEO isn't actually relevant. You'd see the same
thing if you had a single process with two threads, each using a
"direct" ZODB connection to the same database.

ZODB doesn't do object-level locking. It relies on "optimistic
concurrency control" (a googlable phrase) instead, which is especially
appropriate for high-read low-write applications like most Zope
deployments.

In effect, that means it won't stop you from trying to do something
insane, but does stop you from *completing* it. What you got above is
a "write conflict error", and is normal behavior. What happens:

- Process A loads revision n of some particular object O.
- Process B loads the same revision n of O.
- Process A modifies O, creating revision n+1.
- Process A commits its change to O. Revision n+1 is then current.
- Process B modifies O, creating revision n+2.
- Process B *tries* to commit its change to O.

The implementation of commit() investigates, and effectively says
"Hmm. Process B started with revision n of O, but revision n+1 is
currently committed. That means B didn't *start* with the currently
committed revision of O, so B has no idea what might have happened in
revision n+1 -- B may be trying to commit an insane change as a
result. Can't let that happen, so I'll raise ConflictError". That
line of argument makes a lot more sense if more than one object is
involved, but maybe it's enough to hint at the possible problems.

Anyway, since your store() method always picks on the root object,
you're going to get ConflictErrors frequently. It's bad application
design for a ZODB/ZEO app to have a "hot spot" like that.

In real life, all ZEO apps, and all multithreaded ZODB apps, always do
their work inside try/except structures. When a conflict error
occurs, the except clause catches it, and generally tries the
transaction again. In your code above, that isn't going to work well,
because there's a single object that's modified by every transaction
-- it will be rare for a commit() attempt not to give up with a
conflict error.

Perhaps paradoxically, it can be easier to get a real ZEO app working
well than one's first overly simple attempts -- ZODB effectively
*wants* you to scribble all over the database.
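The catch-and-retry pattern Tim describes can be sketched generically. In this sketch ConflictError is a stand-in class (so the example runs without a ZODB install), and in real ZODB code body() would also abort the failed transaction before the retry:

```python
import time

class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError in this sketch."""

def run_transaction(body, retries=3, delay=0.01):
    """Run body() (which modifies objects and commits); on a write
    conflict, pause briefly and retry, giving up after `retries` tries."""
    for attempt in range(retries):
        try:
            body()
            return
        except ConflictError:
            if attempt == retries - 1:
                raise                        # out of retries: let it propagate
            time.sleep(delay * (attempt + 1))  # brief, growing pause
```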
Jul 18 '05 #12
Mike Meyer <mw*@mired.org> wrote in message news:<x7************@guru.mired.org>...
fu******@gmail.com (Michael Foord) writes:

[snip..]

A simple solution that doesn't scale well is to create a file when the
access starts. You can check if the file exists and pause until the
other process deletes it - with a timeout in case the file gets left
there due to an error.

Obviously not an industrial strength solution, but it does work...


To strengthen the solution, write the process id of the script
(available via os.getpid()) to the file. If the file doesn't vanish before
your timeout, you can check to see if the process is still around, and
kill it.

<mike


Thanks - a good suggestion.

Regards,

Fuzzy
http://www.voidspace.org.uk/atlantib...thonutils.html
Jul 18 '05 #13
Tim Peters <ti********@gmail.com> wrote in message news:<ma**************************************@pyt hon.org>...
[Tim Peters' explanation of ZODB write conflicts snipped]


Ok, I understand what you are saying, but I do not understand how I would
solve the problem. This is interesting to me since it has to do with a real
application I am working on. Maybe I should give the framework.

We have an application where the users can interact with the system via
a Web interface (developed in Zope/Plone by other people) and via email. I am
doing the email part. We want the email part to be independent from the
Zope part, since it must act also as a safety belt (i.e. even if the Zope
server is down for any reason the email part must continue to work).

Moreover, people with slow connections may prefer the email interface
over the Zope/Plone interface, which is pretty heavyweight. So, it must be
there. We expect to have few emails coming in (<100 per hour), so I just
modified /etc/aliases and each mail is piped to a simple Python script which
parses it and stores the relevant information (who sent the email, the date,
the content, etc.).

Input coming via email or via the web interface should go into
the same database. Since we are using Zope anyway and there is not
much writing to do, we thought to use the ZODB, and actually ZEO to
keep it independent from the main Zope instance. We could use another
database if needed, but we would prefer to avoid additional dependencies
and installation issues.

The problem is that occasionally two emails (or an email and a web submission)
can arrive at the same time. At the moment I just catch the error and send
back an email saying "Sorry, there was an internal error. Please retry later".
This is rare but it happened during the testing phase. I would rather avoid
that. I thought about catching the exception and waiting a bit before retrying,
but I am not completely happy with that; I also tried a hand-coded solution
involving a lock file but it was not 100% reliable. So I ask here if the ZODB
has some smart way to solve that, or if it is possible to change the design in
such a way as to avoid those concurrency issues as much as possible.

Another concern of mine is security. What happens if a malicious user
sends 10000 emails at the same time? Does the mail server (can be postfix or
exim4) spawn tons of processes until we run out of memory and the server
crashes? How would I avoid that? I can think of various hackish solutions but I
would like something reliable.

Any hints? Suggestions?

Thanks,
Michele Simionato
Jul 18 '05 #14
