avoiding file corruption

Hi,

Trying to open a file for writing that is already open for writing
should result in an exception.

It's all too easy to accidentally open a shelve for writing twice and
this can lead to hard to track down database corruption errors.

Amir

Aug 27 '06 #1
15 Replies
On 27 Aug 2006 00:44:33 -0700, Amir Michail <am******@gmail.com> wrote:
Hi,

Trying to open a file for writing that is already open for writing
should result in an exception.

It's all too easy to accidentally open a shelve for writing twice and
this can lead to hard to track down database corruption errors.

Amir

Even if it may seem strange, the OS usually allows you to open a file
twice; it's up to the programmer to ensure the consistency of the
operations.

PAolo

--
if you have a minute to spend please visit my photography site:
http://mypic.co.nr
Aug 27 '06 #2

Paolo Pantaleo wrote:
On 27 Aug 2006 00:44:33 -0700, Amir Michail <am******@gmail.com> wrote:
Hi,

Trying to open a file for writing that is already open for writing
should result in an exception.

It's all too easy to accidentally open a shelve for writing twice and
this can lead to hard to track down database corruption errors.

Amir

Even if it may seem strange, the OS usually allows you to open a file
twice; it's up to the programmer to ensure the consistency of the
operations.

PAolo
But if this is usually a serious bug, shouldn't an exception be raised?

Amir
--
if you have a minute to spend please visit my photography site:
http://mypic.co.nr
Aug 27 '06 #3
Amir Michail wrote:
Paolo Pantaleo wrote:
On 27 Aug 2006 00:44:33 -0700, Amir Michail <am******@gmail.com> wrote:
Hi,

Trying to open a file for writing that is already open for writing
should result in an exception.

It's all too easy to accidentally open a shelve for writing twice and
this can lead to hard to track down database corruption errors.

Amir

Even if it may seem strange, the OS usually allows you to open a file
twice; it's up to the programmer to ensure the consistency of the
operations.

PAolo

But if this is usually a serious bug, shouldn't an exception be raised?
Executing "rm -rf /" via subprocess is usually also a bad idea. So? No
language can prevent you from making such a mistake. And there is no way to
know if a file is opened twice - it might be that you open the same file
twice via e.g. a network share. No way to know that it is the same file.

Diez
Aug 27 '06 #4

Amir Michail wrote:
Hi,

Trying to open a file for writing that is already open for writing
should result in an exception.

It's all too easy to accidentally open a shelve for writing twice and
this can lead to hard to track down database corruption errors.

Amir
I've never done this in anger so feel free to mock (a little :-).

I'd have a fixed field at the beginning of the file that can hold the
hostname, process number, and access time of a writing process, together
with a sentinel value that means "no process has access to the file".

A program would:
1. Wait a random time.
2. Open the file for update.
3. Read the locking data.
4. If it is already being used by another process, go to 1.
5. Write the process's locking data and time into the lock field.
6. Modify the file's other fields.
7. Write the sentinel value to the locking field.
8. Close and flush the file to disk.

A rough sketch of the idea is below. I have left what to do if a process
has locked the file for too long as a simple exercise for you ;-).
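
A minimal, Unix-flavoured sketch of the scheme (the names try_lock and
unlock are made up, it assumes the file already exists with its lock
field initialised, and note that the read-then-write here is not atomic):

import os, random, struct, time

# Fixed-size lock field: 64-byte hostname, process number, access time.
LOCK_FMT = '64sid'
LOCK_SIZE = struct.calcsize(LOCK_FMT)

def try_lock(fname):
    f = open(fname, 'r+b')
    host, pid, when = struct.unpack(LOCK_FMT, f.read(LOCK_SIZE))
    if host.rstrip('\0'):       # not the sentinel: someone has the file
        f.close()
        return None
    f.seek(0)
    f.write(struct.pack(LOCK_FMT, os.uname()[1], os.getpid(), time.time()))
    f.flush()
    return f

def unlock(f):
    f.seek(0)
    f.write(struct.pack(LOCK_FMT, '', 0, 0.0))  # restore the sentinel
    f.flush()
    f.close()

f = None
while f is None:                # steps 1-4
    time.sleep(random.random())
    f = try_lock('test.db')
# step 6: modify the file's other fields here
unlock(f)                       # steps 7-8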

- Paddy.

Aug 27 '06 #5
Diez B. Roggisch wrote:
Amir Michail wrote:
Paolo Pantaleo wrote:
On 27 Aug 2006 00:44:33 -0700, Amir Michail <am******@gmail.com> wrote:
Hi,

Trying to open a file for writing that is already open for writing
should result in an exception.

It's all too easy to accidentally open a shelve for writing twice and
this can lead to hard to track down database corruption errors.

Amir

Even if it may seem strange, the OS usually allows you to open a file
twice; it's up to the programmer to ensure the consistency of the
operations.

PAolo
But if this is usually a serious bug, shouldn't an exception be raised?

Executing "rm -rf /" via subprocess is usually also a bad idea. So? No
language can prevent you from making such a mistake. And there is no way to
know if a file is opened twice - it might be that you open the same file
twice via e.g. a network share. No way to know that it is the same file.

Diez
The scenario I have in mind is something like this:

def f():
    db = shelve.open('test.db', 'c')
    # do some stuff with db
    g()
    db.close()

def g():
    db = shelve.open('test.db', 'c')
    # do some stuff with db
    db.close()

I think it would be easy for Python to check for this problem in
scenarios like this.

Amir

Aug 27 '06 #6
Amir Michail wrote:
Diez B. Roggisch wrote:
Amir Michail wrote:
Paolo Pantaleo wrote:
On 27 Aug 2006 00:44:33 -0700, Amir Michail <am******@gmail.com> wrote:
Hi,

Trying to open a file for writing that is already open for writing
should result in an exception.

It's all too easy to accidentally open a shelve for writing twice and
this can lead to hard to track down database corruption errors.

Amir

Even if it may seem strange, the OS usually allows you to open a file
twice; it's up to the programmer to ensure the consistency of the
operations.

PAolo

But if this is usually a serious bug, shouldn't an exception be raised?

Executing "rm -rf /" via subprocess is usually also a bad idea. So? No
language can prevent you from making such a mistake. And there is no way to
know if a file is opened twice - it might be that you open the same file
twice via e.g. a network share. No way to know that it is the same file.

Diez

The scenario I have in mind is something like this:

def f():
    db = shelve.open('test.db', 'c')
    # do some stuff with db
    g()
    db.close()

def g():
    db = shelve.open('test.db', 'c')
    # do some stuff with db
    db.close()

I think it would be easy for Python to check for this problem in
scenarios like this.
You are requesting a general solution for a very particular problem. As
I pointed out, that solution is unlikely to work reliably, if not
outright infeasible.

If you really have problems like the above, use a custom wrapper around
shelve that prevents _you_ from making that mistake.
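
For instance, a minimal sketch of such a wrapper (safe_open is a made-up
name; it only catches double-opens within a single process, which is
exactly the scenario above):

import os, shelve

_open_shelves = set()   # absolute paths this process currently has open

def safe_open(filename, flag='c'):
    path = os.path.abspath(filename)
    if path in _open_shelves:
        raise IOError("shelve %r is already open in this process" % path)
    db = shelve.open(filename, flag)
    _open_shelves.add(path)
    orig_close = db.close
    def close():
        _open_shelves.discard(path)   # forget the path on close
        orig_close()
    db.close = close
    return db

With this, the nested open in g() raises immediately instead of silently
corrupting the database.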

Diez
Aug 27 '06 #7
Amir Michail wrote:
Trying to open a file for writing that is already open for writing
should result in an exception.

It's all too easy to accidentally open a shelve for writing twice and
this can lead to hard to track down database corruption errors.
The right solution is file locking. Unfortunately, the Python
standard distribution doesn't have a portable file lock, but you
can do it on Unix and Win NT or better. See:

http://mail.python.org/pipermail/pyt...ry/002957.html

and/or

http://aspn.activestate.com/ASPN/Coo...n/Recipe/65203.
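
One way to paper over the platform difference (a sketch using fcntl.flock
on Unix and msvcrt.locking on Windows; not the code from the links above):

import os

if os.name == 'nt':
    import msvcrt
    def lock_file(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)   # lock first byte
    def unlock_file(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl
    def lock_file(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    def unlock_file(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)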
--
--Bryan
Aug 27 '06 #8
On 2006-08-27, Amir Michail <am******@gmail.com> wrote:
Trying to open a file for writing that is already open for writing
should result in an exception.
MS Windows seems to do something similar, and it pisses me off
no end. Trying to open a file and read it while somebody else
has it open for writing causes an exception. If I want to open
a file and read it while it's being written to, that's my
business.

Likewise, if I want to have a file open for writing twice,
that's my business as well. I certainly don't want to be
hobbled to prevent me from wandering off in the wrong direction.
It's all too easy to accidentally open a shelve for writing
twice and this can lead to hard to track down database
corruption errors.
It's all too easy to delete the wrong element from a list. It's
all too easy to re-bind the wrong object to a name. Should
lists be immutable and names be permanently bound?

--
Grant Edwards grante Yow! I'm in a twist
at contest!! I'm in a
visi.com bathtub! It's on Mars!! I'm
in tip-top condition!
Aug 27 '06 #9
Grant Edwards wrote:
On 2006-08-27, Amir Michail <am******@gmail.com> wrote:
Trying to open a file for writing that is already open for writing
should result in an exception.

MS Windows seems to do something similar, and it pisses me off
no end. Trying to open a file and read it while somebody else
has it open for writing causes an exception. If I want to open
a file and read it while it's being written to, that's my
business.

Likewise, if I want to have a file open for writing twice,
that's my business as well. I certainly don't want to be
hobbled to prevent me from wandering off in the wrong direction.
It's all too easy to accidentally open a shelve for writing
twice and this can lead to hard to track down database
corruption errors.

It's all too easy to delete the wrong element from a list. It's
all too easy to re-bind the wrong object to a name. Should
lists be immutable and names be permanently bound?
How often do you need to open a file multiple times for writing?

As a high-level language, Python should prevent people from corrupting
data as much as possible.

Amir
--
Grant Edwards grante Yow! I'm in a twist
at contest!! I'm in a
visi.com bathtub! It's on Mars!! I'm
in tip-top condition!
Aug 27 '06 #10
Amir Michail wrote:
Hi,

Trying to open a file for writing that is already open for writing
should result in an exception.
Look at the fcntl module; I use it in a class to control access from within
my processes. I don't think this functionality should be inherent to Python,
though. Keep in mind that only my processes open the shelve db, so your
mileage may vary. The get and set methods are just for convenience.
This works under Linux; I don't know about Windows.

#!/usr/bin/env python

import fcntl, shelve, time, bsddb
from os.path import exists

class fLocked:

    def __init__(self, fname):
        if exists(fname):
            # verify it is not corrupt
            bsddb.db.DB().verify(fname)
        self.fname = fname
        self.have_lock = False
        self.db = shelve.open(self.fname)
        self.fileno = self.db.dict.db.fd()

    def __del__(self):
        try:
            self.db.close()
        except:
            pass

    def acquire_lock(self, timeout=5):
        if self.have_lock:
            return True
        started = time.time()
        while not self.have_lock and (time.time() - started < timeout):
            try:
                fcntl.flock(self.fileno, fcntl.LOCK_EX | fcntl.LOCK_NB)
                self.have_lock = True
            except IOError:
                # wait for it to become available
                time.sleep(.5)
        return self.have_lock

    def release_lock(self):
        if self.have_lock:
            fcntl.flock(self.fileno, fcntl.LOCK_UN)
            self.have_lock = False
        return not self.have_lock

    def get(self, key, default={}):
        if self.acquire_lock():
            record = self.db.get(key, default)
            self.release_lock()
        else:
            raise IOError("Unable to lock %s" % self.fname)
        return record

    def set(self, key, value):
        if self.acquire_lock():
            self.db[key] = value
            self.release_lock()
        else:
            raise IOError("Unable to lock %s" % self.fname)

if __name__ == '__main__':
    fname = 'test.db'
    dbs = []
    for i in range(2):
        dbs.append(fLocked(fname))
    print dbs[0].acquire_lock()
    print dbs[1].acquire_lock(1)   # should fail getting flock
    dbs[0].release_lock()
    print dbs[1].acquire_lock()    # should be able to get lock
--Tim

Aug 27 '06 #11
Paddy wrote:
I've never done this in anger so feel free to mock (a little :-).

I'd have a fixed field at the beginning of the file that can hold the
hostname, process number, and access time of a writing process, together
with a sentinel value that means "no process has access to the file".

A program would:
1. Wait a random time.
2. Open the file for update.
3. Read the locking data.
4. If it is already being used by another process, go to 1.
5. Write the process's locking data and time into the lock field.
6. Modify the file's other fields.
7. Write the sentinel value to the locking field.
8. Close and flush the file to disk.
That doesn't really work; you still have a race condition between
reading the lock field and writing your own locking data into it.

Locking the file is the right solution, but operating systems
vary in how it works. Other reasonable solutions are to rename
the file, work with the renamed version, then change it back
after closing; and to use "lock files", which Wikipedia explains
near the bottom of the "File locking" article. A sketch of the
lock-file approach follows.
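
A minimal sketch of the lock-file approach (made-up names; it relies on
os.O_CREAT | os.O_EXCL failing atomically when the lock file exists):

import errno, os

def acquire_lockfile(path):
    try:
        fd = os.open(path + '.lock', os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError, e:
        if e.errno == errno.EEXIST:
            return False            # somebody else holds the lock
        raise
    os.write(fd, str(os.getpid()))  # record the owner, for debugging
    os.close(fd)
    return True

def release_lockfile(path):
    os.remove(path + '.lock')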
--
--Bryan
Aug 27 '06 #12
Grant Edwards wrote:
Amir Michail wrote:
>Trying to open a file for writing that is already open for writing
should result in an exception.

MS Windows seems to do something similar, and it pisses me off
no end. Trying to open a file and read it while somebody else
has it open for writing causes an exception. If I want to open
a file and read it while it's being written to, that's my
business.
Windows is actually much more sophisticated. It does allow shared
write access; see the FILE_SHARE_WRITE option for Win32's CreateFile.
You can also lock specific byte ranges in a file.
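
A sketch of what that looks like from Python, assuming the pywin32
extensions are installed:

import win32file, win32con

# Open for writing while still letting other processes read and write.
handle = win32file.CreateFile(
    'test.db',
    win32con.GENERIC_WRITE,
    win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,   # sharing mode
    None,                       # default security attributes
    win32con.OPEN_ALWAYS,       # open the file, creating it if necessary
    0, None)
# win32file.LockFileEx can then lock specific byte ranges on the handle.
win32file.CloseHandle(handle)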
--
--Bryan
Aug 27 '06 #13
On 2006-08-27, Amir Michail <am******@gmail.com> wrote:
How often do you need to open a file multiple times for writing?
Not very often, but I don't think it should be illegal. That's
probably a result of being a 25-year user of Unix, where it's
assumed that the user knows what he's doing.
As a high-level language, Python should prevent people from
corrupting data as much as possible.
For somebody with a Unix background it seems overly restrictive.

--
Grant Edwards grante Yow! Youth of today! Join
at me in a mass rally
visi.com for traditional mental
attitudes!
Aug 27 '06 #14
Dennis Lee Bieber wrote:
On Sun, 27 Aug 2006 14:41:05 -0000, Grant Edwards <gr****@visi.com>
declaimed the following in comp.lang.python:
MS Windows seems to do something similar, and it pisses me off
no end. Trying to open a file and read it while somebody else
has it open for writing causes an exception. If I want to open
a file and read it while it's being written to, that's my
business.
Though strangely, Windows seems to permit one to make a COPY of that
open file, and then open that with another application...
Yes, so long as the file hasn't been opened so as to deny reading, you can
open it for reading, but you do have to specify the sharing mode. Microsoft
too follows the rule that "Explicit is better than implicit."
Aug 27 '06 #15
On Sun, 2006-08-27 at 07:51 -0700, Amir Michail wrote:
How often do you need to open a file multiple times for writing?
How often do you write code that you don't understand well enough to
fix? This issue is clearly a problem within *your* application.

I'm curious how you could possibly think this could be solved in any
case. What if you accidentally open two instances of the application?
How would Python know? You are asking Python to perform an OS-level
operation (and a questionable one at that).

My suggestion is that you use a real database if you need concurrent
access (a minimal example is sketched below). If you don't need
concurrent access, then fix your application.
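
A minimal sketch with sqlite3, which is in the standard library as of
Python 2.5; SQLite does its own file locking, so a second writer waits
instead of corrupting the database:

import sqlite3

conn = sqlite3.connect('test.db', timeout=5)   # wait up to 5s for a lock
conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("REPLACE INTO kv VALUES (?, ?)", ('spam', 'eggs'))
conn.commit()
conn.close()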
As a high-level language, Python should prevent people from corrupting
data as much as possible.
"Data" is application-specific. Python has no idea how you intend to
use your data and therefore should not (even if it could) try to protect
you.

Regards,
Cliff

Aug 28 '06 #16
