By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,795 Members | 1,666 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,795 IT Pros & Developers. It's quick & easy.

fileinput module dangerous?

P: n/a
I just started working with the fileinput module, and reading the bit
about inplace modification, it seems very dangerous:

From the doc:
---
Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved
to a backup file and standard output is directed to the input file

**(if a file of the same name as the backup file already exists, it
will be replaced silently)**.

This makes it possible to write a filter that rewrites its input file
in place. If the keyword argument backup='.<some extension>' is also
given, it specifies the extension for the backup file, and the backup
file remains around; by default, the extension is '.bak' and it is
deleted when the output file is closed. In-place filtering is disabled
when standard input is read.
---[Emphasis mine]

This seems like very dangerous behavior, as .bak is a very common and
useful extension for old versions of files users would like to keep
around, and this module obliterates files one would not expect it to
touch. This behavior is also very un-Pythonic. The Zen says "Errors
should never pass silently." There seems to be no way to get useful
behavior either, as the backup keyword just specifies a backup
extension, which you can never guarantee will not be in use, and it
leaves the file around in the directory as well.

It seems like this module should use tmpfile in the os module for it's
temp file needs.

Is there a rational reason it's this way, or should I start working on
a patch?

--
Chris Connett
Jul 18 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
On 12 Oct 2004 21:57:40 -0700, Chris Connett <ch**********@gmail.com> wrote:
I just started working with the fileinput module, and reading the bit
about inplace modification, it seems very dangerous:

From the doc:
---
Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved
to a backup file and standard output is directed to the input file

**(if a file of the same name as the backup file already exists, it
will be replaced silently)**. ....
This seems like very dangerous behavior, as .bak is a very common and
useful extension for old versions of files users would like to keep
around, and this module obliterates files one would not expect it to
touch. This behavior is also very un-Pythonic.
(The end user probably doesn't care if this is Pythonic or not when it
clobbers her files ...)

I had no idea it worked like this -- thanks for pointing it out!
It was such an obvious import of the Perl idiom:
perl -pi -e 's/a/b/' file1 file2 ...
perl -pi.bak -e 's/a/b/' file1 file2 ...
which clobbers existing .bak files -- /if you tell it to, not otherwise/.
Anything else is insane.

(Perl will happily overwrite write-protected files, at least under Unix, but
that's a different story...)
Is there a rational reason it's this way, or should I start working on
a patch?


I can't see how there could be a reason. I'd welcome a patch.

/Jorgen

--
// Jorgen Grahn <jgrahn@ Ph'nglui mglw'nafh Cthulhu
\X/ algonet.se> R'lyeh wgah'nagl fhtagn!
Jul 18 '05 #2

P: n/a
Jorgen Grahn wrote:
[...]

I can't see how there could be a reason. I'd welcome a patch.
OK, I've gone ahead and made a fix. It makes use of os.tmpfile if it is
available, but implements the same (now safe) behavior if it is
unavailable through other means.

I also searched through old postings about this module, and went over
various complaints. I found that unicode awareness was missing and
wanted, and since there is no other effective way to mimic it in client
code, I added encoding support. All my changes *should* be backwards
compatible, which brings me to my next question.

As someone who has never posted a patch, what are next steps from here.
Is there a standard test suite that I need to run? What about
documentation updates?

This patch should be considered alpha quality, especially the encoding
support, though if the module is used without specifying an encoding, it
should work exactly as before. Please beat on it, though do note that I
haven't run any standard test suites if they exist, so bugs may not
necessarily be obscure ones. I will continue to run more tests myself
in the coming days.

Note:
- Due to the buffering nature of fileinput, universal newlines only
works when using an encoding whose StreamReader's readlines() method
performs universal newlining.

--
Chris Connett

190d189
< self._encoding = encoding
200a200 self._encoding = encoding 211a212,218 # need parameterized file modes, since the files backing
# encodings should be raw bytestreams
self._readmode = 'r'
self._writemode = 'w'
if self._encoding:
self._readmode += 'b'
self._writemode += 'b' 248c255
< self._savestdout = 0
--- self._savestdout = None 253c260
< self._output = 0
--- self._output = None 290c297
< self._backupfilename = 0
--- self._backupfilename = None 297,307c304,305
< self._backupfilename = (
< self._filename + (self._backup or os.extsep+"bak"))
< try: os.unlink(self._backupfilename)
< except os.error: pass
< # The next few lines may raise IOError
< os.rename(self._filename, self._backupfilename)
< self._file = open(self._backupfilename, "r")
< try:
< perm = os.fstat(self._file.fileno()).st_mode
< except OSError:
< self._output = open(self._filename, "w")
--- if self._backup:
self._backupfilename = self._filename + self._backup 309,312d306
< fd = os.open(self._filename,
< os.O_CREAT | os.O_WRONLY | os.O_TRUNC,
< perm)
< self._output = os.fdopen(fd, "w")
314,316c308,331
< if hasattr(os, 'chmod'):
< os.chmod(self._filename, perm)
< except OSError:
--- self._file = os.tmpfile()
# copy our input into the tmpfile to keep the same
# backup/write relationship
self._file.write(open(self._filename,'rb').read())
self._file.seek(0)
except NameError: # tmpfile not available
import random
self._backupfilename = self._filename
while os.path.exists( self._backupfilename ):
self._backupfilename += (
'%x' % random.randint(0, 16))
# The next few lines may raise IOError
if not self._file: # _file may already be a tmpfile
os.unlink(self._backupfilename)
# self._backupfilename will only exist if the
# user explicitly specified an extension to
# use, i.e., it's their fault if we clobber
# something now
os.rename(self._filename, self._backupfilename)
self._file = open(self._backupfilename, self._readmode)
try:
perm = os.stat(self._backupfilename).st_mode
os.chmod(self._filename,perm)
except (OSError, NameError): 317a333,334
self._output = open(self._filename, self._writemode) 322c339,347
< self._file = open(self._filename, "r")
--- self._file = open(self._filename, self._readmode)
# now wrap all our new file objects if the user specified
# an encoding
if self._encoding:
(x, x, reader, writer) = codecs.lookup(self._encoding)
self._file = reader(self._file)
if self._output:
self._output = writer(self._output)
sys.stdout = self._output


Jul 18 '05 #3

P: n/a
OK, I found the SF page for all that stuff and took care of everything.

Patch #1048075:
https://sourceforge.net/tracker/inde...70&atid=305470
Jul 18 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.