combining the path and fileinput modules

P: n/a
Newbie to python writing a script to recurse a directory tree and delete
the first line of a file if it contains a given string. I get the same
error on a Mac running OS X 10.4.8 and FreeBSD 6.1.

Here's the script:

# start of program

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

import fileinput
import os
import re
import string
import sys
from path import path

# recurse dirs
dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    #
    # test:
    # print f
    #
    # open file, search, change if necessary, write backup
    for line in fileinput.input(f, inplace=1, backup='.bak'):
        # check first line only
        if fileinput.isfirstline():
            if not re.search('^From ',line):
                print line.rstrip('\n')
        # just print all other lines
        if not fileinput.isfirstline():
            print line.rstrip('\n')
    fileinput.close()
# end of program

The script produces this error:

Traceback (most recent call last):
  File "./p", line 22, in ?
    for line in fileinput.input(f, inplace=1, backup='.bak'):
  File "/sw/lib/python2.4/fileinput.py", line 231, in next
    line = self.readline()
  File "/sw/lib/python2.4/fileinput.py", line 300, in readline
    os.rename(self._filename, self._backupfilename)
OSError: [Errno 21] Is a directory

If I uncomment that test routine, and comment out the fileinput stuff,
the program DOES print the full pathname/filename for the variable f.

Many thanks for clues as to why fileinput.input doesn't like f.

Nov 23 '06 #1
10 Replies


P: n/a

wo_shi_big_stomach wrote:
Newbie to python writing a script to recurse a directory tree and delete
the first line of a file if it contains a given string. I get the same
error on a Mac running OS X 10.4.8 and FreeBSD 6.1.

Here's the script:

# start of program

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

import fileinput
import os
import re
import string
import sys
from path import path

# recurse dirs
dir = path(/home/wsbs/Maildir)
for f in dir.walkfiles('*'):
#
# test:
# print f
Are you absolutely sure that f never holds a path to a
directory rather than a file?
Add this:

f = filter(os.path.isfile, f)

and try one more time.
#
# open file, search, change if necessary, write backup
for line in fileinput.input(f, inplace=1, backup='.bak'):
# check first line only
if fileinput.isfirstline():
if not re.search('^From ',line):
print line.rstrip('\n')
# just print all other lines
if not fileinput.isfirstline():
print line.rstrip('\n')
fileinput.close()
# end of program
--
HTH,
Rob

Nov 23 '06 #2

P: n/a
On 11/23/06 6:15 AM, Rob Wolfe wrote:
wo_shi_big_stomach wrote:
>Newbie to python writing a script to recurse a directory tree and delete
the first line of a file if it contains a given string. I get the same
error on a Mac running OS X 10.4.8 and FreeBSD 6.1.

Here's the script:

# start of program

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

import fileinput
import os
import re
import string
import sys
from path import path

# recurse dirs
dir = path(/home/wsbs/Maildir)
for f in dir.walkfiles('*'):
#
# test:
# print f

Are you absolutely sure that f list doesn't contain
any path to directory, not file?
Add this:

f = filter(os.path.isfile, f)

and try one more time.
Sorry, no joy. Printing f then produces:

rppp
rppppp
rppppp
rpppr
rppppp
rpppP
rppppp
rppppp

which I assure you are not the filenames in this directory.
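
A likely reason for that output: f is a single path string, so filter() tests it one character at a time and keeps only the characters that happen to name a file in the current directory (the script itself is ./p, for instance). Here is a minimal sketch of that effect, written for Python 3; the one-letter files and the sample path are made up for illustration.

```python
import os
import tempfile

def chars_naming_files(path_string, directory):
    """Keep only the characters of path_string that name a regular file in directory."""
    # filter() walks a string one character at a time, so each item tested
    # here is a single character, never the whole path.
    return "".join(
        filter(lambda ch: os.path.isfile(os.path.join(directory, ch)), path_string)
    )

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical one-letter files standing in for scripts such as ./p
    for name in ("p", "r"):
        open(os.path.join(tmp, name), "w").close()
    result = chars_naming_files("/tmp/wsbs/Maildir/rpp", tmp)

print(result)
```

The result contains only the letters p and r, which is the same kind of output as the listing above.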

I've tried this with f and f.name. The former prints the full pathname
and filename; the latter prints just the filename. But neither works
with the fileinput.input() call below.

I get the same error with the filtered mod as before:

File "./p", line 23, in ?
for line in fileinput.input(f, inplace=1, backup='.bak'):

Thanks again for info on what to feed fileinput.input()
>
> #
# open file, search, change if necessary, write backup
for line in fileinput.input(f, inplace=1, backup='.bak'):
# check first line only
if fileinput.isfirstline():
if not re.search('^From ',line):
print line.rstrip('\n')
# just print all other lines
if not fileinput.isfirstline():
print line.rstrip('\n')
fileinput.close()
# end of program
Nov 23 '06 #3

P: n/a
At Thursday 23/11/2006 12:21, wo_shi_big_stomach wrote:
dir = path(/home/wsbs/Maildir)
for f in dir.walkfiles('*'):
#
# test:
# print f
Are you absolutely sure that f list doesn't contain
any path to directory, not file?
Add this:

f = filter(os.path.isfile, f)

and try one more time.

Sorry, no joy. Printing f then produces:

rppp
rppppp
rppppp
The filter should be applied to walkfiles. Something like this:

dir = path(/home/wsbs/Maildir)
for f in filter(os.path.isfile, dir.walkfiles('*')):
#
# test:
# print f
--
Gabriel Genellina
Softlab SRL

Nov 25 '06 #4

P: n/a
Gabriel Genellina wrote:
The filter should be applied to walkfiles. Something like this:

dir = path(/home/wsbs/Maildir)
for f in filter(os.path.isfile, dir.walkfiles('*')):
#
# test:
# print f
Thanks, this way f will print the full pathname/filename. But f already
does that using Jason Orendorff's path module:

dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
print f

Printing the full path/filename isn't the problem. The problem instead
is how to supply f to fileinput.input().

Either the path or the os.path methods cause this line:

for line in fileinput.input(f, inplace=1, backup='.bak'):

to throw this error:

File "./p2.py", line 23, in ?
for line in fileinput.input(f, inplace=1, backup='.bak'):

At this point I believe the error has to do with fileinput, not the path
or os.path modules.

If I give fileinput.input() a hardcoded path/filename in place of 'f'
the program runs. However the program will not accept either f or 'f' as
an argument to fileinput.input().
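
One possible explanation for "a hardcoded string works, f does not", offered as an assumption rather than a confirmed diagnosis: the path object is a string subclass, and a library that decides between "one filename" and "sequence of filenames" with an exact type comparison would treat it as a sequence and split it into characters, the first of which is '/'. The sketch below (Python 3) only illustrates that mechanism; Path, normalize_exact and normalize_isinstance are hypothetical names, not the real path or fileinput code.

```python
class Path(str):
    """Hypothetical stand-in for a path class that subclasses str."""

def normalize_exact(files):
    # Assumed behaviour: an exact type comparison rejects str subclasses,
    # so the "string" falls through and is split into single characters.
    if type(files) == type(''):
        return (files,)
    return tuple(files)

def normalize_isinstance(files):
    # An isinstance() check accepts subclasses and keeps the path whole.
    if isinstance(files, str):
        return (files,)
    return tuple(files)

f = Path('/home/wsbs/Maildir/cur/msg1')  # made-up file name

split_up = normalize_exact(f)
kept_whole = normalize_isinstance(f)

print(split_up[:3])
print(kept_whole)
```

If that is what happens, the first "file" the module tries to rename or open is the root directory, which would fit an "Is a directory" error. Wrapping f in a real tuple or list, or passing str(f), sidesteps the question either way.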

Again, thanks for guidance on the care and feeding of fileinput.input()

/wsbs

import fileinput
import os
import re
import string
import sys
from path import path

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

# recurse dirs
dir = path('/home/wsbs/Maildir')
#for f in dir.walkfiles('*'):
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test: this will print the full path/filename of each file
    print f
    #
    # open file, search, change if necessary, write backup
    # for line in fileinput.input('f', inplace=1, backup='.bak'):
    #     # just print 2nd and subsequent lines
    #     if not fileinput.isfirstline():
    #         print line.rstrip('\n')
    #     # check first line only
    #     elif fileinput.isfirstline():
    #         if not re.search('^From ',line):
    #             print line.rstrip('\n')
    #     fileinput.close()
Nov 25 '06 #5

P: n/a
At Saturday 25/11/2006 00:14, wo_shi_big_stomach wrote:
The filter should be applied to walkfiles. Something like this:

dir = path(/home/wsbs/Maildir)
for f in filter(os.path.isfile, dir.walkfiles('*')):
#
# test:
# print f

Thanks, this way f will print the full pathname/filename. But f already
does that using Jason Orendorff's path module:

dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
print f
The filter is used to exclude directories. fileinput can't handle directories.
>At this point I believe the error has to do with fileinput, not the path
or os.path modules.

If I give fileinput.input() a hardcoded path/filename in place of 'f'
the program runs. However the program will not accept either f or 'f' as
an argument to fileinput.input().
Tried with (f,) ?
Notice that *this* error is not the same as your previous error.
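
Note that the tuple needs its own parentheses inside the call: a trailing comma in an argument list does not build a tuple. A small sketch in Python 3, where show_argument is a hypothetical helper and the file name is made up:

```python
def show_argument(files):
    """Return the argument exactly as the callee receives it."""
    return files

f = "/home/wsbs/Maildir/cur/msg1"  # hypothetical file name

as_written = show_argument(f,)   # trailing comma in a call: still a plain string
as_tuple = show_argument((f,))   # inner parentheses build a one-element tuple
as_list = show_argument([f])     # a one-element list works the same way

print(type(as_written).__name__, type(as_tuple).__name__, type(as_list).__name__)
```

So fileinput.input((f,)) and fileinput.input([f]) pass a sequence, while fileinput.input(f,) passes exactly what fileinput.input(f) does.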
--
Gabriel Genellina
Softlab SRL

Nov 25 '06 #6

P: n/a
Gabriel Genellina wrote:
At Saturday 25/11/2006 00:14, wo_shi_big_stomach wrote:
The filter should be applied to walkfiles. Something like this:

dir = path(/home/wsbs/Maildir)
for f in filter(os.path.isfile, dir.walkfiles('*')):
#
# test:
# print f

Thanks, this way f will print the full pathname/filename. But f already
does that using Jason Orendorff's path module:

dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
print f

The filter is used to exclude directories. fileinput can't handle
directories.
???

Both routines above produce identical output -- full path/filenames.
Neither prints just a directory name.
>
>At this point I believe the error has to do with fileinput, not the path
or os.path modules.

If I give fileinput.input() a hardcoded path/filename in place of 'f'
the program runs. However the program will not accept either f or 'f' as
an argument to fileinput.input().

Tried with (f,) ?
Notice that *this* error is not the same as your previous error.
  File "p2.py", line 23, in ?
    for line in fileinput.input(f,):
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 231, in next
    line = self.readline()
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 320, in readline
    self._file = open(self._filename, "r")

This looks similar to before -- fileinput.input() still isn't operating
on the input.

Again, I'm looking to 1) walk through all files in a directory tree and 2)
use fileinput to evaluate and possibly edit each file.

The current version of the program is below.

thanks!

/wsbs

# start of program
import fileinput
import os
import re
import string
import sys
from path import path

# p2.py - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

# recurse dirs
dir = path('/home/wsbs/Maildir')
#for f in dir.walkfiles('*'):
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test: this will print the full path/filename of each file
    # print f
    #
    # open file, search, change if necessary, write backup
    for line in fileinput.input(f,):
        # just print 2nd and subsequent lines
        if not fileinput.isfirstline():
            print line.rstrip('\n')
        # check first line only
        elif fileinput.isfirstline():
            if not re.search('^From ',line):
                print line.rstrip('\n')
    fileinput.close()

# end of program
Nov 25 '06 #7

P: n/a
Dennis Lee Bieber wrote:
On Sat, 25 Nov 2006 07:58:26 -0800, wo_shi_big_stomach
<wo****************@mac.com> declaimed the following in
comp.lang.python:
> File "p2.py", line 23, in ?
for line in fileinput.input(f,):
File
"/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py",
line 231, in next
line = self.readline()
File
"/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py",
line 320, in readline
self._file = open(self._filename, "r")

This looks similar to before -- fileinput.input() still isn't operating
on the input.
And where is the actual exception message line -- the one with the
error code/description.

>dir = path('/home/wsbs/Maildir')
#for f in dir.walkfiles('*'):
for f in filter(os.path.isfile, dir.walkfiles('*')):

If I understand the documentation of fileinput, you shouldn't even
need this output loop; fileinput is designed to expect a list of files
(that it works with a single file seems an afterthought)
Yes, thanks. This is the key point.

Feeding fileinput.input() a list rather than a single file (or whatever
it's called in Python) got my program working. Thanks!
>
> for line in fileinput.input(f,):
for line in fileinput.input(filter(os.path.isfile, dir.walkfiles("*")),
                            inplace=1):

should handle all the files...
Indeed it does -- too many times.

Sorry, but this (and the program you provided) iterates over the entire
list N times, where N is the number of files, rather than doing one
iteration on each file.

For instance, using your program with inplace editing and a ".bak" file
extension for the originals, I ended up with filenames like
name.bak.bak.bak.bak.bak in a directory with five files in it.

I don't have this third party path
module, so the directory tree walking isn't active, but...
The path module:

http://www.jorendorff.com/articles/python/path/

is a *lot* cleaner than os.path; see the examples at that URL.

Thanks for the great tip about fileinput.input(), and thanks to all who
answered my query. I've pasted the working code below.

/wsbs

import fileinput
import os
import re
import string
import sys
from path import path

# p2.py - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

# recurse dirs
dir = path('/home/wsbs/Maildir')
g = dir.walkfiles('*')
for line in fileinput.input(g, inplace=1, backup='.bak'):
    # just print 2nd and subsequent lines
    if not fileinput.isfirstline():
        print line.rstrip('\n')
    # check first line only
    elif fileinput.isfirstline():
        if not re.search('^From ',line):
            print line.rstrip('\n')
fileinput.close()
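
For anyone reading this later on Python 3, the same idea can be sketched with the standard library only. This is not the code above: os.walk is swapped in for the third-party path module, strip_from_lines is a made-up name, and the sample files exist only to exercise it.

```python
import fileinput
import os
import tempfile

def strip_from_lines(root):
    """Delete the first line of every file under root if it starts with 'From '."""
    # os.walk stands in for the path module's walkfiles(); the list is built
    # up front so the .bak files created below are never processed themselves.
    files = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    ]
    if not files:
        return files  # an empty list would make fileinput read sys.argv/stdin
    # One call handles the whole list; in-place mode redirects print() into
    # the file currently being rewritten and keeps the original as *.bak.
    with fileinput.input(files=files, inplace=True, backup=".bak") as stream:
        for line in stream:
            if stream.isfirstline() and line.startswith("From "):
                continue
            print(line, end="")
    return files

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "cur"))
    broken = os.path.join(root, "cur", "msg1")
    clean = os.path.join(root, "msg2")
    with open(broken, "w") as fh:
        fh.write("From someone\nSubject: hi\n")
    with open(clean, "w") as fh:
        fh.write("Subject: ok\nFrom here on\n")
    strip_from_lines(root)
    with open(broken) as fh:
        broken_after = fh.read()
    with open(clean) as fh:
        clean_after = fh.read()
    backup_exists = os.path.exists(broken + ".bak")
```

The key design point is the same as in the thread: fileinput is given the full list of files once, instead of being called inside a loop over the files.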
Nov 26 '06 #8

P: n/a
wo_shi_big_stomach wrote:
Thanks for the great tip about fileinput.input(), and thanks to all who
answered my query. I've pasted the working code below.
[snip]
# check first line only
elif fileinput.isfirstline():
if not re.search('^From ',line):
This "works", and in this case you are doing it on only the first line
in each file, but for future reference:

1. Read the re docs section about when to use search and when to use
match; the "^" anchor in your pattern means that search and match give
the same result here.

However the time they take to do it can differ quite a bit :-0

C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.match('^From ',text)"
100000 loops, best of 3: 4.39 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.match('^From ',text)"
100000 loops, best of 3: 4.41 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.match('^From ',text)"
100000 loops, best of 3: 4.4 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.search('^From ',text)"
100000 loops, best of 3: 6.54 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.search('^From ',text)"
10000 loops, best of 3: 26 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.search('^From ',text)"
1000 loops, best of 3: 219 usec per loop

Aside: I noticed this years ago but assumed that the simple
optimisation of search was not done as a penalty on people who didn't
RTFM, and so didn't report it :-)

2. Then realise that your test is equivalent to

if not line.startswith('^From '):

which is much easier to understand without the benefit of comments, and
(bonus!) is also much faster than re.match:

C:\junk>\python25\python -mtimeit -s"text='x'*100" "text.startswith('^From ')"
1000000 loops, best of 3: 0.584 usec per loop

C:\junk>\python25\python -mtimeit -s"text='x'*1000" "text.startswith('^From ')"
1000000 loops, best of 3: 0.583 usec per loop

C:\junk>\python25\python -mtimeit -s"text='x'*10000" "text.startswith('^From ')"
1000000 loops, best of 3: 0.612 usec per loop

HTH,
John

Nov 26 '06 #9

P: n/a
John Machin wrote:
[snip]
2. Then realise that your test is equivalent to

if not line.startswith('^From '):
Whoops!

That '^From ' (and all later ones) should have been 'From '

(the perils of over-hasty copy/paste)

The timings are, if anything, a tiny bit faster than before.
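
As a quick sanity check that the three spellings agree once the stray '^' is dropped from the startswith() argument, here is a small sketch in Python 3. The sample lines are invented for illustration.

```python
import re

# Made-up sample lines: one genuine "From " line and several near misses
samples = [
    "From alice@example.com Thu Nov 23 06:15:00 2006\n",
    "Return-Path: <alice@example.com>\n",
    " From indented\n",
    "Fromage\n",
    "",
]

results = []
for line in samples:
    by_search = re.search('^From ', line) is not None   # anchored search
    by_match = re.match('From ', line) is not None      # match starts at position 0
    by_startswith = line.startswith('From ')            # plain string test
    results.append((by_search, by_match, by_startswith))

for line, row in zip(samples, results):
    print(repr(line), row)
```

Only the first sample is accepted, and it is accepted by all three tests.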

Cheers,
John

Nov 26 '06 #10

P: n/a
At Sunday 26/11/2006 01:29, wo_shi_big_stomach wrote:
> for line in fileinput.input(g, inplace=1, backup='.bak'):
>     # just print 2nd and subsequent lines
>     if not fileinput.isfirstline():
>         print line.rstrip('\n')
>     # check first line only
>     elif fileinput.isfirstline():
>         if not re.search('^From ',line):
>             print line.rstrip('\n')
Just a note: the elif is redundant, use a simple else clause.
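
Something along these lines, sketched for Python 3 with the decision pulled out into a hypothetical helper (keep_line is not part of the posted script) and startswith() used for the test:

```python
def keep_line(is_first_line, line):
    """Return True when the line should be written back to the file."""
    if not is_first_line:
        # 2nd and subsequent lines are always kept
        return True
    else:
        # the first line is dropped only when it starts with "From "
        return not line.startswith('From ')

decisions = [
    keep_line(True, "From someone\n"),
    keep_line(True, "Subject: hi\n"),
    keep_line(False, "From here on\n"),
]
print(decisions)
```

Since a line either is or is not the first line of its file, the second isfirstline() test can never change the outcome.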
--
Gabriel Genellina
Softlab SRL

Nov 28 '06 #11
