combining the path and fileinput modules

wo_shi_big_stomach

I'm new to Python and writing a script to recurse a directory tree and delete
the first line of a file if it starts with a given string. I get the same
error on a Mac running OS X 10.4.8 and on FreeBSD 6.1.

Here's the script:

# start of program

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

import fileinput
import os
import re
import string
import sys
from path import path

# recurse dirs
dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    #
    # test:
    # print f
    #
    # open file, search, change if necessary, write backup
    for line in fileinput.input(f, inplace=1, backup='.bak'):
        # check first line only
        if fileinput.isfirstline():
            if not re.search('^From ',line):
                print line.rstrip('\n')
        # just print all other lines
        if not fileinput.isfirstline():
            print line.rstrip('\n')
    fileinput.close()
# end of program

The script produces this error:

Traceback (most recent call last):
  File "./p", line 22, in ?
    for line in fileinput.input(f, inplace=1, backup='.bak'):
  File "/sw/lib/python2.4/fileinput.py", line 231, in next
    line = self.readline()
  File "/sw/lib/python2.4/fileinput.py", line 300, in readline
    os.rename(self._filename, self._backupfilename)
OSError: [Errno 21] Is a directory

If I uncomment that test routine, and comment out the fileinput stuff,
the program DOES print the full pathname/filename for the variable f.

Many thanks for clues as to why fileinput.input doesn't like f.

Nov 23 '06 #1


Rob Wolfe

wo_shi_big_stomach wrote:

Newbie to python writing a script to recurse a directory tree and delete
the first line of a file if it contains a given string. I get the same
error on a Mac running OS X 10.4.8 and FreeBSD 6.1.

Here's the script:

# start of program

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

import fileinput
import os
import re
import string
import sys
from path import path

# recurse dirs
dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    #
    # test:
    # print f

Are you absolutely sure that the f list contains only paths to files,
with no directories among them?
Add this:

f = filter(os.path.isfile, f)

and try one more time.

    #
    # open file, search, change if necessary, write backup
    for line in fileinput.input(f, inplace=1, backup='.bak'):
        # check first line only
        if fileinput.isfirstline():
            if not re.search('^From ',line):
                print line.rstrip('\n')
        # just print all other lines
        if not fileinput.isfirstline():
            print line.rstrip('\n')
    fileinput.close()
# end of program

--
HTH,
Rob

Nov 23 '06 #2

wo_shi_big_stomach

On 11/23/06 6:15 AM, Rob Wolfe wrote:

wo_shi_big_stomach wrote:
Newbie to python writing a script to recurse a directory tree and delete
the first line of a file if it contains a given string. I get the same
error on a Mac running OS X 10.4.8 and FreeBSD 6.1.

Here's the script:

# start of program

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

import fileinput
import os
import re
import string
import sys
from path import path

# recurse dirs
dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    #
    # test:
    # print f

Are you absolutely sure that f list doesn't contain
any path to directory, not file?
Add this:

f = filter(os.path.isfile, f)

and try one more time.

Sorry, no joy. Printing f then produces:

rppp
rppppp
rppppp
rpppr
rppppp
rpppP
rppppp
rppppp

which I assure you are not the filenames in this directory.

I've tried this with f and f.name. The former prints the full pathname
and filename; the latter prints just the filename. But neither works
with the fileinput.input() call below.

I get the same error with the filtered mod as before:

  File "./p", line 23, in ?
    for line in fileinput.input(f, inplace=1, backup='.bak'):

Thanks again for info on what to feed fileinput.input()

    #
    # open file, search, change if necessary, write backup
    for line in fileinput.input(f, inplace=1, backup='.bak'):
        # check first line only
        if fileinput.isfirstline():
            if not re.search('^From ',line):
                print line.rstrip('\n')
        # just print all other lines
        if not fileinput.isfirstline():
            print line.rstrip('\n')
    fileinput.close()
# end of program

Nov 23 '06 #3

Gabriel Genellina

At Thursday 23/11/2006 12:21, wo_shi_big_stomach wrote:

dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    #
    # test:
    # print f
Are you absolutely sure that f list doesn't contain
any path to directory, not file?
Add this:

f = filter(os.path.isfile, f)

and try one more time.

Sorry, no joy. Printing f then produces:

rppp
rppppp
rppppp

The filter should be applied to walkfiles. Something like this:

dir = path('/home/wsbs/Maildir')
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test:
    # print f
--
Gabriel Genellina
Softlab SRL

__________________________________________________
Yahoo! Mail
Space for all your messages, with free antivirus and antispam!
Open your account now! - http://correo.yahoo.com.ar

Nov 25 '06 #4
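
The difference between filtering f and filtering the result of walkfiles can be sketched as follows. This is a minimal Python 3 illustration (the thread itself uses Python 2), and `looks_like_file` with its `EXISTING_FILES` set is a hypothetical stand-in for os.path.isfile so that the example does not touch the disk. Filtering a single path string tests each character as if it were a filename, which may be why printing f produced short strings such as rppp; filtering the sequence of paths tests each whole path.

```python
# Hypothetical stand-in for os.path.isfile: pretend only these names are files.
EXISTING_FILES = {"r", "p", "/home/wsbs/Maildir/cur/msg1"}

def looks_like_file(name):
    return name in EXISTING_FILES

single_path = "/home/wsbs/Maildir/cur/msg1"
all_paths = ["/home/wsbs/Maildir/cur", "/home/wsbs/Maildir/cur/msg1"]

# Filtering one string iterates over its characters, not over paths.
per_character = "".join(filter(looks_like_file, single_path))

# Filtering the sequence of paths keeps or drops each whole path.
per_path = list(filter(looks_like_file, all_paths))

print(per_character)
print(per_path)
```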

wo_shi_big_stomach

Gabriel Genellina wrote:

The filter should be applied to walkfiles. Something like this:

dir = path('/home/wsbs/Maildir')
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test:
    # print f

Thanks, this way f will print the full pathname/filename. But f already
does that using Jason Orendorff's path module:

dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    print f

Printing the full path/filename isn't the problem. The problem instead
is how to supply f to fileinput.input().

Either the path or the os.path methods cause this line:

for line in fileinput.input(f, inplace=1, backup='.bak'):

to throw this error:

  File "./p2.py", line 23, in ?
    for line in fileinput.input(f, inplace=1, backup='.bak'):

At this point I believe the error has to do with fileinput, not the path
or os.path modules.

If I give fileinput.input() a hardcoded path/filename in place of 'f'
the program runs. However the program will not accept either f or 'f' as
an argument to fileinput.input().

Again, thanks for guidance on the care and feeding of fileinput.input()

/wsbs

import fileinput
import os
import re
import string
import sys
from path import path

# p.pl - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

# recurse dirs
dir = path('/home/wsbs/Maildir')
#for f in dir.walkfiles('*'):
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test: this will print the full path/filename of each file
    print f
    #
    # open file, search, change if necessary, write backup
#    for line in fileinput.input('f', inplace=1, backup='.bak'):
#        # just print 2nd and subsequent lines
#        if not fileinput.isfirstline():
#            print line.rstrip('\n')
#        # check first line only
#        elif fileinput.isfirstline():
#            if not re.search('^From ',line):
#                print line.rstrip('\n')
#    fileinput.close()

Nov 25 '06 #5

Gabriel Genellina

At Saturday 25/11/2006 00:14, wo_shi_big_stomach wrote:

The filter should be applied to walkfiles. Something like this:

dir = path('/home/wsbs/Maildir')
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test:
    # print f

Thanks, this way f will print the full pathname/filename. But f already
does that using Jason Orendorff's path module:

dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    print f

The filter is used to exclude directories. fileinput can't handle directories.

At this point I believe the error has to do with fileinput, not the path
or os.path modules.

If I give fileinput.input() a hardcoded path/filename in place of 'f'
the program runs. However the program will not accept either f or 'f' as
an argument to fileinput.input().

Tried with (f,) ?
Notice that *this* error is not the same as your previous error.
--
Gabriel Genellina
Softlab SRL


Nov 25 '06 #6
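
Why a path object might behave differently from a hardcoded string, and why (f,) is worth trying, can be sketched like this. It is an assumption, not something the thread confirms, that the fileinput shipped with these Python versions recognised a single filename with an exact type check. `normalize_exact` and `normalize_isinstance` are hypothetical helpers written for illustration, not fileinput's own code, and `PathLike` stands in for the path module's str subclass. The syntax is Python 3.

```python
class PathLike(str):
    """Hypothetical stand-in for a str subclass such as path.path."""

def normalize_exact(files):
    # Exact type check: a str subclass is NOT recognised as one filename,
    # so it falls through and tuple() splits it into characters.
    if type(files) == type(""):
        return (files,)
    return tuple(files)

def normalize_isinstance(files):
    # isinstance() accepts subclasses, so the name stays in one piece.
    if isinstance(files, str):
        return (files,)
    return tuple(files)

f = PathLike("/tmp/msg")

print(normalize_exact("/tmp/msg"))   # plain str: one filename
print(normalize_exact(f))            # subclass: one entry per character
print(normalize_exact((f,)))         # wrapped in a tuple: one filename
print(normalize_isinstance(f))       # subclass handled as one filename
```

Under that assumption the first "filename" seen would be /, which is a directory and would fit the original "Is a directory" error, while wrapping the name as (f,) or passing a list of names avoids the character split.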

wo_shi_big_stomach

Gabriel Genellina wrote:

At Saturday 25/11/2006 00:14, wo_shi_big_stomach wrote:

The filter should be applied to walkfiles. Something like this:

dir = path('/home/wsbs/Maildir')
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test:
    # print f

Thanks, this way f will print the full pathname/filename. But f already
does that using Jason Orendorff's path module:

dir = path('/home/wsbs/Maildir')
for f in dir.walkfiles('*'):
    print f

The filter is used to exclude directories. fileinput can't handle
directories.

???

Both routines above produce identical output -- full path/filenames.
Neither prints just a directory name.

At this point I believe the error has to do with fileinput, not the path
or os.path modules.

If I give fileinput.input() a hardcoded path/filename in place of 'f'
the program runs. However the program will not accept either f or 'f' as
an argument to fileinput.input().

Tried with (f,) ?
Notice that *this* error is not the same as your previous error.

  File "p2.py", line 23, in ?
    for line in fileinput.input(f,):
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 231, in next
    line = self.readline()
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 320, in readline
    self._file = open(self._filename, "r")

This looks similar to before -- fileinput.input() still isn't operating
on the input.

Again, I'm looking to 1) walk through all files in a directory tree and 2)
using fileinput, evaluate and possibly edit the files.

The current version of the program is below.

thanks!

/wsbs

# start of program
import fileinput
import os
import re
import string
import sys
from path import path

# p2.py - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

# recurse dirs
dir = path('/home/wsbs/Maildir')
#for f in dir.walkfiles('*'):
for f in filter(os.path.isfile, dir.walkfiles('*')):
    #
    # test: this will print the full path/filename of each file
    # print f
    #
    # open file, search, change if necessary, write backup
    for line in fileinput.input(f,):
        # just print 2nd and subsequent lines
        if not fileinput.isfirstline():
            print line.rstrip('\n')
        # check first line only
        elif fileinput.isfirstline():
            if not re.search('^From ',line):
                print line.rstrip('\n')
    fileinput.close()

# end of program

Nov 25 '06 #7

wo_shi_big_stomach

Dennis Lee Bieber wrote:

On Sat, 25 Nov 2006 07:58:26 -0800, wo_shi_big_stomach
<wo****************@mac.com> declaimed the following in
comp.lang.python:

  File "p2.py", line 23, in ?
    for line in fileinput.input(f,):
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 231, in next
    line = self.readline()
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 320, in readline
    self._file = open(self._filename, "r")

This looks similar to before -- fileinput.input() still isn't operating
on the input.

And where is the actual exception message line -- the one with the
error code/description.

dir = path('/home/wsbs/Maildir')
#for f in dir.walkfiles('*'):
for f in filter(os.path.isfile, dir.walkfiles('*')):

If I understand the documentation of fileinput, you shouldn't even
need this outer loop; fileinput is designed to expect a list of files
(that it works with a single file seems an afterthought).

Yes, thanks. This is the key point.

Feeding fileinput.input() a list rather than a single file (or whatever
it's called in Python) got my program working. Thanks!

> for line in fileinput.input(f,):
for line in fileinput.input(filter(os.path.isfile,
                                   dir.walkfiles("*")),
                            inplace=1):

should handle all the files...

Indeed it does -- too many times.

Sorry, but this (and the program you provided) iterate over the entire
list N times, where N is the number of files, rather than doing one
iteration on each file.

For instance, using your program with inplace editing and a ".bak" file
extension for the originals, I ended up with filenames like
name.bak.bak.bak.bak.bak in a directory with five files in it.

I don't have this third party path module, so the directory tree
walking isn't active, but...

The path module:

http://www.jorendorff.com/articles/python/path/

is a *lot* cleaner than os.path; see the examples at that URL.

Thanks for the great tip about fileinput.input(), and thanks to all who
answered my query. I've pasted the working code below.

/wsbs

import fileinput
import os
import re
import string
import sys
from path import path

# p2.py - fix broken SMTP headers in email files
#
# recurses from dir and searches all subdirs
# for each file, evaluates whether 1st line starts with "From "
# for each match, program deletes line

# recurse dirs
dir = path('/home/wsbs/Maildir')
g = dir.walkfiles('*')
for line in fileinput.input(g, inplace=1, backup='.bak'):
    # just print 2nd and subsequent lines
    if not fileinput.isfirstline():
        print line.rstrip('\n')
    # check first line only
    elif fileinput.isfirstline():
        if not re.search('^From ',line):
            print line.rstrip('\n')
fileinput.close()

Nov 26 '06 #8
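
For readers on Python 3, the working approach above can be sketched without the third-party path module. This is a minimal sketch under assumptions: `iter_files` and `strip_from_lines` are hypothetical names, os.walk replaces walkfiles, startswith replaces the regular expression, and the demo runs against a throwaway temporary directory rather than a real Maildir.

```python
import fileinput
import os
import tempfile

def iter_files(root):
    """Yield the path of every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)

def strip_from_lines(root, backup=".bak"):
    """Drop the first line of each file under root if it starts with 'From '."""
    files = list(iter_files(root))  # snapshot, so new .bak files are not revisited
    if not files:
        return  # an empty list would make fileinput fall back to stdin
    with fileinput.input(files, inplace=True, backup=backup) as stream:
        for line in stream:
            if stream.isfirstline() and line.startswith("From "):
                continue  # skip the broken header line
            print(line, end="")  # stdout is redirected into the file

# Demo on a temporary tree.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "cur"))
broken = os.path.join(demo_root, "cur", "msg1")
clean = os.path.join(demo_root, "msg2")
with open(broken, "w") as fh:
    fh.write("From someone\nSubject: hi\n\nbody\n")
with open(clean, "w") as fh:
    fh.write("Subject: ok\n\nFrom here on, body text\n")

strip_from_lines(demo_root)

with open(broken) as fh:
    broken_after = fh.read()
with open(clean) as fh:
    clean_after = fh.read()
```

A single fileinput.input() call over the whole list visits each file once, which avoids the stacked .bak.bak.bak names described in the post.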

John Machin

wo_shi_big_stomach wrote:

Thanks for the great tip about fileinput.input(), and thanks to all who
answered my query. I've pasted the working code below.

[snip]

# check first line only
elif fileinput.isfirstline():
    if not re.search('^From ',line):

This "works", and in this case you are doing it on only the first line
in each file, but for future reference:

1. Read the re docs section about when to use search and when to use
match; the "^" anchor in your pattern means that search and match give
the same result here.

However the time they take to do it can differ quite a bit :-0

C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.match('^From ',text)"
100000 loops, best of 3: 4.39 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.match('^From ',text)"
100000 loops, best of 3: 4.41 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.match('^From ',text)"
100000 loops, best of 3: 4.4 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.search('^From ',text)"
100000 loops, best of 3: 6.54 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.search('^From ',text)"
10000 loops, best of 3: 26 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.search('^From ',text)"
1000 loops, best of 3: 219 usec per loop

Aside: I noticed this years ago but assumed that the simple
optimisation of search was not done as a penalty on people who didn't
RTFM, and so didn't report it :-)

2. Then realise that your test is equivalent to

if not line.startswith('^From '):

which is much easier to understand without the benefit of comments, and
(bonus!) is also much faster than re.match:

C:\junk>\python25\python -mtimeit -s"text='x'*100" "text.startswith('^From ')"
1000000 loops, best of 3: 0.584 usec per loop

C:\junk>\python25\python -mtimeit -s"text='x'*1000" "text.startswith('^From ')"
1000000 loops, best of 3: 0.583 usec per loop

C:\junk>\python25\python -mtimeit -s"text='x'*10000" "text.startswith('^From ')"
1000000 loops, best of 3: 0.612 usec per loop

HTH,
John

Nov 26 '06 #9

John Machin

John Machin wrote:
[snip]

2. Then realise that your test is equivalent to

if not line.startswith('^From '):

Whoops!

That '^From ' (and all later ones) should have been 'From '

(the perils of over-hasty copy/paste)

The timings are, if anything, a tiny bit faster than before.

Cheers,
John

Nov 26 '06 #10
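
The equivalence described in the last two posts can be checked with a short sketch. The syntax is Python 3, and the sample lines and function names are made up for illustration; note that startswith takes the literal 'From ' with no '^' anchor.

```python
import re

def by_search(line):
    return re.search('^From ', line) is not None

def by_match(line):
    # match() is already anchored at the start, so no '^' is needed.
    return re.match('From ', line) is not None

def by_startswith(line):
    return line.startswith('From ')

samples = [
    "From someone@example.com Thu Nov 23 2006\n",
    "Subject: From the archive\n",
    "from lowercase\n",
    "",
]

# Each row holds the three verdicts for one sample line.
results = [(by_search(s), by_match(s), by_startswith(s)) for s in samples]
for sample, row in zip(samples, results):
    print(repr(sample), row)
```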

Gabriel Genellina

At Sunday 26/11/2006 01:29, wo_shi_big_stomach wrote:

for line in fileinput.input(g, inplace=1, backup='.bak'):
    # just print 2nd and subsequent lines
    if not fileinput.isfirstline():
        print line.rstrip('\n')
    # check first line only
    elif fileinput.isfirstline():
        if not re.search('^From ',line):
            print line.rstrip('\n')

Just a note: the elif is redundant, use a simple else clause.
--
Gabriel Genellina
Softlab SRL


Nov 28 '06 #11
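
The simplification suggested here can be sketched on plain data. This is a Python 3 illustration; `keep_line` is a hypothetical helper that takes the first-line flag as an argument instead of calling fileinput.isfirstline(), so it can run without any files.

```python
def keep_line(is_first, line):
    """Return True if the line should be written back to the file."""
    if not is_first:
        return True  # 2nd and subsequent lines are always kept
    else:
        # Only reached for a first line, so no second test is needed.
        return not line.startswith('From ')

message = ["From someone\n", "Subject: hi\n", "From here on it is body text\n"]
kept = [line for i, line in enumerate(message) if keep_line(i == 0, line)]
print(kept)
```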
