Bytes IT Community

recursive file editing

I'm a Python newbie; here are a few questions about a
problem I'm trying to solve. I'm wondering whether Python is the best
tool, or whether awk or a mix of bash and sed would be better:

1) How would I recursively descend
through all files in all subdirectories of
the one in which my script is called?

2) Each file examined should be edited: IF a string of this type is found

foo.asp=dev?bar (where bar can be a digit or an empty space)

it should always be replaced with this string

foo-bar.html (if bar is an empty space, the new string is foo-.html)

3) The names of the files read may themselves be of the form foo.asp=dev?bar;
the edited output file should also be renamed according to the same rule
as above ... or would this be better handled by a bash script?
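The rule in items 2 and 3 boils down to one regular-expression substitution. A minimal Python 3 sketch, assuming foo consists of word characters (letters, digits, underscore) and that "empty space" means there is no digit after the question mark; REWRITE and rewrite() are hypothetical names, and the same function can be applied to file names as well as to file contents:

```python
import re

# Hypothetical pattern: "foo" is one or more word characters, "bar" is an
# optional single digit (when it is missing, group 2 is simply empty).
REWRITE = re.compile(r"(\w+)\.asp=dev\?(\d?)")


def rewrite(text):
    """Replace every foo.asp=dev?bar in text with foo-bar.html."""
    return REWRITE.sub(r"\1-\2.html", text)


print(rewrite('<a href="foo.asp=dev?3">'))  # <a href="foo-3.html">
print(rewrite("see foo.asp=dev? here"))      # see foo-.html here
print(rewrite("foo.asp=dev?2"))              # foo-2.html (a file name)
```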

Any hints appreciated

--
Michele Alzetta

Jul 18 '05 #1
15 Replies


This is a Perl one-liner:

perl -p -i -e 's/foo/bar/gi' `find ./`

regards,
--
Leif Biberg Kristensen
http://solumslekt.org/
Validare necesse est
Jul 18 '05 #2

> I'm a Python newbie; here are a few questions about a
> problem I'm trying to solve. I'm wondering whether Python is the best
> tool, or whether awk or a mix of bash and sed would be better:
>
> 1) How would I recursively descend
> through all files in all subdirectories of
> the one in which my script is called?

Check out os.path.walk.

> 2) Each file examined should be edited: IF a string of this type is found
>
> foo.asp=dev?bar (where bar can be a digit or an empty space)
>
> it should always be replaced with this string
>
> foo-bar.html (if bar is an empty space, the new string is foo-.html)

Check out the re module.

> 3) The names of the files read may themselves be of the form foo.asp=dev?bar;
> the edited output file should also be renamed according to the same rule
> as above ... or would this be better handled by a bash script?

Do it however you feel most comfortable; Python can do it.
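For reference, os.path.walk() takes a callback function, while os.walk(), available since Python 2.3, is a generator and usually easier to use (os.path.walk() was removed in Python 3). A minimal Python 3 sketch of the recursive descent from question 1, assuming the starting point is the current directory; all_files is a hypothetical helper name:

```python
import os


def all_files(top="."):
    """Yield the path of every non-directory entry below top, recursively."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # os.walk() gives bare names; join each with its directory
            yield os.path.join(dirpath, name)

# Usage: for path in all_files(): print(path)
```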
- Josiah
Jul 18 '05 #3

On Sat, 03 Apr 2004 22:35:30 +0200, Leif B. Kristensen wrote:
> This is a Perl one-liner:
>
> perl -p -i -e 's/foo/bar/gi' `find ./`

It didn't work; however, I realized I could just repeatedly run variations of

sed -i s/foo/bar/g *

and then rename the files with a bash script called renna.

I'm sure Python could do it, but I got stuck:

for file in os.walk('mydir'):
    print file[2]

gives me the names of all the files,

but how do I open each one in r+ mode?

for thing in os.walk('mydir'):
    file(thing,mode=r+)

is invalid syntax.

--
Michele Alzetta

Jul 18 '05 #4

> for thing in os.walk('mydir'):
>     file(thing,mode=r+)

for thing in os.walk('mydir'):
    filehandle = file(thing, 'r+')
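Two things are going on here: the mode has to be the string 'r+', and os.walk() yields (dirpath, dirnames, filenames) tuples rather than file names, so thing is still not something that can be opened (the following posts sort that out). As for 'r+' itself, it opens an existing file for reading and writing without truncating it, so a write at the start overwrites bytes in place and leaves the rest. A small Python 3 sketch; overwrite_start is a hypothetical helper, and Python 3 uses open() because the file() builtin no longer exists:

```python
def overwrite_start(path, text):
    """Open an existing file in 'r+' mode and write text at offset 0.

    'r+' does not truncate: characters past len(text) stay as they were,
    and opening fails if the file does not exist yet.
    """
    with open(path, "r+") as f:
        f.write(text)
```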
- Josiah
Jul 18 '05 #5

TaeKyon wrote:
> I'm a Python newbie; here are a few questions about a
> problem I'm trying to solve. I'm wondering whether Python is the best
> tool, or whether awk or a mix of bash and sed would be better:
>
> 1) How would I recursively descend
> through all files in all subdirectories of
> the one in which my script is called?
>
> 2) Each file examined should be edited: IF a string of this type is found
>
> foo.asp=dev?bar (where bar can be a digit or an empty space)
>
> it should always be replaced with this string
>
> foo-bar.html (if bar is an empty space, the new string is foo-.html)
>
> 3) The names of the files read may themselves be of the form foo.asp=dev?bar;
> the edited output file should also be renamed according to the same rule
> as above ... or would this be better handled by a bash script?
>
> Any hints appreciated


The following code comes with no warranties. Be sure to back up valuable data
before trying it. You may need to edit the regular expressions. Call the
script with the directory you want to process as its argument.

Peter

import os, re, sys

class Path(object):
    """A file name together with the folder that contains it."""
    def __init__(self, folder, name):
        self.folder = folder
        self.name = name

    def _get_path(self):
        return os.path.join(self.folder, self.name)
    path = property(_get_path)

    def rename(self, newname):
        # touch the disk only if the name actually changes
        if self.name != newname:
            os.rename(self.path, os.path.join(self.folder, newname))
            self.name = newname

    def processContents(self, operation):
        # rewrite the file only if operation() changed its contents
        data = file(self.path).read()
        newdata = operation(data)
        if data != newdata:
            file(self.path, "w").write(newdata)

    def __str__(self):
        return self.path

def files(rootfolder):
    # yield a Path for every file below rootfolder
    for folder, folders, files in os.walk(rootfolder):
        for name in files:
            yield Path(folder, name)

fileExpr = re.compile(r"^(.+?)\.asp\=dev\?(.*)$")
filePattern = r"\1-\2.html"

textExpr = re.compile(r"([/\'\"])(.+?)\.asp\=dev\?(.*?)([\'\"])")
textPattern = r"\1\2-\3.html\4"

if __name__ == "__main__":
    for f in files(sys.argv[1]):
        f.rename(fileExpr.sub(filePattern, f.name))
        f.processContents(lambda s: textExpr.sub(textPattern, s))
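Before running the script on real data, it may help to check what the two expressions actually match. A Python 3 sketch that reuses fileExpr/filePattern and textExpr/textPattern unchanged on made-up sample strings:

```python
import re

# Same expressions as in the script above.
fileExpr = re.compile(r"^(.+?)\.asp\=dev\?(.*)$")
filePattern = r"\1-\2.html"

textExpr = re.compile(r"([/\'\"])(.+?)\.asp\=dev\?(.*?)([\'\"])")
textPattern = r"\1\2-\3.html\4"

# File names: everything before ".asp=dev?" is foo, everything after is bar.
print(fileExpr.sub(filePattern, "foo.asp=dev?3"))  # foo-3.html
print(fileExpr.sub(filePattern, "foo.asp=dev?"))   # foo-.html

# Contents: a match must start after a slash or quote and end at a quote.
print(textExpr.sub(textPattern, '<a href="foo.asp=dev?3">'))  # <a href="foo-3.html">
```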

Jul 18 '05 #6

On Sat, 03 Apr 2004 17:22:04 -0800, Josiah Carlson wrote:
> for thing in os.walk('mydir'):
>     filehandle = file(thing, 'r+')

I'm such a newbie I can't get it to work. Here is an example:

In an empty directory foo I touch a b c d;
suppose I want to write "This works !" in each of these files.

I run python:

>>> import os
>>> for thing in os.walk('foo'):
...     thingopen = file(thing,'r+')
...     thingopen.write("This works !")
...     thingopen.close()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
TypeError: coercing to Unicode: need string or buffer, tuple found

And in fact:

>>> for thing in os.walk('foo'):
...     print thing
...
('foo', [], ['a', 'b', 'c', 'd'])

which is a tuple, I suppose.

Selecting thing[2] doesn't help, because then it complains that it's a
list.

In the end I got this to work:

>>> for filetuple in os.walk('foo'):
...     for filename in filetuple[2]:
...         fileopen = file(filename, 'r+')
...         fileopen.write("This works !")
...         fileopen.close()

which seems a bit of a clumsy way to do it.
And besides, it only works if I run python from directory foo;
otherwise it tells me "no such file or directory".

--
Michele Alzetta

Jul 18 '05 #7

TaeKyon wrote:
> On Sat, 03 Apr 2004 17:22:04 -0800, Josiah Carlson wrote:
>> for thing in os.walk('mydir'):
>>     filehandle = file(thing, 'r+')
>
> I'm such a newbie I can't get it to work. Here is an example:
>
> In an empty directory foo I touch a b c d;
> suppose I want to write "This works !" in each of these files.
>
> I run python:
>
> >>> import os
> >>> for thing in os.walk('foo'):
> ...     thingopen = file(thing,'r+')
> ...     thingopen.write("This works !")
> ...     thingopen.close()
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 2, in ?
> TypeError: coercing to Unicode: need string or buffer, tuple found
>
> And in fact:
>
> >>> for thing in os.walk('foo'):
> ...     print thing
> ...
> ('foo', [], ['a', 'b', 'c', 'd'])
>
> which is a tuple, I suppose.
>
> Selecting thing[2] doesn't help, because then it complains that it's a
> list.
>
> In the end I got this to work:
>
> >>> for filetuple in os.walk('foo'):
> ...     for filename in filetuple[2]:
> ...         fileopen = file(filename, 'r+')
> ...         fileopen.write("This works !")
> ...         fileopen.close()
>
> which seems a bit of a clumsy way to do it.
> And besides, it only works if I run python from directory foo;
> otherwise it tells me "no such file or directory".


A minimal working example is:

import os
for path, folders, files in os.walk("/path/to/folder"):
    for name in files:
        filepath = os.path.join(path, name)
        fileopen = file(filepath, 'r+')
        fileopen.write("This works !")
        fileopen.close()

You need to compose the filepath, and, yes, it's a bit clumsy.
I've written a little generator function to hide some of the clumsiness:

def files(folder):
    for path, folders, files in os.walk(folder):
        for name in files:
            yield os.path.join(path, name)

With that the code is simplified to:

for filepath in files("/path/to/folder"):
    fileopen = file(filepath, 'r+')
    fileopen.write("This works !")
    fileopen.close()
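In Python 3 the file() builtin is gone, so the same loop uses open(); a with block also closes each file even if write() raises. A sketch under that assumption, with Peter's files() generator repeated so the block runs on its own (stamp_all is a hypothetical name, and the inner loop variable is renamed to names so it does not shadow the generator's name):

```python
import os


def files(folder):
    """Yield the full path of every file below folder (Peter's generator)."""
    for path, folders, names in os.walk(folder):
        for name in names:
            yield os.path.join(path, name)


def stamp_all(folder, text="This works !"):
    """Overwrite the start of every file below folder with text ('r+' mode)."""
    for filepath in files(folder):
        with open(filepath, "r+") as fileopen:
            fileopen.write(text)
```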

HTH,
Peter
Jul 18 '05 #8

> >>> for filetuple in os.walk('foo'):
> ...     for filename in filetuple[2]:
> ...         fileopen = file(filename, 'r+')
> ...         fileopen.write("This works !")
> ...         fileopen.close()
>
> which seems a bit of a clumsy way to do it.

You have the right idea, although this would be a bit cleaner:

for root, dirs, files in os.walk( 'foo' ):
    for name in files:
        etc...

You might want to take a look at the documentation for os.walk. It explains
all this and has a couple of good code samples.

> And besides, it only works if I run python from directory foo;
> otherwise it tells me "no such file or directory".


You mean if you run Python from the *parent* directory of foo, right?

'foo' is a relative path, not an absolute one, so it is interpreted relative
to the current directory.
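A relative name like 'foo' therefore means a different directory depending on where Python was started, and the bare file names from os.walk() are relative too. A hedged Python 3 sketch (absolute_files is a hypothetical helper) that resolves the folder once with os.path.abspath() and returns full paths, so the files can be opened from anywhere:

```python
import os


def absolute_files(folder):
    """Return the absolute path of every file below folder.

    os.path.abspath() joins a relative folder onto os.getcwd() right away,
    so a later os.chdir() cannot change which directory was walked.
    """
    top = os.path.abspath(folder)
    return [os.path.join(dirpath, name)
            for dirpath, dirnames, filenames in os.walk(top)
            for name in filenames]
```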

-Mike
Jul 18 '05 #9

On Mon, 05 Apr 2004 19:15:01 +0200, Peter Otten wrote:
> You need to compose the filepath, and, yes, it's a bit clumsy.
> I've written a little generator function to hide some of the clumsiness:
>
> def files(folder):
>     for path, folders, files in os.walk(folder):
>         for name in files:
>             yield os.path.join(path, name)
>
> With that the code is simplified to:
>
> for filepath in files("/path/to/folder"):
>     fileopen = file(filepath, 'r+')
>     fileopen.write("This works !")
>     fileopen.close()


Great!

--
Michele Alzetta

Jul 18 '05 #10

>> for thing in os.walk('mydir'):
>>     filehandle = file(thing, 'r+')
>
> I'm such a newbie I can't get it to work. Here is an example:


Nah, it's my fault. I thought you were having issues with the file open
mode needing to be a string. I forgot the format of os.walk iteration.

Good to hear that you now have something that works the way you want it.

- Josiah
Jul 18 '05 #11

On Sun, 04 Apr 2004 12:11:25 +0200, Peter Otten wrote:
> The following code comes with no warranties. Be sure to back up valuable data
> before trying it. You may need to edit the regular expressions. Call the
> script with the directory you want to process.

Seems to work all right!
I have a question:

class Path(object):
    # multiple function definitions follow, amongst which:
def files(rootfolder):
    for folder, folders, files in os.walk(rootfolder):
        for name in files:
            yield Path(folder, name)

So is 'Path' the name of a class and _at the same time_ the
result of one of the functions the class contains?
Or are there really two separate 'Path' things which don't interfere
because each has its own namespace?

I'm sorry for the repeated questions; maybe I should take this discussion
over to the tutor mailing list!

--
Michele Alzetta

Jul 18 '05 #12

TaeKyon wrote:
> On Sun, 04 Apr 2004 12:11:25 +0200, Peter Otten wrote:
>> The following code comes with no warranties. Be sure to back up valuable
>> data before trying it. You may need to edit the regular expressions. Call
>> the script with the directory you want to process.
>
> Seems to work all right!
> I have a question:
>
> class Path(object):
>     # multiple function definitions follow, amongst which:
> def files(rootfolder):
>     for folder, folders, files in os.walk(rootfolder):
>         for name in files:
>             yield Path(folder, name)
>
> So is 'Path' the name of a class and _at the same time_ the
> result of one of the functions the class contains?


No, the functions up to __str__() are indented one level. This means they
belong to the Path class, i.e. they are methods.
In contrast, files() is a standalone function, or more precisely a
generator. As a rule of thumb, you can tell functions from methods by
looking at the first parameter: if it's called "self", it's a method.

As a side note, though it's not the case here, it is possible for a class to
have methods that return new instances of the same class (or even the same
instance, which is what a considerable fraction of Python users wish for
list.sort()).
For example:

class Path(object):
    # ... as above
    def child(self, name):
        """ create a new Path instance denoting a
        child of the current path """
        return Path(self.path, name)
    def __repr__(self):
        """ added for better commandline experience :-) """
        return "Path(%r)" % self.path

Now try it:

>>> from processtree import Path
>>> p = Path("/path/to", "folder")
>>> p.child("file")
Path('/path/to/folder/file')
>>> p.child("sub").child("subsub")
Path('/path/to/folder/sub/subsub')
> Or are there really two separate 'Path' things which don't interfere
> because each has its own namespace?
No, every Path(folder, name) call creates a new Path instance as defined above.
When you see Class(arg1, arg2, ..., argN), under the hood Python creates a
new instance of Class and calls the special __init__(self, arg1, ..., argN)
method with the instance as the first argument (called self by convention)
and arg1, ..., argN as the following arguments.
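The two steps can be spelled out by hand. A Python 3 sketch with a cut-down Path class (only __init__ and the path property; @property is the decorator spelling of path = property(_get_path)). Calling __new__ and __init__ directly is for illustration only, and "roughly" is deliberate, since the real machinery also checks what __new__ returned:

```python
import os


class Path(object):
    def __init__(self, folder, name):
        self.folder = folder
        self.name = name

    @property
    def path(self):
        return os.path.join(self.folder, self.name)


# What Path("/path/to", "folder") does, roughly:
p = Path.__new__(Path)                  # 1. create an empty instance
Path.__init__(p, "/path/to", "folder")  # 2. initialise it (self is p)

q = Path("/path/to", "folder")          # the normal spelling
print(p.path == q.path, p is q)         # True False: equal data, two objects
```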
> I'm sorry for the repeated questions; maybe I should take this discussion
> over to the tutor mailing list!


I suggest that you stick with the simpler approach in my later post
until you have a firm grip on classes. For the task at hand the Path class
seems overkill, now that I'm reconsidering it.

Peter
Jul 18 '05 #13

On Tue, 06 Apr 2004 15:08:37 +0200, Peter Otten wrote:
> I suggest that you stick with the simpler approach in my later post
> until you have a firm grip on classes. For the task at hand the Path class
> seems overkill, now that I'm reconsidering it.


Here is a variation on the theme I came up with this afternoon:

#!/usr/bin/python
import os, sys, re, fileinput

try:
    target_folder = (sys.argv[1])
    original_pattern = (sys.argv[2])
    result_pattern = (sys.argv[3])
except:
    print "Substitutes a string with another in all files of a directory"
    print " Use: ./MyScript.py directory string other_string"
    sys.exit()
for folders, folder, filelist in os.walk(target_folder):
    for filename in filelist:
        file = os.path.join(folders,filename)
        for line in fileinput.input(file,'inplace=1'):
            line = re.sub(original_pattern,result_pattern,line)
            print line
        # Commented out because apparently useless, from the documentation I
        # don't quite understand whether it ought to be here or not
        # fileinput.close()

This works, almost:
1) It does substitute the pattern; however, every time it is run it seems
to add a newline for each newline present in the original file (so files
get longer and longer), and I don't understand why.
2) The final fileinput.close() seems to be useless; the program works
without it, and bug 1) isn't affected.

--
Michele Alzetta

Jul 18 '05 #14

TaeKyon wrote:
> On Tue, 06 Apr 2004 15:08:37 +0200, Peter Otten wrote:
>> I suggest that you stick with the simpler approach in my later post
>> until you have a firm grip on classes. For the task at hand the Path
>> class seems overkill, now that I'm reconsidering it.
>
> Here is a variation on the theme I came up with this afternoon:
>
> #!/usr/bin/python
> import os, sys, re, fileinput
>
> try:
>     target_folder = (sys.argv[1])
>     original_pattern = (sys.argv[2])
>     result_pattern = (sys.argv[3])
> except:
>     print "Substitutes a string with another in all files of a directory"
>     print " Use: ./MyScript.py directory string other_string"
>     sys.exit()
> for folders, folder, filelist in os.walk(target_folder):
>     for filename in filelist:
>         file = os.path.join(folders,filename)


file as a variable name is not recommended, because it hides the built-in
file.

>         for line in fileinput.input(file,'inplace=1'):

That 'inplace=1' works as expected is sheer luck, because any non-empty
string works as a True value; 'inplace=0' would have the same effect. Make
that

for line in fileinput.input(file, inplace=1):

>             line = re.sub(original_pattern,result_pattern,line)
>             print line

Add a trailing comma to the above line. Lines are always read including the
trailing newline, and the print statement adds another newline unless it
ends with a comma, like so:

print line,

>         # Commented out because apparently useless, from the documentation I
>         # don't quite understand whether it ought to be here or not
>         # fileinput.close()

The file will eventually be closed anyway; if you omit the close() call,
it's up to the Python implementation to decide when that will happen.
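For anyone doing this in Python 3: print is a function there, so the trailing-comma trick becomes end="", and fileinput.input() can be used as a context manager (Python 3.2 and later), which closes the current file and restores sys.stdout when the block ends. A sketch with a hypothetical helper name, assuming text files:

```python
import fileinput
import re


def substitute_in_place(paths, pattern, replacement):
    """Rewrite each file in paths, applying re.sub() to every line.

    With inplace=True, fileinput moves the original aside and sends
    print() output to a new file under the original name.
    """
    if not paths:
        return  # an empty list would make fileinput read stdin instead
    with fileinput.input(paths, inplace=True) as lines:
        for line in lines:
            # line already ends with "\n"; end="" avoids adding a second one
            print(re.sub(pattern, replacement, line), end="")
```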

> This works, almost:
> 1) It does substitute the pattern; however, every time it is run it seems
> to add a newline for each newline present in the original file (so files
> get longer and longer), and I don't understand why.
> 2) The final fileinput.close() seems to be useless; the program works
> without it, and bug 1) isn't affected.


I've never used fileinput, so I probably shouldn't comment on that, but the
first impression is that it does too much magic (like redirecting stdout,
and chaining multiple files) for my taste. If I read the documentation
correctly you could omit the intermediate loop like so (untested):

# using your variable names
for folders, folder, filelist in os.walk(target_folder):
    os.chdir(folders)
    for line in fileinput.input(filelist, inplace=1):
        print re.sub(original_pattern,result_pattern,line),
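Two things may be worth checking in this version. If target_folder is a relative path, os.walk() keeps building the next directory names relative to the starting point, so after os.chdir() some subdirectories may no longer be found (by default os.walk() silently skips directories it cannot list). And fileinput.input() given an empty list falls back to standard input, so a directory containing only subdirectories could make the script wait for input. A Python 3 sketch that avoids both by joining paths instead of changing directory; process_tree is a hypothetical name and text files are assumed:

```python
import fileinput
import os
import re


def process_tree(target_folder, pattern, replacement):
    """Apply re.sub(pattern, replacement, line) to every line of every
    file below target_folder, editing the files in place."""
    regex = re.compile(pattern)
    for folder, subfolders, names in os.walk(target_folder):
        # Full paths instead of os.chdir(): works for relative folders too.
        paths = [os.path.join(folder, name) for name in names]
        if not paths:
            continue  # fileinput would fall back to stdin on an empty list
        with fileinput.input(paths, inplace=True) as lines:
            for line in lines:
                print(regex.sub(replacement, line), end="")
```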

Peter

Jul 18 '05 #15

On Wed, 07 Apr 2004 23:04:51 +0200, Peter Otten wrote:
> # using your variable names
> for folders, folder, filelist in os.walk(target_folder):
>     os.chdir(folders)
>     for line in fileinput.input(filelist, inplace=1):
>         print re.sub(original_pattern,result_pattern,line),


I thought it might not work on files contained in subdirectories
but it does. The comma solved the problem with newlines too.

If we leave out the exception checking, that's 5 lines of code:

import os, sys, re, fileinput
targetf, original, result = (sys.argv[1]), (sys.argv[2]), (sys.argv[3])
for folders, folder, filelist in os.walk(targetf):
    os.chdir(folders)
    for line in fileinput.input(filelist,inplace=1):
        print re.sub(original,result,line),

Compact and elegant!
Tomorrow I'll go about adding the change filename stuff.
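For the renaming step, one possible Python 3 sketch, assuming the same rule as fileExpr/filePattern in Peter's script and that only files, not directories, need renaming. rename_matching is a hypothetical name. Since os.rename() silently replaces an existing target on POSIX but raises an error on Windows, the sketch skips names that already exist:

```python
import os
import re

# Same rule as fileExpr/filePattern in Peter's script.
NAME_RULE = re.compile(r"^(.+?)\.asp=dev\?(.*)$")


def rename_matching(top):
    """Rename every file below top named foo.asp=dev?bar to foo-bar.html.

    Returns the list of (old, new) paths that were renamed.
    """
    renamed = []
    for folder, subfolders, names in os.walk(top):
        for name in names:
            newname = NAME_RULE.sub(r"\1-\2.html", name)
            if newname == name:
                continue  # no match: leave the file alone
            old = os.path.join(folder, name)
            new = os.path.join(folder, newname)
            if os.path.exists(new):
                continue  # don't clobber an existing file
            os.rename(old, new)
            renamed.append((old, new))
    return renamed
```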

--
Michele Alzetta

Jul 18 '05 #16
