recursive file editing

I'm a Python newbie; here are a few questions about a
problem I'm trying to solve. I'm wondering whether Python is the best
tool, or whether awk or a mix of bash and sed would be better:

1) How would I recursively descend
through all files in all subdirectories of
the one in which my script is called?

2) each file examined should be edited; IF a string of this type is found

foo.asp=dev?bar (where bar can be a digit or an empty space)

it should always be substituted with this string

foo-bar.html (if bar is an empty space the new string is foo-.html)

3) the names of files read may themselves be of the sort foo.asp=dev?bar;
the edited output file should also be renamed according to the same rule
as above ... or would this be better handled by a bash script ?

Any hints appreciated

--
Michele Alzetta

Jul 18 '05 #1
This is a Perl one-liner:

perl -p -i -e 's/foo/bar/gi' `find ./`

regards,
--
Leif Biberg Kristensen
http://solumslekt.org/
Validare necesse est
Jul 18 '05 #2
> I'm a python newbie; here are a few questions relative to a
> problem I'm trying to solve; I'm wondering if python is the best
> instrument or if awk or a mix of bash and sed would be better:
>
> 1) how would I recursively descend
> through all files in all subdirectories of
> the one in which my script is called ?

Check out os.path.walk.

> 2) each file examined should be edited; IF a string of this type is found
>
> foo.asp=dev?bar (where bar can be a digit or an empty space)
>
> it should always be substituted with this string
>
> foo-bar.html (if bar is an empty space the new string is foo-.html)

Check out the re module.

> 3) the names of files read may themselves be of the sort foo.asp=dev?bar;
> the edited output file should also be renamed according to the same rule
> as above ... or would this be better handled by a bash script ?

Do it however you feel more comfortable; Python can do it.

- Josiah
Jul 18 '05 #3
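
As a rough illustration of the "re module" hint above, the substitution from question 2 might look like this in present-day Python 3. The pattern is an assumption about the data: it treats foo as a run of word characters or hyphens and bar as an optional single digit. The name rewrite_links is a hypothetical helper, not something from the thread.

```python
import re

# Assumption: "foo" is a run of word characters/hyphens and "bar" is
# an optional single digit; adjust the pattern to the real data.
LINK = re.compile(r"([\w-]+)\.asp=dev\?(\d?)")


def rewrite_links(text):
    """Turn every foo.asp=dev?bar in text into foo-bar.html."""
    return LINK.sub(r"\1-\2.html", text)


print(rewrite_links('<a href="news.asp=dev?3">'))  # <a href="news-3.html">
print(rewrite_links('<a href="news.asp=dev?">'))   # <a href="news-.html">
```

The same compiled pattern can be applied to file names as well as file contents, which is what question 3 asks for.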
On Sat, 03 Apr 2004 22:35:30 +0200, Leif B. Kristensen wrote:
> This is a Perl one-liner:
>
> perl -p -i -e 's/foo/bar/gi' `find ./`

That didn't work; however, I realized I could just repeatedly run variations of

sed -i s/foo/bar/g *

and then rename the files with a bash script called renna.

I'm sure Python could do it, but I got stuck:

for file in os.walk('mydir'):
    print file[2]

gives me the names of all files,

but how do I open each one in r+ mode?

for thing in os.walk('mydir'):
    file(thing,mode=r+)

is invalid syntax.

--
Michele Alzetta

Jul 18 '05 #4
> for thing in os.walk('mydir'):
>     file(thing,mode=r+)

for thing in os.walk('mydir'):
    filehandle = file(thing, 'r+')

- Josiah
Jul 18 '05 #5
The following code comes with no warranties. Be sure to backup valuable data
before trying it. You may need to edit the regular expressions. Call the
script with the directory you want to process.

Peter

import os, re, sys

class Path(object):
    def __init__(self, folder, name):
        self.folder = folder
        self.name = name

    def _get_path(self):
        return os.path.join(self.folder, self.name)
    path = property(_get_path)

    def rename(self, newname):
        if self.name != newname:
            os.rename(self.path, os.path.join(self.folder, newname))
            self.name = newname

    def processContents(self, operation):
        data = file(self.path).read()
        newdata = operation(data)
        if data != newdata:
            file(self.path, "w").write(newdata)

    def __str__(self):
        return self.path

def files(rootfolder):
    for folder, folders, files in os.walk(rootfolder):
        for name in files:
            yield Path(folder, name)

fileExpr = re.compile(r"^(.+?)\.asp\=dev\?(.*)$")
filePattern = r"\1-\2.html"

textExpr = re.compile(r"([/\'\"])(.+?)\.asp\=dev\?(.*?)([\'\"])")
textPattern = r"\1\2-\3.html\4"

if __name__ == "__main__":
    for f in files(sys.argv[1]):
        f.rename(fileExpr.sub(filePattern, f.name))
        f.processContents(lambda s: textExpr.sub(textPattern, s))

Jul 18 '05 #6
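
For readers on Python 3, the same two steps (rewrite the contents, rename the file) can be sketched with the standard pathlib module in place of the hand-written Path class. This is a different technique from the script above, not a port of it: the pattern is a simplified assumption (bar is an optional single digit), convert_tree is a hypothetical name, and the sketch assumes every file is text in the default encoding.

```python
import re
from pathlib import Path

# Assumption: same naming rule as in the thread, with "bar" limited to
# an optional single digit. The original script uses looser patterns.
RULE = re.compile(r"([\w-]+)\.asp=dev\?(\d?)")


def convert_tree(root):
    """Rewrite matching strings in file contents, then rename the files."""
    # Snapshot the file list first so renames do not disturb the walk.
    for path in [p for p in Path(root).rglob("*") if p.is_file()]:
        text = path.read_text()
        new_text = RULE.sub(r"\1-\2.html", text)
        if new_text != text:
            path.write_text(new_text)
        new_name = RULE.sub(r"\1-\2.html", path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))
```

As with the original script, it is worth backing up the tree before running anything like this, since both the contents and the names are changed in place.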
On Sat, 03 Apr 2004 17:22:04 -0800, Josiah Carlson wrote:
> for thing in os.walk('mydir'):
>     filehandle = file(thing, 'r+')

I'm such a newbie I can't get it to work. Here is an example:

in an empty directory foo I touch a b c d;
suppose I want to write "This works !" in each of these files.

I run python:

>>> import os
>>> for thing in os.walk('foo'):
...     thingopen = file(thing,'r+')
...     thingopen.write("This works !")
...     thingopen.close()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
TypeError: coercing to Unicode: need string or buffer, tuple found

And in fact:

>>> for thing in os.walk('foo'):
...     print thing
...
('foo', [], ['a', 'b', 'c', 'd'])

which is a tuple, I suppose.

Selecting thing[2] doesn't help, because it now complains of it being a
list.

In the end I get this to work:

>>> for filetuple in os.walk('foo'):
...     for filename in filetuple[2]:
...         fileopen = file(filename, 'r+')
...         fileopen.write("This works !")
...         fileopen.close()

which seems a bit of a clumsy way to do it.
And besides, it only works if I run python from directory foo;
otherwise it tells me "no such file or directory".

--
Michele Alzetta

Jul 18 '05 #7
A minimal working example is:

import os
for path, folders, files in os.walk("/path/to/folder"):
    for name in files:
        filepath = os.path.join(path, name)
        fileopen = file(filepath, 'r+')
        fileopen.write("This works !")
        fileopen.close()

You need to compose the filepath, and, yes, it's a bit clumsy.
I've written a little generator function to hide some of the clumsiness:

def files(folder):
    for path, folders, files in os.walk(folder):
        for name in files:
            yield os.path.join(path, name)

With that the code is simplified to:

for filepath in files("/path/to/folder"):
    fileopen = file(filepath, 'r+')
    fileopen.write("This works !")
    fileopen.close()

HTH,
Peter
Jul 18 '05 #8
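
The generator above carries over to Python 3 almost unchanged, apart from open() replacing the old file() builtin. The sketch below is one possible rendering; iter_files and append_marker are hypothetical names, and appending (mode "a") is a deliberate change so that existing contents are kept.

```python
import os


def iter_files(folder):
    """Yield the full path of every file below folder."""
    for path, _folders, names in os.walk(folder):
        for name in names:
            yield os.path.join(path, name)


def append_marker(folder, marker="This works !\n"):
    """Append marker to every file below folder; return the file count."""
    count = 0
    for filepath in iter_files(folder):
        # "a" appends; the thread's 'r+' would overwrite from offset 0.
        with open(filepath, "a") as handle:
            handle.write(marker)
        count += 1
    return count
```

The with statement closes each file as soon as its block ends, so no explicit close() call is needed.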
> for filetuple in os.walk('foo'):
> ...     for filename in filetuple[2]:
> ...         fileopen = file(filename, 'r+')
> ...         fileopen.write("This works !")
> ...         fileopen.close()
>
> which seems a bit of a clumsy way to do it.

You have the right idea, although this would be a bit cleaner:

for root, dirs, files in os.walk( 'foo' ):
    for name in files:
        etc...

You might want to take a look at the documentation for os.walk. It explains
all this and has a couple of good code samples.

> And besides it only works if I run python from directory foo,
> otherwise it tells me "no such file or directory".

You mean if you run Python from the *parent* directory of foo, right?

'foo' is a relative path, not an absolute one, so it is resolved relative to
the current working directory.

-Mike
Jul 18 '05 #9
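
The point about relative paths can be checked directly. The small Python 3 sketch below (resolve is a hypothetical helper name) shows which absolute path a relative name currently refers to; the answer changes with the directory Python is started from.

```python
import os


def resolve(relative_path):
    """Return the absolute path a relative name refers to right now."""
    return os.path.abspath(relative_path)


# The result depends on the directory the interpreter was started in.
print(resolve("foo"))
```

The same reasoning explains the "no such file or directory" error: a bare file name taken from os.walk's third element is also relative, which is why it has to be joined with the folder that os.walk reports.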
On Mon, 05 Apr 2004 19:15:01 +0200, Peter Otten wrote:
> You need to compose the filepath, and, yes, it's a bit clumsy.
> I've written a little generator function to hide some of the clumsiness:
>
> def files(folder):
>     for path, folders, files in os.walk(folder):
>         for name in files:
>             yield os.path.join(path, name)
>
> With that the code is simplified to:
>
> for filepath in files("/path/to/folder"):
>     fileopen = file(filepath, 'r+')
>     fileopen.write("This works !")
>     fileopen.close()


Great !

--
Michele Alzetta

Jul 18 '05 #10
>> for thing in os.walk('mydir'):
>>     filehandle = file(thing, 'r+')
>
> I'm such a newbie I can't get it to work. Here is an example:


Nah, it's my fault. I thought you were having issues with the file open
mode needing to be a string. I forgot the format of os.walk iteration.

Good to hear that you now have something that works the way you want it.

- Josiah
Jul 18 '05 #11
On Sun, 04 Apr 2004 12:11:25 +0200, Peter Otten wrote:
> The following code comes with no warranties. Be sure to backup valuable data
> before trying it. You may need to edit the regular expressions. Call the
> script with the directory you want to process.

Seems to work all right!
I have a question:

class Path(object):

# multiple function definitions follow, amongst which:
def files(rootfolder):
    for folder, folders, files in os.walk(rootfolder):
        for name in files:
            yield Path(folder, name)

So 'Path' is the name of a class and _at the same time_ the
result of one of the functions the class contains?
Or are there really two separate 'Path' things which don't interfere
because each has its own namespace?

I'm sorry for the repeated questions; maybe I should take this discussion
over to the tutor mailing list!

--
Michele Alzetta

Jul 18 '05 #12
TaeKyon wrote:
> On Sun, 04 Apr 2004 12:11:25 +0200, Peter Otten wrote:
>> The following code comes with no warranties. Be sure to backup valuable
>> data before trying it. You may need to edit the regular expressions. Call
>> the script with the directory you want to process.
>
> Seems to work all right !
> I have a question:
>
> class Path(object):
>
> # multiple function definitions follow, amongst which:
> def files(rootfolder):
>     for folder, folders, files in os.walk(rootfolder):
>         for name in files:
>             yield Path(folder, name)
>
> So 'Path' is the name of a class and _contemporaneously_ the
> result of one of the functions the class contains ?

No, the functions up to __str__() are indented one level. This means they
belong to the Path class, i.e. they are methods.
In contrast, files() is a standalone function - or more precisely a
generator. As a rule of thumb you can tell functions from methods by
looking at the first parameter - if it's called "self" it's a method.

As a side note, though it's not the case here, it is possible for a class to
have methods that return new instances of the same class (or even the same
instance, which is what a considerable fraction of python users wishes for
list.sort()).
For example:

class Path(object):
    # ... as above
    def child(self, name):
        """ create a new Path instance denoting a
        child of the current path """
        return Path(self.path, name)
    def __repr__(self):
        """ added for better commandline experience :-) """
        return "Path(%r)" % self.path

Now try it:

>>> from processtree import Path
>>> p = Path("/path/to", "folder")
>>> p.child("file")
Path('/path/to/folder/file')
>>> p.child("sub").child("subsub")
Path('/path/to/folder/sub/subsub')

> Or are there really two separate 'Path' things which don't interfere
> because each has its own namepace ?

No, every Path(folder, name) creates a new Path instance as defined above.
When you see Class(arg1, arg2, ..., argN), under the hood Python creates a
new instance of Class and calls the special __init__(self, arg1, ..., argN)
method with the instance as the first (called self by convention) and
arg1,..., argN as the following arguments.

> I'm sorry for the repeated questions, maybe I should take this discussion
> over to the tutor mailing list !

I suggest that you stick with the simpler approach in my later post
until you have a firm grip of classes. For the task at hand the Path class
seems overkill, now I'm reconsidering it.

Peter
Jul 18 '05 #13
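
The explanation above, that Class(args) creates an instance and passes it to __init__ as self, and that a method may return a fresh instance of its own class, can be tried in a self-contained Python 3 sketch. It is a cut-down version of the Path class from the thread; posixpath is used instead of os.path only so that the printed result has forward slashes on every platform, which is an assumption made for the example.

```python
import posixpath


class Path:
    def __init__(self, folder, name):
        # Path(folder, name) creates the instance, then calls this method
        # with the new instance as "self".
        self.folder = folder
        self.name = name

    @property
    def path(self):
        return posixpath.join(self.folder, self.name)

    def child(self, name):
        """Return a new Path instance below the current one."""
        return Path(self.path, name)

    def __repr__(self):
        return "Path(%r)" % self.path


p = Path("/path/to", "folder")
print(p.child("sub").child("subsub"))  # Path('/path/to/folder/sub/subsub')
```

Each call to child() builds a separate object; p itself is left unchanged.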
On Tue, 06 Apr 2004 15:08:37 +0200, Peter Otten wrote:
> I suggest that you stick with the simpler approach in my later post
> until you have a firm grip of classes. For the task at hand the Path class
> seems overkill, now I'm reconsidering it.

Here is a variation on the theme I came up with this afternoon:

#!/usr/bin/python
import os, sys, re, fileinput

try:
    target_folder = (sys.argv[1])
    original_pattern = (sys.argv[2])
    result_pattern = (sys.argv[3])
except:
    print "Substitutes a string with another in all files of a directory"
    print " Use: ./MyScript.py directory string other_string"
    sys.exit()
for folders, folder, filelist in os.walk(target_folder):
    for filename in filelist:
        file = os.path.join(folders,filename)
        for line in fileinput.input(file,'inplace=1'):
            line = re.sub(original_pattern,result_pattern,line)
            print line
# Commented out because apparently useless, from the documentation I
# don't quite understand whether it ought to be here or not
# fileinput.close()

This works - almost.
1) It does substitute the pattern, however it seems to
add a newline for each newline present in the original file every time
it is run (so files get longer and longer), and I don't understand why.
2) The final fileinput.close() seems to be useless; the program works
without, and bug 1) isn't affected.

--
Michele Alzetta

Jul 18 '05 #14
TaeKyon wrote:
> On Tue, 06 Apr 2004 15:08:37 +0200, Peter Otten wrote:
>> I suggest that you stick with the simpler approach in my later post
>> until you have a firm grip of classes. For the task at hand the Path
>> class seems overkill, now I'm reconsidering it.
>
> Here is a variation on the theme I came up with this afternoon:
>
> #!/usr/bin/python
> import os, sys, re, fileinput
>
> try:
>     target_folder = (sys.argv[1])
>     original_pattern = (sys.argv[2])
>     result_pattern = (sys.argv[3])
> except:
>     print "Substitutes a string with another in all files of a directory"
>     print " Use: ./MyScript.py directory string other_string"
>     sys.exit()
> for folders, folder, filelist in os.walk(target_folder):
>     for filename in filelist:
>         file = os.path.join(folders,filename)

file as a variable name is not recommended, because it hides the builtin
file.

>         for line in fileinput.input(file,'inplace=1'):

That 'inplace=1' works as expected is sheer luck because any non-empty
string works as a True value - 'inplace=0' would have the same effect. Make
that

for line in fileinput.input(file, inplace=1):

>             line = re.sub(original_pattern,result_pattern,line)
>             print line

Add a trailing comma to the above line. Lines are always read including the
trailing newline, and the print statement adds another newline if it does
not end with a comma like so:

print line,

> # Commented out because apparently useless, from the documentation I
> # don't quite understand whether it ought to be here or not
> # fileinput.close()

The file will eventually be closed anyway - if you omit the close() call
it's up to the python implementation to decide when that will happen.

> This works - almost.
> 1) It does substitute the pattern, however it seems to
> add a newline for each newline present in the original file every time
> it is run (so files get longer and longer), and I don't understand why.
> 2) The final fileinput.close() seems to be useless; the program works
> without, and bug 1) isn't affected.

I've never used fileinput, so I probably shouldn't comment on that, but the
first impression is that it does too much magic (like redirecting stdout,
and chaining multiple files) for my taste. If I read the documentation
correctly you could omit the intermediate loop like so (untested):

# using your variable names
for folders, folder, filelist in os.walk(target_folder):
    os.chdir(folders)
    for line in fileinput.input(filelist, inplace=1):
        print re.sub(original_pattern,result_pattern,line),

Peter

Jul 18 '05 #15
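
The doubled newlines described above are easy to reproduce without touching any files. In Python 3 the trailing comma of the old print statement becomes end="" on the print() function; the sketch below uses in-memory streams, and copy_lines is a hypothetical name chosen for the demonstration.

```python
import io


def copy_lines(text, suppress_extra_newline):
    """Copy text line by line with print() and return what was written."""
    out = io.StringIO()
    for line in io.StringIO(text):
        # Each line read still carries its own trailing "\n".
        if suppress_extra_newline:
            print(line, end="", file=out)
        else:
            print(line, file=out)  # print appends a second "\n"
    return out.getvalue()


print(repr(copy_lines("a\nb\n", False)))  # 'a\n\nb\n\n'
print(repr(copy_lines("a\nb\n", True)))   # 'a\nb\n'
```

Run repeatedly over the same file, the first variant adds a blank line after every line on each pass, which matches the "files get longer and longer" symptom.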
On Wed, 07 Apr 2004 23:04:51 +0200, Peter Otten wrote:
> # using your variable names
> for folders, folder, filelist in os.walk(target_folder):
>     os.chdir(folders)
>     for line in fileinput.input(filelist, inplace=1):
>         print re.sub(original_pattern,result_pattern,line),

I thought it might not work on files contained in subdirectories,
but it does. The comma solved the problem with newlines too.

If we leave out the exception checking, that's 6 lines of code:

import os, sys, re, fileinput
targetf, original, result = (sys.argv[1]), (sys.argv[2]), (sys.argv[3])
for folders, folder, filelist in os.walk(targetf):
    os.chdir(folders)
    for line in fileinput.input(filelist,inplace=1):
        print re.sub(original,result,line),

Compact and elegant !
Tomorrow I'll go about adding the change filename stuff.

--
Michele Alzetta

Jul 18 '05 #16
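
Two caveats about the compact version are worth noting. The os.walk documentation advises against changing the current working directory between iterations when a relative path was passed, and, as far as I can tell, fileinput falls back to reading standard input when it is handed an empty file list. A Python 3 sketch that avoids both by joining full paths and skipping folders without files might look like this; replace_in_tree is a hypothetical name, and the files are assumed to be text in the default encoding.

```python
import fileinput
import os
import re


def replace_in_tree(root, pattern, replacement):
    """Apply a regex substitution to every file below root, in place."""
    regex = re.compile(pattern)
    for folder, _subfolders, names in os.walk(root):
        # Full paths make os.chdir() unnecessary.
        paths = [os.path.join(folder, name) for name in names]
        if not paths:
            continue  # an empty list would make fileinput read stdin
        with fileinput.input(files=paths, inplace=True) as stream:
            for line in stream:
                # inplace=True redirects print() into the current file.
                print(regex.sub(replacement, line), end="")
```

The with block closes the fileinput stream for each folder, which also answers the earlier question of whether fileinput.close() is needed.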
