recursive file editing

TaeKyon

I'm a Python newbie; here are a few questions about a
problem I'm trying to solve. I'm wondering if Python is the best
tool, or if awk or a mix of bash and sed would be better:

1) How would I recursively descend
through all files in all subdirectories of
the one in which my script is called?

2) each file examined should be edited; IF a string of this type is found

foo.asp=dev?bar (where bar can be a digit or an empty space)

it should always be substituted with this string

foo-bar.html (if bar is an empty space the new string is foo-.html)

3) the names of files read may themselves be of the sort foo.asp=dev?bar;
the edited output file should also be renamed according to the same rule
as above ... or would this be better handled by a bash script ?

Any hints appreciated

--
Michele Alzetta

Jul 18 '05 #1

Leif B. Kristensen

This is a Perl one-liner:

perl -p -i -e 's/foo/bar/gi' `find ./`

regards,
--
Leif Biberg Kristensen
http://solumslekt.org/
Validare necesse est

Jul 18 '05 #2

Josiah Carlson

> I'm a python newbie; here are a few questions relative to a
> problem I'm trying to solve; I'm wondering if python is the best
> instrument or if awk or a mix of bash and sed would be better:
>
> 1) how would I recursively descend
> through all files in all subdirectories of
> the one in which my script is called ?

Check out os.path.walk.

> 2) each file examined should be edited; IF a string of this type is found
>
> foo.asp=dev?bar (where bar can be a digit or an empty space)
>
> it should always be substituted with this string
>
> foo-bar.html (if bar is an empty space the new string is foo-.html)

Check out the re module.

> 3) the names of files read may themselves be of the sort foo.asp=dev?bar;
> the edited output file should also be renamed according to the same rule
> as above ... or would this be better handled by a bash script ?

Do it however you feel more comfortable; Python can do it.

- Josiah

Jul 18 '05 #3
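
As a rough illustration of the re-module hint for point 2, here is a minimal sketch in present-day Python 3 rather than the Python 2.3 of the thread. The pattern and the name rewrite_links are assumptions made for this example: it treats "foo" as any run of word characters and "bar" as at most one digit, which may be narrower or wider than the real data needs.

```python
import re

# Assumed pattern: "<name>.asp=dev?" followed by an optional single digit.
# A following space is not consumed, so "foo.asp=dev? " becomes "foo-.html ".
LINK = re.compile(r"(\w+)\.asp=dev\?(\d?)")

def rewrite_links(text):
    """Return text with every name.asp=dev?N rewritten as name-N.html."""
    return LINK.sub(r"\1-\2.html", text)
```

Calling rewrite_links on the contents of a file gives the edited text; writing it back is a separate step.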

TaeKyon

On Sat, 03 Apr 2004 22:35:30 +0200, Leif B. Kristensen wrote:

> This is a Perl one-liner:
>
> perl -p -i -e 's/foo/bar/gi' `find ./`

Didn't work; however, I realized I could just repeatedly run variations of

sed -i s/foo/bar/g *

and then rename the files with a bash script called renna.

I'm sure Python could do it, but I got stuck:

for file in os.walk('mydir'):
    print file[2]

gives me the names of all files,

but how do I open each in r+ mode?

for thing in os.walk('mydir'):
    file(thing,mode=r+)

is invalid syntax.

--
Michele Alzetta

Jul 18 '05 #4

Josiah Carlson

> for thing in os.walk('mydir'):
>     file(thing,mode=r+)

for thing in os.walk('mydir'):
    filehandle = file(thing, 'r+')

- Josiah

Jul 18 '05 #5

Peter Otten

TaeKyon wrote:

> I'm a python newbie; here are a few questions relative to a
> problem I'm trying to solve; I'm wondering if python is the best
> instrument or if awk or a mix of bash and sed would be better:
>
> 1) how would I recursively descend
> through all files in all subdirectories of
> the one in which my script is called ?
>
> 2) each file examined should be edited; IF a string of this type is found
>
> foo.asp=dev?bar (where bar can be a digit or an empty space)
>
> it should always be substituted with this string
>
> foo-bar.html (if bar is an empty space the new string is foo-.html)
>
> 3) the names of files read may themselves be of the sort foo.asp=dev?bar;
> the edited output file should also be renamed according to the same rule
> as above ... or would this be better handled by a bash script ?
>
> Any hints appreciated

The following code comes with no warranties. Be sure to back up valuable data
before trying it. You may need to edit the regular expressions. Call the
script with the directory you want to process.

Peter

import os, re, sys

class Path(object):
    def __init__(self, folder, name):
        self.folder = folder
        self.name = name

    def _get_path(self):
        return os.path.join(self.folder, self.name)
    path = property(_get_path)

    def rename(self, newname):
        if self.name != newname:
            os.rename(self.path, os.path.join(self.folder, newname))
            self.name = newname

    def processContents(self, operation):
        data = file(self.path).read()
        newdata = operation(data)
        if data != newdata:
            file(self.path, "w").write(newdata)

    def __str__(self):
        return self.path

def files(rootfolder):
    for folder, folders, files in os.walk(rootfolder):
        for name in files:
            yield Path(folder, name)

fileExpr = re.compile(r"^(.+?)\.asp\=dev\?(.*)$")
filePattern = r"\1-\2.html"

textExpr = re.compile(r"([/\'\"])(.+?)\.asp\=dev\?(.*?)([\'\"])")
textPattern = r"\1\2-\3.html\4"

if __name__ == "__main__":
    for f in files(sys.argv[1]):
        f.rename(fileExpr.sub(filePattern, f.name))
        f.processContents(lambda s: textExpr.sub(textPattern, s))

Jul 18 '05 #6

TaeKyon

On Sat, 03 Apr 2004 17:22:04 -0800, Josiah Carlson wrote:

> for thing in os.walk('mydir'):
>     filehandle = file(thing, 'r+')

I'm such a newbie I can't get it to work. Here is an example:

In an empty directory foo I touch a b c d;
suppose I want to write "This works !" in each of these files.

I run python:

>>> import os
>>> for thing in os.walk('foo'):
...     thingopen = file(thing,'r+')
...     thingopen.write("This works !")
...     thingopen.close()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
TypeError: coercing to Unicode: need string or buffer, tuple found

And in fact:

>>> for thing in os.walk('foo'):
...     print thing
...
('foo', [], ['a', 'b', 'c', 'd'])

which is a tuple, I suppose.

Selecting thing[2] doesn't help, because it now complains of it being a
list.

In the end I get this to work:

>>> for filetuple in os.walk('foo'):
...     for filename in filetuple[2]:
...         fileopen = file(filename, 'r+')
...         fileopen.write("This works !")
...         fileopen.close()

which seems a bit of a clumsy way to do it.
And besides, it only works if I run python from directory foo;
otherwise it tells me "no such file or directory".

--
Michele Alzetta

Jul 18 '05 #7

Peter Otten

TaeKyon wrote:

> On Sat, 03 Apr 2004 17:22:04 -0800, Josiah Carlson wrote:
> > for thing in os.walk('mydir'):
> >     filehandle = file(thing, 'r+')
>
> I'm such a newbie I can't get it to work. Here is an example:
>
> in empty directory foo I touch a b c d;
> suppose I want to write "This works !" in each of these files.
>
> I run python
> >>> import os
> >>> for thing in os.walk('foo'):
> ...     thingopen = file(thing,'r+')
> ...     thingopen.write("This works !")
> ...     thingopen.close()
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 2, in ?
> TypeError: coercing to Unicode: need string or buffer, tuple found
>
> And in fact:
> >>> for thing in os.walk('foo'):
> ...     print thing
> ...
> ('foo', [], ['a', 'b', 'c', 'd'])
>
> which is a tuple, I suppose.
>
> Selecting thing[2] doesn't help, because it now complains of it being a
> list.
>
> In the end I get this to work:
>
> >>> for filetuple in os.walk('foo'):
> ...     for filename in filetuple[2]:
> ...         fileopen = file(filename, 'r+')
> ...         fileopen.write("This works !")
> ...         fileopen.close()
>
> which seems a bit of a clumsy way to do it.
> And besides it only works if I run python from directory foo,
> otherwise it tells me "no such file or directory".

A minimal working example is:

import os
for path, folders, files in os.walk("/path/to/folder"):
    for name in files:
        filepath = os.path.join(path, name)
        fileopen = file(filepath, 'r+')
        fileopen.write("This works !")
        fileopen.close()

You need to compose the filepath, and, yes, it's a bit clumsy.
I've written a little generator function to hide some of the clumsiness:

def files(folder):
    for path, folders, files in os.walk(folder):
        for name in files:
            yield os.path.join(path, name)

With that the code is simplified to:

for filepath in files("/path/to/folder"):
    fileopen = file(filepath, 'r+')
    fileopen.write("This works !")
    fileopen.close()

HTH,
Peter

Jul 18 '05 #8
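
For readers on a current interpreter, the generator approach above might look like this in Python 3. It is a sketch and not the code from the thread: file() no longer exists there, so it uses open() with a with-block, and the helper name write_to_all is made up for this example.

```python
import os

def files(folder):
    """Yield the full path of every file below folder."""
    for path, folders, names in os.walk(folder):
        for name in names:
            yield os.path.join(path, name)

def write_to_all(folder, text):
    """Write text at the start of every file below folder; return the count."""
    count = 0
    for filepath in files(folder):
        # 'r+' opens for reading and writing without truncating, so any
        # content beyond len(text) is left in place, as in the thread.
        with open(filepath, "r+") as handle:
            handle.write(text)
        count += 1
    return count
```

The with-block closes each file even if the write fails, which replaces the explicit close() call.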

Michael Geary

> for filetuple in os.walk('foo'):
> ...     for filename in filetuple[2]:
> ...         fileopen = file(filename, 'r+')
> ...         fileopen.write("This works !")
> ...         fileopen.close()
>
> which seems a bit of a clumsy way to do it.

You have the right idea, although this would be a bit cleaner:

for root, dirs, files in os.walk( 'foo' ):
    for name in files:
        etc...

You might want to take a look at the documentation for os.walk. It explains
all this and has a couple of good code samples.

> And besides it only works if I run python from directory foo,
> otherwise it tells me "no such file or directory".

You mean if you run Python from the *parent* directory of foo, right?

'foo' is a relative path, not an absolute one, so it gets appended to the
current directory.

-Mike

Jul 18 '05 #9
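
The point about relative paths can be checked directly. This small Python 3 sketch shows where a relative name such as 'foo' ends up; the function name resolve is hypothetical and stands in for what os.path.abspath already does.

```python
import os

def resolve(path):
    """Return the absolute location a relative path refers to right now."""
    # A relative path is interpreted against the current working directory.
    return os.path.normpath(os.path.join(os.getcwd(), path))
```

The same reasoning applies to the bare file names that os.walk yields: they only make sense once joined to the directory they were found in.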

TaeKyon

On Mon, 05 Apr 2004 19:15:01 +0200, Peter Otten wrote:

> You need to compose the filepath, and, yes, it's a bit clumsy.
> I've written a little generator function to hide some of the clumsiness:
>
> def files(folder):
>     for path, folders, files in os.walk(folder):
>         for name in files:
>             yield os.path.join(path, name)
>
> With that the code is simplified to:
>
> for filepath in files("/path/to/folder"):
>     fileopen = file(filepath, 'r+')
>     fileopen.write("This works !")
>     fileopen.close()

Great!

--
Michele Alzetta

Jul 18 '05 #10

Josiah Carlson

> > for thing in os.walk('mydir'):
> >     filehandle = file(thing, 'r+')
>
> I'm such a newbie I can't get it to work. Here is an example:

Nah, it's my fault. I thought you were having issues with the file open
mode needing to be a string. I forgot the format of os.walk iteration.

Good to hear that you now have something that works the way you want it.

- Josiah

Jul 18 '05 #11

TaeKyon

On Sun, 04 Apr 2004 12:11:25 +0200, Peter Otten wrote:

> The following code comes with no warranties. Be sure to backup valuable data
> before trying it. You may need to edit the regular expressions. Call the
> script with the directory you want to process.

Seems to work all right!
I have a question:

class Path(object):
    # multiple function definitions follow, amongst which:

def files(rootfolder):
    for folder, folders, files in os.walk(rootfolder):
        for name in files:
            yield Path(folder, name)

So 'Path' is the name of a class and _contemporaneously_ the
result of one of the functions the class contains?
Or are there really two separate 'Path' things which don't interfere
because each has its own namespace?

I'm sorry for the repeated questions; maybe I should take this discussion
over to the tutor mailing list!

--
Michele Alzetta

Jul 18 '05 #12

Peter Otten

TaeKyon wrote:

> On Sun, 04 Apr 2004 12:11:25 +0200, Peter Otten wrote:
> > The following code comes with no warranties. Be sure to backup valuable
> > data before trying it. You may need to edit the regular expressions. Call
> > the script with the directory you want to process.
>
> Seems to work all right !
> I have a question:
>
> class Path(object):
>     # multiple function definitions follow, amongst which:
>
> def files(rootfolder):
>     for folder, folders, files in os.walk(rootfolder):
>         for name in files:
>             yield Path(folder, name)
>
> So 'Path' is the name of a class and _contemporaneously_ the
> result of one of the functions the class contains ?

No, the functions up to __str__() are indented one level. This means they
belong to the Path class, i.e. they are methods.
In contrast, files() is a standalone function - or more precisely a
generator. As a rule of thumb you can tell functions from methods by
looking at the first parameter - if it's called "self" it's a method.

As a side note, though it's not the case here, it is possible for a class to
have methods that return new instances of the same class (or even the same
instance, which is what a considerable fraction of python users wishes for
list.sort()).
For example:

class Path(object):
    # ... as above
    def child(self, name):
        """ create a new Path instance denoting a
        child of the current path """
        return Path(self.path, name)
    def __repr__(self):
        """ added for better commandline experience :-) """
        return "Path(%r)" % self.path

Now try it:

>>> from processtree import Path
>>> p = Path("/path/to", "folder")
>>> p.child("file")
Path('/path/to/folder/file')
>>> p.child("sub").child("subsub")
Path('/path/to/folder/sub/subsub')

> Or are there really two separate 'Path' things which don't interfere
> because each has its own namespace ?

No, every Path(folder, name) creates a new Path instance as defined above.
When you see Class(arg1, arg2, ..., argN), under the hood Python creates a
new instance of Class and calls the special __init__(self, arg1, ..., argN)
method with the instance as the first (called self by convention) and
arg1,..., argN as the following arguments.

> I'm sorry for the repeated questions, maybe I should take this discussion
> over to the tutor mailing list !

I suggest that you stick with the simpler approach in my later post
until you have a firm grip of classes. For the task at hand the Path class
seems overkill, now I'm reconsidering it.

Peter

Jul 18 '05 #13
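
The description of what Class(arg1, ..., argN) does can be spelled out by hand. The following Python 3 sketch uses a made-up class Pair, not the Path class from the thread, and the two-step form is a simplification of what the interpreter does.

```python
class Pair(object):
    """Hypothetical class used only to illustrate instance creation."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def swapped(self):
        """A method may return a new instance of its own class."""
        return Pair(self.right, self.left)

# The usual spelling: call the class.
p = Pair(1, 2)

# Roughly what that call does behind the scenes.
q = Pair.__new__(Pair)    # create an uninitialised instance
Pair.__init__(q, 1, 2)    # initialise it; q is passed in as self
```

Here p and q are two separate instances with the same attribute values, and p.swapped() builds a third one.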

TaeKyon

On Tue, 06 Apr 2004 15:08:37 +0200, Peter Otten wrote:

> I suggest that you stick with the simpler approach in my later post
> until you have a firm grip of classes. For the task at hand the Path class
> seems overkill, now I'm reconsidering it.

Here is a variation on the theme I came up with this afternoon:

#!/usr/bin/python
import os, sys, re, fileinput

try:
    target_folder = (sys.argv[1])
    original_pattern = (sys.argv[2])
    result_pattern = (sys.argv[3])
except:
    print "Substitutes a string with another in all files of a directory"
    print " Use: ./MyScript.py directory string other_string"
    sys.exit()
for folders, folder, filelist in os.walk(target_folder):
    for filename in filelist:
        file = os.path.join(folders,filename)
        for line in fileinput.input(file,'inplace=1'):
            line = re.sub(original_pattern,result_pattern,line)
            print line
# Commented out because apparently useless, from the documentation I
# don't quite understand whether it ought to be here or not
# fileinput.close()

This works - almost.
1) It does substitute the pattern; however, it seems to
add a newline for each newline present in the original file every time
it is run (so files get longer and longer), and I don't understand why.
2) The final fileinput.close() seems to be useless; the program works
without it, and bug 1) isn't affected.

--
Michele Alzetta

Jul 18 '05 #14

Peter Otten

TaeKyon wrote:

> On Tue, 06 Apr 2004 15:08:37 +0200, Peter Otten wrote:
> > I suggest that you stick with the simpler approach in my later post
> > until you have a firm grip of classes. For the task at hand the Path
> > class seems overkill, now I'm reconsidering it.
>
> Here is a variation on the theme I came up with this afternoon:
>
> #!/usr/bin/python
> import os, sys, re, fileinput
>
> try:
>     target_folder = (sys.argv[1])
>     original_pattern = (sys.argv[2])
>     result_pattern = (sys.argv[3])
> except:
>     print "Substitutes a string with another in all files of a directory"
>     print " Use: ./MyScript.py directory string other_string"
>     sys.exit()
> for folders, folder, filelist in os.walk(target_folder):
>     for filename in filelist:
>         file = os.path.join(folders,filename)

file as a variable name is not recommended, because it hides the builtin
file.

>         for line in fileinput.input(file,'inplace=1'):

That 'inplace=1' works as expected is sheer luck because any non-empty
string works as a True value - 'inplace=0' would have the same effect. Make
that

for line in fileinput.input(file, inplace=1):

>             line = re.sub(original_pattern,result_pattern,line)
>             print line

Add a trailing comma to the above line. Lines are always read including the
trailing newline, and the print statement adds another newline if it does
not end with a comma, like so:

print line,

> # Commented out because apparently useless, from the documentation I
> # don't quite understand whether it ought to be here or not
> # fileinput.close()

The file will eventually be closed anyway - if you omit the close() call
it's up to the python implementation to decide when that will happen.

> This works - almost.
> 1) It does substitute the pattern, however it seems to
> add a newline for each newline present in the original file every time
> it is run (so files get longer and longer), and I don't understand why.
> 2) The final fileinput.close() seems to be useless; the program works
> without, and bug 1) isn't affected.

I've never used fileinput, so I probably shouldn't comment on that, but the
first impression is that it does too much magic (like redirecting stdout,
and chaining multiple files) for my taste. If I read the documentation
correctly you could omit the intermediate loop like so (untested):

# using your variable names
for folders, folder, filelist in os.walk(target_folder):
    os.chdir(folders)
    for line in fileinput.input(filelist, inplace=1):
        print re.sub(original_pattern,result_pattern,line),

Peter

Jul 18 '05 #15
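
For comparison, here is how the in-place substitution might be written in Python 3. This is a sketch and not the posters' code: the function name substitute_in_tree is an assumption, print is a function there so the trailing comma becomes end="", and the full paths are collected up front instead of calling os.chdir, since the os.walk documentation cautions against changing the working directory while walking a relative path.

```python
import fileinput
import os
import re

def substitute_in_tree(target_folder, pattern, replacement):
    """Apply re.sub to every line of every file below target_folder."""
    paths = [os.path.join(folder, name)
             for folder, subfolders, names in os.walk(target_folder)
             for name in names]
    if not paths:
        return 0  # an empty list would make fileinput fall back to stdin
    with fileinput.input(files=paths, inplace=True) as lines:
        for line in lines:
            # With inplace=True, print() writes into the file being read;
            # end="" avoids doubling the newline the line already carries.
            print(re.sub(pattern, replacement, line), end="")
    return len(paths)
```

Running it twice with the same arguments leaves the files the same length, which is the behaviour the trailing comma restores in the Python 2 version.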

TaeKyon

On Wed, 07 Apr 2004 23:04:51 +0200, Peter Otten wrote:

> # using your variable names
> for folders, folder, filelist in os.walk(target_folder):
>     os.chdir(folders)
>     for line in fileinput.input(filelist, inplace=1):
>         print re.sub(original_pattern,result_pattern,line),

I thought it might not work on files contained in subdirectories,
but it does. The comma solved the problem with newlines too.

If we leave out the exception checking, that's 6 lines of code:

import os, sys, re, fileinput
targetf, original, result = (sys.argv[1]), (sys.argv[2]), (sys.argv[3])
for folders, folder, filelist in os.walk(targetf):
    os.chdir(folders)
    for line in fileinput.input(filelist,inplace=1):
        print re.sub(original,result,line),

Compact and elegant!
Tomorrow I'll go about adding the filename-changing part.

--
Michele Alzetta

Jul 18 '05 #16
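
The remaining step, renaming the files themselves, could be sketched along these lines in Python 3. It reuses the file-name expression from Peter Otten's earlier script; the function name rename_tree and the returned list of pairs are assumptions added for illustration, and file names containing '?' are only possible on file systems that allow that character.

```python
import os
import re

# File-name rule from the earlier script: foo.asp=dev?N -> foo-N.html
FILE_EXPR = re.compile(r"^(.+?)\.asp=dev\?(.*)$")

def rename_tree(target_folder):
    """Rename matching files below target_folder; return (old, new) pairs."""
    renamed = []
    for folder, subfolders, names in os.walk(target_folder):
        for name in names:
            newname = FILE_EXPR.sub(r"\1-\2.html", name)
            if newname != name:
                old = os.path.join(folder, name)
                new = os.path.join(folder, newname)
                os.rename(old, new)
                renamed.append((old, new))
    return renamed
```

Only files are renamed here, so the directory names that os.walk still has to visit are left untouched.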
