file_name_fixer.py

ewaguespack

i put this together to fix a bunch of files with weird names, please
gimme feedback, i am a newbie
#!/usr/bin/env python
import os
import sys
import string
import platform
dir = sys.argv[1]
noworky = sys.argv[2]
if platform.system() == 'Linux':
    uglychars = ''.join( set(string.punctuation+' ') - set('/_.') )
else:
    if platform.system() == 'Windows':#this is broken because windows is gay with case
        uglychars = ''.join( set(string.punctuation+' ') - set(':\\/_.') )
    else:
        print "wtf... what platform is this anyway?"
underscore = '_'
underscore = underscore * len(uglychars)
chars = string.maketrans(uglychars, underscore)
print "# PHASE I, DIRECTORIES"
for path, subdirs, files in os.walk(dir, topdown=True):
    oldname = path
    newname = oldname.translate(chars)
    newname = string.lower(newname)
    while string.count(newname, "__") > 0:
        newname = string.replace(newname,"__","_")
    while string.count(newname, "..") > 0:
        newname = string.replace(newname,"..",".")
    if oldname != newname:
        if os.path.isfile(newname) or os.path.isdir(newname):
            print oldname, "-->\n", newname, "\t\t\tERROR: file/dir exists\n"
        else:
            print oldname, "-->\n", newname, "\t\t\tYAY: file renamed\n"
            if noworky == "doit":
                os.renames(oldname, newname)
print "# PHASE II, FILES"
for path, subdirs, files in os.walk(dir, topdown=True):
    for oldname in files:
        oldname = os.path.join(path, oldname)
        newname = oldname.translate(chars)
        newname = string.lower(newname)
        newname = string.replace(newname,".mpeg",".mpg")
        newname = string.replace(newname,".ram",".rm")
        newname = string.replace(newname,".jpeg",".jpg")
        newname = string.replace(newname,".qt",".mov")
        while string.count(newname, "__") > 0:
            newname = string.replace(newname,"__","_")
        while string.count(newname, "..") > 0:
            newname = string.replace(newname,"..",".")
        newname = string.replace(newname,"._","_")
        newname = string.replace(newname,"_.",".")
        if oldname != newname:
            if os.path.isfile(newname) or os.path.isdir(newname):
                print oldname, "-->\n", newname, "\t\t\tERROR: file/dir exists\n"
            else:
                print oldname, "-->\n", newname, "\t\t\tYAY: file renamed\n"
                if noworky == "doit":
                    os.renames(oldname, newname)

Jan 24 '06 #1

bruno at modulix

ew*********@gmail.com wrote:

i put this together to fix a bunch of files with wierd names, please
gimme feedback, i am a newbie
#!/usr/bin/env python
import os
import sys
import string
import platform
dir = sys.argv[1]
Careful, this shadows the builtin dir() function. Prefer another
identifier.
noworky = sys.argv[2]
If the user doesn't pass two arguments, the program will crash
(IndexError) without the user knowing why.

if platform.system() == 'Linux':
uglychars = ''.join( set(string.punctuation+' ') - set('/_.') )
else:
if platform.system() == 'Windows':#this is broken because windows
is gay with case
Use if/elif/else, it will be simpler. Or a dictionary mapping
platform name -> uglychars.
uglychars = ''.join( set(string.punctuation+' ') -
set(':\\/_.') )
else:
print "wtf... what platform is this anyway?"
MacOS Classic, MacOS X, xxxBSD, any Unix, or any other platform
supporting Python (and there are plenty of them).
underscore = '_'
underscore = underscore * len(uglychars)
chars = string.maketrans(uglychars, underscore)
print "# PHASE I, DIRECTORIES"
for path, subdirs, files in os.walk(dir, topdown=True):
oldname = path
newname = oldname.translate(chars)
newname = string.lower(newname)
Prefer the methods of the str object, and don't hesitate to chain
method calls:
newname = oldname.translate(chars).lower()
while string.count(newname, "__") > 0:
newname = string.replace(newname,"__","_")
No need to count the occurrences of '__'; it is enough that there is
at least one:
while '__' in newname:
    newname = newname.replace('__', '_')

And while you're at it, a regexp might be better suited to this kind of
processing (but do check how it performs).
while string.count(newname, "..") > 0:
newname = string.replace(newname,"..",".")
<*n*x>
Also watch out for the special entries '.' and '..'
</*n*x>
if oldname != newname:
if os.path.isfile(newname) or os.path.isdir(newname):
if os.path.exists(newname):
print oldname, "-->\n", newname, "\t\t\tERROR: file/dir
exists\n"
Standard output is for the program's "normal" output. For messages
meant for the user, use standard error (sys.stderr).

Also provide a -q / --quiet flag (and/or its opposite, a -v /
--verbose flag) to turn these messages on or off.
else:
print oldname, "-->\n", newname, "\t\t\tYAY: file
renamed\n"
if noworky == "doit":
os.renames(oldname, newname)
It might be a good idea to document the possible options, no?
BTW, the customary name for a flag meaning that you only want to
simulate the run is '--dry-run'
print "# PHASE II, FILES"
Is there a reason to do the processing in two passes?
for path, subdirs, files in os.walk(dir, topdown=True):
for oldname in files:
oldname = os.path.join(path, oldname)
newname = oldname.translate(chars)
newname = string.lower(newname)
To speed up name lookup, it is better to create a local alias before
the loop:

path_join = os.path.join
for path, subdirs, files in os.walk(dir, topdown=True):
    for oldname in files:
        oldname = path_join(path, oldname)

You can also chain the calls:
oldname = path_join(path, oldname).translate(chars).lower()

Also beware: you are now working on the full path (ie:
/path/to/myfile.ext). You should perhaps process the file name first,
and only add the path back at the end for the rename().
newname = string.replace(newname,".mpeg",".mpg")
newname = string.replace(newname,".ram",".rm")
newname = string.replace(newname,".jpeg",".jpg")
newname = string.replace(newname,".qt",".mov")
# TODO: make this configurable
ext_map = {'mpeg' : 'mpg',
           'ram' : 'rm',
           'jpeg' : 'jpg',
           'qt' : 'mov',
           # etc
           }
name, ext = os.path.splitext(newname)
# splitext() keeps the leading dot in ext, so strip it before the lookup
if ext[1:] in ext_map:
    newname = "%s.%s" % (name, ext_map[ext[1:]])

while string.count(newname, "__") > 0:
newname = string.replace(newname,"__","_")
while string.count(newname, "..") > 0:
newname = string.replace(newname,"..",".")
Don't you get the feeling you're repeating code ?-)
newname = string.replace(newname,"._","_")
newname = string.replace(newname,"_.",".")
if oldname != newname:
if os.path.isfile(newname) or os.path.isdir(newname):
print oldname, "-->\n", newname, "\t\t\tERROR: file/dir
exists\n"
else:
print oldname, "-->\n", newname, "\t\t\tYAY: file
renamed\n"
if noworky == "doit":
os.renames(oldname, newname)

Same here...
As it stands, this script is neither reusable nor even maintainable
(code duplication...). You should factor all the duplicated code out
into 'utility' functions, put the 'main' code in a function as well, and
add a 'main' function that handles the options and calls the other
functions.

You should also normalize the options and document the program
(usage() and help() functions). The optparse module may be a good
solution (it handles the options and generates the help messages).

While we're on options, the convention is rather to put the flags
first and the arguments (here, the path(s) to visit) after them. In
your case, that would make it possible to pass several paths to process.
It could also be useful to provide options for configuring the
"uglychars" and the extensions to replace, either directly on the
command line:
monprog --uglychars="._?!%*" --extensions="jpeg:jpg,qt:mov"
or via a config file:
monprog --init=myinit.ini

with for example:
# myinit.ini
uglychars="./!*$"
extensions="jpeg:jpg,qt:mov"
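
A minimal sketch of how option values in the format shown above might be parsed. The helper names `parse_extensions` and `make_table` are hypothetical, and the code is written for Python 3 (the script in this thread is Python 2), so treat it as an illustration rather than a drop-in piece of the original program.

```python
def parse_extensions(spec):
    """Turn 'jpeg:jpg,qt:mov' into {'jpeg': 'jpg', 'qt': 'mov'}."""
    mapping = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty items such as a trailing comma
        old, sep, new = pair.partition(":")
        if not sep or not old or not new:
            raise ValueError("expected 'old:new', got %r" % pair)
        mapping[old.strip()] = new.strip()
    return mapping


def make_table(uglychars, replacement="_"):
    """Build a str.translate() table mapping every ugly char to replacement."""
    return str.maketrans(uglychars, replacement * len(uglychars))


print(parse_extensions("jpeg:jpg,qt:mov"))
print("a!b*c".translate(make_table("./!*$")))
```

Keeping the parsing separate from the renaming logic means the same functions can serve both the command line and a config file.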

You could also provide rules specifying paths that must not be touched
(a whole subdirectory, file names matching a pattern, etc.), whether or
not to descend into subdirectories, and so on...

That's OK, I didn't hit too hard ?-)

HTH
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"

Jan 24 '06 #2

bruno at modulix

ew*********@gmail.com wrote:

i put this together to fix a bunch of files with wierd names, please
gimme feedback, i am a newbie
Ok, so let's go... Hope you won't hate me too much !-)

#!/usr/bin/env python
import os
import sys
import string
import platform
dir = sys.argv[1]
This will shadow the builtin 'dir' function.
noworky = sys.argv[2]
If the user fails to provide args, the program will crash with an
IndexError - which may not be very helpful.

Also, a better scheme would be
myprog [-opt1 [-opt2=val [-optN]]] arg1 arg2 argN

Hint: the optparse module is your friend
http://www.python.org/doc/2.4.2/lib/...-optparse.html
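
A small sketch of what the optparse hint could look like in practice. The function name `parse_args` and the `--quiet` flag are assumptions made for illustration; only `--dry-run` and the "flags first, then paths" layout come from this thread. It is written for Python 3, where optparse is still available.

```python
import optparse


def parse_args(argv):
    """Parse flags first, then one or more paths."""
    parser = optparse.OptionParser(usage="%prog [options] path [path ...]")
    parser.add_option("-n", "--dry-run", action="store_true", dest="dry_run",
                      default=False, help="only show what would be renamed")
    parser.add_option("-q", "--quiet", action="store_true", dest="quiet",
                      default=False, help="suppress progress messages")
    options, paths = parser.parse_args(argv)
    if not paths:
        # error() prints the usage message and exits with status 2
        parser.error("at least one path is required")
    return options, paths


options, paths = parse_args(["--dry-run", "photos", "videos"])
print(options.dry_run, paths)
```

With this in place a missing argument produces a usage message instead of an IndexError, and `--help` is generated for free.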
if platform.system() == 'Linux':
uglychars = ''.join( set(string.punctuation+' ') - set('/_.') )
else:
if platform.system() == 'Windows':#this is broken because windows
is gay with case
uglychars = ''.join( set(string.punctuation+' ') -
set(':\\/_.') )
else:
print "wtf... what platform is this anyway?"
May be MacOS Classic, MacOS X or any *n*x or *BSD variant, or any other
platform supporting Python - and there are some...
underscore = '_'
underscore = underscore * len(uglychars)
You don't need the intermediate value:
underscores = '_' * len(uglychars)
chars = string.maketrans(uglychars, underscore)
print "# PHASE I, DIRECTORIES"
Why not process dirs and files in one pass ?
for path, subdirs, files in os.walk(dir, topdown=True):
Err... is the 'dir' argument supposed to be an absolute path or a
relative path ?

And why use the topdown option ?
oldname = path
Whoops ! This may be the absolute path. Are you sure you want to process
an absolute path ?

I think you'd better process files and dirs in one pass, walking bottom
up (so you don't process any file twice).
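
One way the single bottom-up pass might look, as a hedged sketch in Python 3. `clean` is a hypothetical stand-in for the real clean-up rules and `rename_tree` is an invented name; the point is only that `os.walk(..., topdown=False)` yields a directory after its contents, and that only base names are rewritten, never the path leading to them.

```python
import os


def clean(name):
    """Hypothetical stand-in for the real clean-up rules."""
    return name.replace(" ", "_").lower()


def rename_tree(root, dry_run=True):
    """Rename files and directories under root in a single bottom-up pass."""
    planned = []
    # topdown=False yields a directory's contents before the directory itself,
    # so entries are handled while the path leading to them is still valid.
    for path, subdirs, files in os.walk(root, topdown=False):
        for old in files + subdirs:
            new = clean(old)  # only the base name changes, never the path
            if new == old:
                continue
            src = os.path.join(path, old)
            dst = os.path.join(path, new)
            if os.path.exists(dst):
                continue  # never overwrite an existing entry
            planned.append((src, dst))
            if not dry_run:
                os.rename(src, dst)
    return planned
```

Calling `rename_tree(some_dir)` with the default `dry_run=True` only returns the planned `(old, new)` pairs, which makes it easy to print them before committing to anything.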
newname = oldname.translate(chars)
newname = string.lower(newname)
Use the 'str' object methods instead of functions from the string module:

newname = newname.lower()

You can also chain method/function calls:
newname = oldname.translate(chars).lower()

while string.count(newname, "__") > 0:
In the context of a boolean expression, 0 evaluates to False and
non-zero to True, so you don't have to be so explicit:
while newname.count('__'):
newname = string.replace(newname,"__","_")
You don't need to actually *count* the occurrences of '__' - if there's
one, that's enough:
while '__' in newname:
# proceed

Also, a regexp may be more effective here.
while string.count(newname, "..") > 0:
newname = string.replace(newname,"..",".")
Don't forget the '..' and '.' special directories in unix filesystems...
if oldname != newname:
if os.path.isfile(newname) or os.path.isdir(newname):
And if there's a special file (device etc) ?
hint : os.path.exists()

print oldname, "-->\n", newname, "\t\t\tERROR: file/dir
exists\n"
stdout is for 'normal' program outputs (ie: outputs that may be used as
inputs to another program). This kind of output should go to stderr:
print >> sys.stderr, ("%s --> %s : \t\t\tERROR: "
                      "file/dir exists" % (oldname, newname))
else:
print oldname, "-->\n", newname, "\t\t\tYAY: file
renamed\n"
if noworky == "doit":
os.renames(oldname, newname)
How is the user supposed to know that he has to pass the string "doit"
as a second arg ?

There are some (more or less agreed upon) conventions about cli options.
Like '--dry-run' to express the fact that the program shouldn't actually
do more than simulate its execution.
print "# PHASE II, FILES"
for path, subdirs, files in os.walk(dir, topdown=True):
for oldname in files:
oldname = os.path.join(path, oldname)
Are you *sure* you want to operate on the *whole* *absolute* path ?
(cf my thoughts about the "first phase" and the whole algorithm)
newname = oldname.translate(chars)
newname = string.lower(newname)
Aren't you repeating some code here ?
hint : all duplicated code should be factored out into a function.
newname = string.replace(newname,".mpeg",".mpg")
newname = string.replace(newname,".ram",".rm")
newname = string.replace(newname,".jpeg",".jpg")
newname = string.replace(newname,".qt",".mov")
# outside the loop, define a dict like:
ext_map = {'mpeg': 'mpg',
           'ram' : 'rm',
           'jpeg' : 'jpg',
           # etc
           }

# then in the loop:
base, ext = os.path.splitext(newname)
# splitext() keeps the leading dot in ext, hence the ext[1:]
if ext[1:] in ext_map:
    newname = "%s.%s" % (base, ext_map[ext[1:]])
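
The dict-based idea can be sketched as a small function (Python 3; the names `EXT_MAP` and `fix_extension` are made up for this illustration). Note that `os.path.splitext` returns the extension with its leading dot, so the dot has to be stripped before looking it up in a dict whose keys have none.

```python
import os

# keys and values are written without the leading dot
EXT_MAP = {"mpeg": "mpg", "ram": "rm", "jpeg": "jpg", "qt": "mov"}


def fix_extension(filename, ext_map=EXT_MAP):
    """Return filename with its extension normalized according to ext_map."""
    base, ext = os.path.splitext(filename)
    # splitext() returns the extension *with* its dot, e.g. '.jpeg'
    new_ext = ext_map.get(ext[1:].lower())
    if new_ext is None:
        return filename
    return "%s.%s" % (base, new_ext)


print(fix_extension("holiday.jpeg"))    # holiday.jpg
print(fix_extension("my.ramble.txt"))   # my.ramble.txt
```

Unlike a chain of `replace()` calls on the whole name, this only ever touches the final extension, so a name such as `my.ramble.txt` is left alone.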

while string.count(newname, "__") > 0:
newname = string.replace(newname,"__","_")
duplicated code...
while string.count(newname, "..") > 0:
newname = string.replace(newname,"..",".")
newname = string.replace(newname,"._","_")
newname = string.replace(newname,"_.",".")
all this is a perfect usecase for regexps...
if oldname != newname:
if os.path.isfile(newname) or os.path.isdir(newname):
print oldname, "-->\n", newname, "\t\t\tERROR: file/dir
exists\n"
else:
print oldname, "-->\n", newname, "\t\t\tYAY: file
renamed\n"
if noworky == "doit":
os.renames(oldname, newname)

duplicated code.
Still alive ?-)

I think the first step would be to correct your algorithm, then use some
more efficient or idiomatic constructs where possible (like using dicts
instead of repeated tests, sending messages to stderr etc).

Then here are some more hints:

- put the processing algorithm in a dedicated function, if possible one
that doesn't rely on any global variable,
- factor out any duplicated code into helper functions

- then add a 'main' function that takes care of options and calls the
'processing' function if everything's ok. The main function should
return 0 if ok, non-zero if errors (usually 2 for invalid cli options or
args, 1 for other errors)

- and finally add this at the end of your module:

if __name__ == '__main__':
    sys.exit(main(sys.argv))

This will allow your script to be used either as a program or as a module.
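
A rough skeleton of that layout, written for Python 3. `process` is a placeholder with no real renaming logic, and the hand-rolled argument handling is only there to keep the sketch short; the exit codes follow the convention described above (0 for success, 2 for bad arguments, 1 for other errors). The `if __name__ == '__main__'` guard quoted above would go at the very end of the module.

```python
import sys


def process(paths, dry_run=True):
    """Placeholder for the real renaming logic; returns the number of errors."""
    errors = 0
    for path in paths:
        verb = "would process" if dry_run else "processing"
        # messages for the user go to stderr, not stdout
        sys.stderr.write("%s %s\n" % (verb, path))
    return errors


def main(argv):
    """Handle the command line, then call process(); return an exit status."""
    args = argv[1:]
    dry_run = "--dry-run" in args
    paths = [a for a in args if not a.startswith("-")]
    if not paths:
        sys.stderr.write("usage: %s [--dry-run] path [path ...]\n" % argv[0])
        return 2  # invalid command-line arguments
    return 1 if process(paths, dry_run) else 0
```

Because `main` takes `argv` as a parameter and returns a status instead of calling `sys.exit` itself, it can be imported and called from tests or from another program.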
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"

Jan 24 '06 #3

mdelliot

ew*********@gmail.com wrote:

i put this together to fix a bunch of files with wierd names, please
gimme feedback, i am a newbie

See http://aspn.activestate.com/ASPN/Coo.../Recipe/442517

Jan 24 '06 #4

eww

thanks for the feedback!

I'll work on your suggestions.
bruno at modulix wrote:

(snip full quote of the previous post)

Jan 24 '06 #5

Bruno Desthuilliers

eww wrote:
(top-post corrected)

bruno at modulix wrote:
ew*********@gmail.com wrote:
i put this together to fix a bunch of files with wierd names, please
gimme feedback, i am a newbie
Ok, so let's go... Hope you won't hate me too much !-)

(snip program and comments)
thanks for the feedback!
You're welcome. Feel free to ask for more details here if you're unsure
about some particular point of it.
I'll work on your suggestions.

<ot>
So while you're at it, please follow this one too: avoid posting back
the whole post you're answering to. Keep the relevant parts only and put
your answers where appropriate... Readability is important - and not
only in source code !-)
</ot>

--
bruno at modulix

Jan 24 '06 #6

Steven D'Aprano

ew*********@gmail.com wrote:

i put this together to fix a bunch of files with wierd names, please
gimme feedback, i am a newbie
Others have already made comments, here is some more
food for thought.

You should consider factoring out some repeated code
into functions. E.g.:

# warning: untested!!!
def replace_all(s, old, new):
    """Replaces all instances of substring old with
    substring new."""
    if old == new:
        # make no changes
        return s
    elif old in new:
        raise ValueError("old substring can't be "
                         "part of the replacement substring.")
    while old in s:
        s = s.replace(old, new)
    return s

Now you can call it in your loop:
for path, subdirs, files in os.walk(dir, topdown=True):
    oldname = path
    newname = oldname.translate(chars)
    newname = string.lower(newname)
    while string.count(newname, "__") > 0:
        newname = string.replace(newname,"__","_")
    while string.count(newname, "..") > 0:
        newname = string.replace(newname,"..",".")
becomes:

newname = replace_all(newname, "__", "_")
newname = replace_all(newname, "..", ".")
if oldname != newname:
if os.path.isfile(newname) or os.path.isdir(newname):
print oldname, "-->\n", newname, "\t\t\tERROR: file/dir
exists\n"
else:
print oldname, "-->\n", newname, "\t\t\tYAY: file
renamed\n"
if noworky == "doit":
os.renames(oldname, newname)
print "# PHASE II, FILES"
for path, subdirs, files in os.walk(dir, topdown=True):
for oldname in files:
oldname = os.path.join(path, oldname)
newname = oldname.translate(chars)
newname = string.lower(newname)
More refactoring:
newname = string.replace(newname,".mpeg",".mpg")
newname = string.replace(newname,".ram",".rm")
newname = string.replace(newname,".jpeg",".jpg")
newname = string.replace(newname,".qt",".mov")
becomes:

# warning: untested!!!
def fix_extension(s, old, new):
    # there are other, better ways of doing this
    # see the os.path.splitext function
    if s.endswith(old):
        s = s[:-len(old)] + new
    return s

def fix_all_extensions(s, extensions):
    for old, new in extensions:
        s = fix_extension(s, old, new)
    return s

newname = fix_all_extensions(newname,
    [ (".mpeg", ".mpg"), (".ram", ".rm"),
      (".jpeg", ".jpg"), (".qt", ".mov") ])

while string.count(newname, "__") > 0:
newname = string.replace(newname,"__","_")
while string.count(newname, "..") > 0:
newname = string.replace(newname,"..",".")

We've already refactored those calls:

newname = replace_all(newname, "__", "_")
newname = replace_all(newname, "..", ".")

That will do for starters.
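
Since the helpers above are marked untested, here is a quick check of them, restated for Python 3 so the block runs on its own. The sample strings are invented for the illustration, and `fix_all_extensions` is condensed to inline the suffix test.

```python
def replace_all(s, old, new):
    """Replace old with new until no occurrence of old is left."""
    if old == new:
        return s
    if old in new:
        # the loop below would never terminate in this case
        raise ValueError("old substring can't be part of the replacement")
    while old in s:
        s = s.replace(old, new)
    return s


def fix_all_extensions(s, extensions):
    """Apply each (old, new) suffix replacement in turn."""
    for old, new in extensions:
        if s.endswith(old):
            s = s[:-len(old)] + new
    return s


print(replace_all("a____b...c", "__", "_"))   # a_b...c
print(replace_all("a_b...c", "..", "."))      # a_b.c
print(fix_all_extensions("clip.mpeg", [(".mpeg", ".mpg"), (".qt", ".mov")]))  # clip.mpg
```

The loop matters because a single `str.replace` call only makes one left-to-right pass: `"a___b".replace("__", "_")` still leaves a double underscore behind.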
--
Steven.

Jan 25 '06 #7

Fredrik Lundh

Steven D'Aprano wrote:

You should consider factoring out some repeated code
into functions. E.g.:

# warning: untested!!!
def replace_all(s, old, new):
"""Replaces all instances of substring old with
substring new."""
if old == new:
# make no changes
return s
elif old in new:
raise ValueError("old substring can't be "
"part of the replacement substring.")
while old in s:
s = s.replace(old, new)
return s

Now you can call it in your loop:
for path, subdirs, files in os.walk(dir, topdown=True):
oldname = path
newname = oldname.translate(chars)
newname = string.lower(newname)

while string.count(newname, "__") > 0:
newname = string.replace(newname,"__","_")
while string.count(newname, "..") > 0:
newname = string.replace(newname,"..",".")

becomes:

newname = replace_all(newname, "__", "_")
newname = replace_all(newname, "..", ".")

or you can use a more well-suited function:

# replace runs of _ and . with a single character
newname = re.sub("_+", "_", newname)
newname = re.sub("\.+", ".", newname)

or, slightly more obscure:

newname = re.sub("([_.])\\1+", "\\1", newname)
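
To see that the two regex variants agree, here is a small self-contained comparison in Python 3. The function names are invented for the example, and the patterns are written as raw strings, which is a stylistic choice of this sketch rather than what the post above uses.

```python
import re


def collapse_separately(name):
    """Collapse runs of '_' and runs of '.' with two substitutions."""
    name = re.sub(r"_+", "_", name)
    return re.sub(r"\.+", ".", name)


def collapse_with_backreference(name):
    """Do the same in one substitution."""
    # ([_.]) captures one '_' or '.'; \1+ matches further copies of that
    # same character, and the whole run is replaced by the captured char
    return re.sub(r"([_.])\1+", r"\1", name)


sample = "my___file...name._x"
print(collapse_separately(sample))           # my_file.name._x
print(collapse_with_backreference(sample))   # my_file.name._x
```

Neither version touches mixed pairs such as `._`, which the original script handles with its own separate `replace` calls.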

</F>

Jan 25 '06 #8

Steven D'Aprano

Fredrik Lundh wrote:

or you can use a more well-suited function:

# replace runs of _ and . with a single character
newname = re.sub("_+", "_", newname)
newname = re.sub("\.+", ".", newname)
You know, I really must sit down and learn how to use
reg exes one of these days. But somehow, every time I
try, I get the feeling that the work required to learn
to use them effectively is infinitely greater than the
work required to re-invent the wheel every time.

*wink*
or, slightly more obscure:

newname = re.sub("([_.])\\1+", "\\1", newname)

_Slightly_?
--
Steven.

Jan 25 '06 #9

Fredrik Lundh

Steven D'Aprano wrote:

or you can use a more well-suited function:

# replace runs of _ and . with a single character
newname = re.sub("_+", "_", newname)
newname = re.sub("\.+", ".", newname)

You know, I really must sit down and learn how to use
reg exes one of these days. But somehow, every time I
try, I get the feeling that the work required to learn
to use them effectively is infinitely greater than the
work required to re-invent the wheel every time.

here's all you need to understand the code above:

. ^ $ * + ? ( ) [ ] { } | \ are reserved characters
all other characters match themselves
reserved characters must be escaped to match themselves;
    to match a dot, use "\\." in the Python string literal (which the
    RE engine sees as \.)
+ means match one or more of the preceding item
    so _+ matches one or more underscores, and \.+ matches
    one or more dots
re.sub(pattern, replacement, text) replaces all matches for
    the given pattern in text with the given replacement string

so re.sub("_+", "_", newname) replaces runs of underscores with
a single underscore.

or, slightly more obscure:

newname = re.sub("([_.])\\1+", "\\1", newname)

_Slightly_?

this introduces three new concepts:

[ ] defines a set of characters
so [_.] will match either _ or .
( ) defines a group of matched characters.
\\1 (which the RE engine sees as \1) refers to the first group
this can be used both in the pattern and in the replacement
string

so re.sub("([_.])\\1+", "\\1", newname) replaces runs consisting
of either a . or an _ followed by one or more copies of itself, with
a single instance of itself.

(using r-strings lets you remove some of extra backslashes, btw)
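
A short illustration of the r-string remark, in Python 3. The helper name `squeeze` is made up for this sketch; the assertions only state what the two spellings of each literal contain.

```python
import re

# In a normal string literal "\\1" is two characters: a backslash and '1'.
# The raw string r"\1" spells exactly the same two characters.
assert "\\1" == r"\1"
assert len(r"\1") == 2
# The same holds for the escaped dot.
assert "\\." == r"\."


def squeeze(name):
    """Collapse runs of '_' or '.' using raw-string patterns."""
    return re.sub(r"([_.])\1+", r"\1", name)


print(squeeze("a__b....c"))   # a_b.c
```

Raw strings change nothing about what the RE engine receives; they only spare you from doubling the backslashes in the Python source.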

</F>

Jan 25 '06 #10

Richie Hindle

[Fredrik]

so re.sub("([_.])\\1+", "\\1", newname) replaces runs consisting
of either a . or an _ followed by one or more copies of itself, with
a single instance of itself.

...and this:

def isprime(n):
    return n > 1 and not re.match(r'(xx+)\1+$', 'x'*n)

finds prime numbers.
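
For the curious, a quick check of why this works, in Python 3. The pattern matches a string of n x's exactly when it can be cut into two or more equal chunks of at least two characters, i.e. when n is composite. `isprime_by_division` is a reference function added for this sketch only.

```python
import re


def isprime(n):
    # 'x' * n is composite exactly when it splits into two or more
    # equal chunks of length >= 2, which is what (xx+)\1+$ matches
    return n > 1 and not re.match(r"(xx+)\1+$", "x" * n)


def isprime_by_division(n):
    """Plain trial division, used here only as a reference."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


print([n for n in range(30) if isprime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

It is a curiosity rather than a practical test: the regex engine has to backtrack through every candidate chunk length, so it slows down quickly as n grows.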

I'll get me coat.

--
Richie Hindle
ri****@entrian.com

Jan 25 '06 #11

Thomas Heller

Richie Hindle <ri****@entrian.com> writes:

[Fredrik]
so re.sub("([_.])\\1+", "\\1", newname) replaces runs consisting
of either a . or an _ followed by one or more copies of itself, with
a single instance of itself.

...and this:
def isprime(n):
return n > 1 and not re.match(r'(xx+)\1+$', 'x'*n)

finds prime numbers.

So, who will post a 'Pior' (Python in one regex)?

Thomas

Jan 25 '06 #12

Steven D'Aprano

On Wed, 25 Jan 2006 10:37:12 +0000, Richie Hindle wrote:

[Fredrik]
so re.sub("([_.])\\1+", "\\1", newname) replaces runs consisting
of either a . or an _ followed by one or more copies of itself, with
a single instance of itself.

...and this:
def isprime(n):
return n > 1 and not re.match(r'(xx+)\1+$', 'x'*n)

finds prime numbers.

I'll get me coat.

See, now that's exactly the sort of thing that makes me wake screaming
in the night. That's just deeply, deeply wrong -- I don't know what's
worse, that it *works*, or that somebody thought of it.

:-)

Thank you to Fredrik for trying to teach me something, but it is 11pm on
the night before a public holiday (Australia Day), my house feels like an
oven, and in the last three hours I've suddenly started coming down with a
cold -- in the middle of our summer. So this is not the time for me to try
to learn anything new.
--
Steven.

Jan 25 '06 #13