
regex problem with re and fnmatch

Fabian Braennstroem
#1: Nov 20 '07
Hi,

I would like to use re to search a file for lines containing
the word "README_x.org", where x is any number.
For example, such a line looks like this:
[[file:~/pfm_v99/README_1.org]]

I tried these kinds of patterns:
# org_files='.*README\_1.org]]'
org_files='.*README\_*.org]]'
if re.match(org_files,line):

Unfortunately, it matches all entries with "README.org", but
not the ones with the number I want!?

After some splitting and replacing, I can check whether
the above file exists. If it does not, I search for
it using os.walk:

for root, dirs, files in os.walk("/home/fab/org"):
    for name in dirs:
        dirs=os.path.join(root, name) + '/'
    for name in files:
        files=os.path.join(root, name)
        if fnmatch.fnmatch(str(files), "README*"):
            print "File Found"
            print str(files)
            break

As soon as it finds the file, it should stop searching,
but I have the same matching problem as above.
Does anyone have any suggestions about the regex problem?
Greetings!
Fabian


John Machin
#2: Nov 20 '07

On Nov 21, 8:05 am, Fabian Braennstroem <f.braennstr...@gmx.de> wrote:
> Hi,
>
> I would like to use re to search for lines in a files with
> the word "README_x.org", where x is any number.
> E.g. the structure would look like this:
> [[file:~/pfm_v99/README_1.org]]
>
> I tried to use these kind of matchings:
> # org_files='.*README\_1.org]]'
> org_files='.*README\_*.org]]'
> if re.match(org_files,line):
The first tip is to drop the leading '.*' and use search() instead of
match(). The second tip is to always use raw strings for your
patterns.
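A minimal sketch of both tips, using the sample line from the question. re.match() only tries the pattern at the very start of the string, while re.search() scans forward through it; and in a normal (non-raw) string literal Python turns '\b' into a backspace character before re ever sees it. The variable names are illustrative only.

```python
import re

line = "[[file:~/pfm_v99/README_1.org]]"

# match() only tries the pattern at position 0, and the line starts
# with "[[file:", so this returns None.
at_start = re.match(r'README', line)

# search() tries every position until one succeeds.
anywhere = re.search(r'README', line)
print(anywhere.start())  # 17

# In a normal string literal, '\b' is the backspace character \x08,
# so this pattern looks for a literal backspace and finds nothing.
plain = re.search('\bREADME', line)

# In a raw string, re receives backslash + b, i.e. a word boundary,
# which holds between the '/' and the 'R'.
raw = re.search(r'\bREADME', line)
```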
> Unfortunately, it matches all entries with "README.org", but
> not the wanted number!?
\_* matches 0 or more occurrences of _ (the \ is redundant). You need
to specify one or more digits -- use \d+ or [0-9]+

The . in .org matches ANY character except a newline. You need to
escape it with a \.
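To see why the original pattern behaved the way the question describes, here is a small check (a sketch, with the pattern copied from the question into a raw string): it accepts a plain README.org link but rejects README_1.org.

```python
import re

# The pattern from the question, written as a raw string.
old = r'.*README\_*.org]]'

# '\_*' may match zero underscores and '.' matches the literal dot,
# so a link to README.org satisfies the pattern.
hit = re.match(old, '[[file:~/pfm_v99/README.org]]')

# After "README" the pattern needs: underscores, ONE arbitrary character,
# then "org". In "_1.org" that cannot cover "_1.", so "org" never lines up.
miss = re.match(old, '[[file:~/pfm_v99/README_1.org]]')

print(hit is not None, miss is None)  # True True
```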
>>> pat = r'README_\d+\.org'
>>> re.search(pat, 'xxxxREADME.org')
>>> re.search(pat, 'xxxxREADME_.org')
>>> re.search(pat, 'xxxxREADME_1.org')
<_sre.SRE_Match object at 0x00B899C0>
>>> re.search(pat, 'xxxxREADME_9999.org')
<_sre.SRE_Match object at 0x00B899F8>
>>> re.search(pat, 'xxxxREADME_9999Zorg')
>>>
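One possible next step, sketched under the assumption that each relevant line carries an org-mode link shaped like the one in the question: a capture group around the path pulls the file name straight out of the link, which may save some of the splitting and replacing the question mentions. The names LINK and linked_readmes are made up for this sketch.

```python
import os
import re

# Capture the path inside a link such as [[file:~/pfm_v99/README_1.org]];
# [^\]]* stops the path at the first ']'.
LINK = re.compile(r'\[\[file:([^\]]*README_\d+\.org)\]\]')

def linked_readmes(lines):
    """Return the README_<digits>.org paths linked in the given lines."""
    paths = []
    for line in lines:
        m = LINK.search(line)
        if m:
            paths.append(m.group(1))
    return paths

sample = [
    "[[file:~/pfm_v99/README_1.org]]",
    "[[file:~/pfm_v99/README.org]]",        # no number: skipped
    "see [[file:~/notes/README_42.org]]",
]
paths = linked_readmes(sample)
print(paths)  # ['~/pfm_v99/README_1.org', '~/notes/README_42.org']

# os.path.isfile() does not expand '~' by itself, so expand it first.
existing = [p for p in paths if os.path.isfile(os.path.expanduser(p))]
```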
> After some splitting and replacing I am able to check, if
> the above file exists. If it does not, I start to search for
> it using the 'walk' procedure:
I presume that you mean something like: """.. check if the above file
exists in some directory. If it does not, I start to search for it
somewhere else ..."""
> for root, dirs, files in os.walk("/home/fab/org"):
>     for name in dirs:
>         dirs=os.path.join(root, name) + '/'

The above looks rather suspicious ...
    for thing in container:
        container = something_else
????
What are you trying to do?
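A short illustration of why that pattern is suspicious: rebinding the name of the list being iterated does not change the iteration, it only makes the name point at something else. With os.walk this matters twice over: after the inner loop, `dirs` is a string rather than the list os.walk handed out, and in the default top-down mode only in-place changes to that list (e.g. `dirs[:] = ...` or `dirs.remove(name)`) affect which subdirectories are visited. The names below are placeholders.

```python
# The for statement keeps its own reference to the original list,
# so rebinding `container` inside the loop does not affect iteration.
container = ['a', 'b', 'c']
seen = []
for thing in container:
    seen.append(thing)
    container = thing + '/'   # rebinds the name; the list itself is untouched

print(seen)       # ['a', 'b', 'c']
print(container)  # c/
```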

>     for name in files:
>         files=os.path.join(root, name)
and again ....
>         if fnmatch.fnmatch(str(files), "README*"):
Why str(files)?
>             print "File Found"
>             print str(files)
>             break

fnmatch is not as capable as re; in particular it can't express "one
or more digits". To search a directory tree for the first file whose
name matches a pattern, you need something like this:
import os
import re

def find_one(top, pat):
    for root, dirs, files in os.walk(top):
        for fname in files:
            if re.match(pat + '$', fname):
                return os.path.join(root, fname)
    return None  # nothing under top matched
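A quick comparison of the limitation mentioned above, on a few hypothetical file names: in fnmatch, [0-9] is exactly one digit and * is "anything", so the closest glob also lets non-digits through, whereas \d+ in re means one or more digits and nothing else.

```python
import fnmatch
import re

names = ['README_1.org', 'README_42.org', 'README_1x.org', 'README_.org']

# Closest glob: one digit, then anything, then ".org".
glob_hits = [n for n in names if fnmatch.fnmatch(n, 'README_[0-9]*.org')]
print(glob_hits)  # ['README_1.org', 'README_42.org', 'README_1x.org']

# Regex: one or more digits, then ".org", then end of name.
re_hits = [n for n in names if re.match(r'README_\d+\.org$', n)]
print(re_hits)    # ['README_1.org', 'README_42.org']
```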

> As soon as it finds the file,
"the" file or "a" file???

Ummm ... aren't you trying to locate a file whose EXACT name you found
in the first exercise??

import os

def find_it(top, required):
    for root, dirs, files in os.walk(top):
        if required in files:
            return os.path.join(root, required)
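Both helpers end the walk the moment they return, which the `break` in the question does not do: `break` only leaves the inner `for name in files` loop, and os.walk carries on with the next directory. A small self-contained check on a throwaway tree (the layout and file names are made up, and the two functions are repeated so the snippet runs on its own):

```python
import os
import re
import tempfile

def find_one(top, pat):
    """Return the path of the first file whose whole name matches pat."""
    for root, dirs, files in os.walk(top):
        for fname in files:
            if re.match(pat + '$', fname):
                return os.path.join(root, fname)  # leaves the walk at once
    return None

def find_it(top, required):
    """Return the path of the first file named exactly `required`."""
    for root, dirs, files in os.walk(top):
        if required in files:
            return os.path.join(root, required)   # leaves the walk at once
    return None

# Build <tmp>/a/b/README_7.org, search it, then let the tree be deleted.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'a', 'b'))
    target = os.path.join(top, 'a', 'b', 'README_7.org')
    open(target, 'w').close()

    found_pat = find_one(top, r'README_\d+\.org')
    found_exact = find_it(top, 'README_7.org')
    missing = find_it(top, 'README_8.org')

print(found_pat == target, found_exact == target, missing)  # True True None
```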

> it should stop the searching
> process; but there is the same matching problem like above.
HTH,
John
Fabian Braennstroem
#3: Nov 21 '07

Hi John,

John Machin wrote on 11/20/2007 09:40 PM:
> >>> pat = r'README_\d+\.org'
> >>> re.search(pat, 'xxxxREADME.org')
> >>> re.search(pat, 'xxxxREADME_.org')
> >>> re.search(pat, 'xxxxREADME_1.org')
> <_sre.SRE_Match object at 0x00B899C0>
> >>> re.search(pat, 'xxxxREADME_9999.org')
> <_sre.SRE_Match object at 0x00B899F8>
> >>> re.search(pat, 'xxxxREADME_9999Zorg')
> >>>
Thanks a lot, it works really well!
> def find_it(top, required):
>     for root, dirs, files in os.walk(top):
>         if required in files:
>             return os.path.join(root, required)
Great :-) Thanks a lot for your help... it can be so easy :-)
Fabian

