
Multiple regex match idiom

I often have the need to match multiple regexes against a single
string, typically a line of input, like this:

    if (matchobj = re1.match(line)):
        ... re1 matched; do something with matchobj ...
    elif (matchobj = re2.match(line)):
        ... re2 matched; do something with matchobj ...
    elif (matchobj = re3.match(line)):
        .....

Of course, that doesn't work as written because Python's assignments
are statements rather than expressions. The obvious rewrite results
in deeply nested if's:

    matchobj = re1.match(line)
    if matchobj:
        ... re1 matched; do something with matchobj ...
    else:
        matchobj = re2.match(line)
        if matchobj:
            ... re2 matched; do something with matchobj ...
        else:
            matchobj = re3.match(line)
            if matchobj:
                ...

Normally I have nothing against nested ifs, but in this case the deep
nesting unnecessarily complicates the code without providing
additional value -- the logic is still exactly equivalent to the
if/elif/elif/... shown above.
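
On Python 3.8 and later, assignment expressions (the `:=` operator from PEP 572) allow the flat if/elif chain to be written almost exactly as in the first snippet. A minimal sketch, assuming three patterns and a `classify` function that are invented here purely for illustration:

```python
import re

# Hypothetical patterns, invented for illustration only.
re1 = re.compile(r"(\d+)$")
re2 = re.compile(r"(\w+)=(\w+)$")
re3 = re.compile(r"#(.*)$")

def classify(line):
    # Requires Python 3.8+: ":=" binds matchobj and evaluates to the
    # match object (or None), so it can be used directly as the condition.
    if (matchobj := re1.match(line)):
        return ("number", int(matchobj.group(1)))
    elif (matchobj := re2.match(line)):
        return ("assignment", matchobj.group(1), matchobj.group(2))
    elif (matchobj := re3.match(line)):
        return ("comment", matchobj.group(1).strip())
    return ("unknown", line)
```

On older interpreters this is a syntax error, so the workarounds discussed in the rest of the thread still apply there.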

There are ways to work around the problem, for example by writing a
utility predicate that passes the match object as a side effect, but
that feels somewhat non-standard. I'd like to know if there is a
Python idiom that I'm missing. What would be the Pythonic way to
write the above code?
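
The "utility predicate" workaround mentioned above might look roughly like the following. This is only a sketch: the `Matcher` class, the `describe` function, and both patterns are hypothetical names made up for the example.

```python
import re

class Matcher:
    """Hypothetical helper that remembers the most recent match result."""

    def __init__(self):
        self.matchobj = None

    def match(self, regex, line):
        # Store the match object as a side effect; return True or False.
        self.matchobj = regex.match(line)
        return self.matchobj is not None

# Hypothetical patterns, invented for illustration only.
re1 = re.compile(r"(\d+)$")
re2 = re.compile(r"(\w+)=(\w+)$")

def describe(line):
    m = Matcher()
    if m.match(re1, line):
        return "number " + m.matchobj.group(1)
    elif m.match(re2, line):
        return "assignment to " + m.matchobj.group(1)
    return "no match"
```

The chain stays flat, at the cost of hiding the assignment inside a method call.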
May 9 '07 #1
3 Replies


Hrvoje Niksic wrote:
> I often have the need to match multiple regexes against a single
> string, typically a line of input, like this:
>
>     if (matchobj = re1.match(line)):
>         ... re1 matched; do something with matchobj ...
>     elif (matchobj = re2.match(line)):
>         ... re2 matched; do something with matchobj ...
>     elif (matchobj = re3.match(line)):
>         ....
> [snip]
>
> There are ways to work around the problem, for example by writing a
> utility predicate that passes the match object as a side effect, but
> that feels somewhat non-standard. I'd like to know if there is a
> Python idiom that I'm missing. What would be the Pythonic way to
> write the above code?

Only just learning Python, but to me this seems better.
Completely untested.

    re_list = [ re1, re2, re3, ... ]
    for regex in re_list:    # not "re": that name would shadow the re module
        matchob = regex.match(line)
        if matchob:
            ....
            break

Of course this only works if the "do something" is the same
for all matches. If not, maybe a function for each case,
something like

    re1 = re.compile(....)
    def fn1( s, m ):
        ....
    re2 = ....
    def fn2( s, m ):
        ....

    re_list = [ (re1, fn1), (re2, fn2), ... ]

    for (r,f) in re_list:
        matchob = r.match(line)
        if matchob:
            f( line, matchob )
            break

Probably better ways than this exist.
Charles
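
A runnable version of the (regex, function) list might look like this. It is a sketch only: the two patterns, the handler bodies, and the `handle` wrapper are assumptions filled in for illustration, since the post leaves them elided.

```python
import re

# Hypothetical patterns and handlers, invented for illustration only.
re1 = re.compile(r"(\d+)$")
re2 = re.compile(r"(\w+)=(\w+)$")

def fn1(s, m):
    # s is the original line, m is the match object for re1.
    return int(m.group(1))

def fn2(s, m):
    return {m.group(1): m.group(2)}

re_list = [(re1, fn1), (re2, fn2)]

def handle(line):
    for r, f in re_list:
        matchob = r.match(line)
        if matchob:
            # First matching pattern wins, as in an if/elif chain.
            return f(line, matchob)
    return None  # no pattern matched
```

Returning from inside the loop plays the role of `break`, and the order of `re_list` decides which pattern takes priority.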
May 9 '07 #2

On May 9, 5:00 am, Hrvoje Niksic <hnik...@xemacs.org> wrote:
> There are ways to work around the problem, for example by writing a
> utility predicate that passes the match object as a side effect, but
> that feels somewhat non-standard. I'd like to know if there is a
> Python idiom that I'm missing. What would be the Pythonic way to
> write the above code?

Hrvoje,

To make it more elegant I would do this:

1. Put all the ...do somethings... in functions like
   re1_do_something(), re2_do_something(),...

2. Create a list of pairs of (re,func), in other words:
   dispatch=[ (re1, re1_do_something), (re2, re2_do_something), ... ]

3. Then do:
   for regex,func in dispatch:
       if regex.match(line):
           func(...)
           break   # stop at the first match, like the if/elif chain
Hope this helps,
-Nick Vatamaniuc
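
One detail the loop above leaves open is what happens when no regex matches, and how the handler gets at the match object. A possible sketch using Python's for/else follows; the patterns, the handler names, and `on_unmatched` are all hypothetical and exist only for this example.

```python
import re

# Hypothetical handlers; each receives the line and its match object.
def on_blank(line, m):
    return "blank"

def on_section(line, m):
    return "section " + m.group(1)

def on_unmatched(line):
    return "unmatched: " + line

# Hypothetical patterns, invented for illustration only.
dispatch = [
    (re.compile(r"\s*$"), on_blank),
    (re.compile(r"\[(\w+)\]$"), on_section),
]

def process(line):
    for regex, func in dispatch:
        m = regex.match(line)
        if m:
            result = func(line, m)
            break
    else:
        # Runs only if the loop finished without break: nothing matched.
        result = on_unmatched(line)
    return result
```

The `else` clause on the loop corresponds to the final `else` of an if/elif chain.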

May 9 '07 #3

On 9 May, 11:00, Hrvoje Niksic <hnik...@xemacs.org> wrote:
> There are ways to work around the problem, for example by writing a
> utility predicate that passes the match object as a side effect, but
> that feels somewhat non-standard. I'd like to know if there is a
> Python idiom that I'm missing. What would be the Pythonic way to
> write the above code?

Instead of scanning the same input over and over with different,
possibly complex, regexes and ugly nested ifs, I would suggest
defining a grammar and parsing the input once, with registered hooks
for your matching expressions.

SimpleParse (http://simpleparse.sourceforge.net) with a
DispatchProcessor or pyparsing (http://pyparsing.wikispaces.com/) in
combination with setParseAction or something similar are your friends
for such a task.

Steffen
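
For simple line formats, the "scan once, dispatch to hooks" idea can be approximated with the standard library alone. The sketch below uses neither SimpleParse nor pyparsing; it swaps in a different technique, a single combined regex with named groups plus the match object's `lastgroup` attribute. The rule names, patterns, and hooks are hypothetical.

```python
import re

# Hypothetical rules: (name, pattern). Order decides priority.
rules = [
    ("number", r"\d+$"),
    ("assign", r"\w+=\w+$"),
    ("comment", r"#.*$"),
]

# One alternation of named groups, so the line is matched only once.
combined = re.compile(
    "|".join("(?P<%s>%s)" % (name, pat) for name, pat in rules)
)

# Hypothetical hooks, keyed by rule name; each receives the matched text.
hooks = {
    "number": lambda text: ("number", int(text)),
    "assign": lambda text: ("assign",) + tuple(text.split("=", 1)),
    "comment": lambda text: ("comment", text[1:].strip()),
}

def parse_line(line):
    m = combined.match(line)
    if m is None:
        return None
    # lastgroup names the alternative that matched.
    return hooks[m.lastgroup](m.group(m.lastgroup))
```

This only covers flat, line-oriented input; a real grammar with nesting is where the parsing libraries named above become worthwhile.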

May 10 '07 #4
