checking a string against multiple patterns

Hi,

here is a piece of pseudo-code (taken from Ruby) that illustrates the
problem I'd like to solve in Python:

    str = 'abc'
    if str =~ /(b)/      # Check if str matches a pattern
      str = $` + $1      # Perform some action
    elsif str =~ /(a)/   # Check another pattern
      str = $1 + $'      # Perform some other action
    elsif str =~ /(c)/
      str = $1
    end

The task is to check a string against a number of different patterns
(containing groupings).
For each pattern, different actions need to be taken.

In Python, a single match of this kind can be done as follows:

    import re

    str = 'abc'
    match = re.search('(b)', str)
    if match:
        # I'm not sure if this way of accessing the 'pre-match'
        # is optimal, but let's ignore it now
        str = str[0:match.start()] + match.group(1)

The problem is that you can't extend this example to multiple
matches with 'elif', because the match must be performed separately
from the conditional.

This obviously won't work in Python:

    if match=re.search( pattern1 , str ):
        ...
    elif match=re.search( pattern2 , str ):
        ...

So the only way seems to be:

    match = re.search(pattern1, str)
    if match:
        ...
    else:
        match = re.search(pattern2, str)
        if match:
            ...
        else:
            match = re.search(pattern3, str)
            if match:
                ...

and we end up with very nasty, deeply nested code.

Is there an alternative to it? Am I missing something? Python doesn't
have special variables $1, $2 (right?) so you must assign the result
of a match to a variable, to be able to access the groups.
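
A note for readers on newer interpreters: since Python 3.8, assignment expressions (the `:=` operator) allow the match to be bound inside the condition itself, so the chain can stay flat. The following is only a sketch of the Ruby example above; the function name `transform` is hypothetical and not part of the original question.

```python
import re

def transform(s):
    """Mirror the Ruby if/elsif chain using assignment expressions."""
    if (m := re.search(r'(b)', s)):
        return s[:m.start()] + m.group(1)   # pre-match + group 1
    elif (m := re.search(r'(a)', s)):
        return m.group(1) + s[m.end():]     # group 1 + post-match
    elif (m := re.search(r'(c)', s)):
        return m.group(1)                   # group 1 only
    return s                                # no pattern matched

print(transform('abc'))
```

This does not apply to the Python versions available when the thread was written.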

I'd appreciate any hints.

Tomasz



Dec 18 '07 #1
kib
tomasz wrote:
> Is there an alternative to it? Am I missing something? Python doesn't
> have special variables $1, $2 (right?) so you must assign the result
> of a match to a variable, to be able to access the groups.

Hi Tomasz,

See, e.g.:

http://www.regular-expressions.info/python.html [Search and Replace section]

And you'll see that Python supports numbered groups and even named
groups in regular expressions.
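
As a brief illustration of numbered and named groups in the standard `re` module, here is a minimal sketch. The sample string and the group names `mid`, `first` and `rest` are made up for the example.

```python
import re

text = 'abc'

# Numbered group: \1 in the replacement refers to the first group.
numbered = re.sub(r'(b)', r'[\1]', text)

# Named group: (?P<mid>...) defines it, \g<mid> refers to it.
named = re.sub(r'(?P<mid>b)', r'[\g<mid>]', text)

# On a match object, groups can be read by number or by name.
m = re.search(r'(?P<first>a)(?P<rest>.*)', text)
first, rest = m.group(1), m.group('rest')

print(numbered, named, first, rest)
```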

Christophe K.
Dec 18 '07 #2
On Dec 18, 09:41, tomasz <tmkm...@googlemail.com> wrote:
> [...]
> and we end up having a very nasty, multiply-nested code.

Define a small function with each test+action, and iterate over them
until a match is found:

    def check1(input):
        match = re.search(pattern1, input)
        if match:
            return input[:match.end(1)]

    def check2(input):
        match = re.search(pattern2, input)
        if match:
            return ...

    def check3(input):
        match = ...
        if match:
            return ...

    for check in check1, check2, check3:
        result = check(input)
        if result is not None:
            break
    else:
        pass  # no match found
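
The same idea can be written as a table of (pattern, action) pairs. This is only one possible sketch: the names `RULES` and `apply_first_match` are hypothetical, and the three rules are taken from the Ruby example in the question rather than from the code above.

```python
import re

# Each entry pairs a compiled pattern with the action to run on its match.
# m.string is the text that was searched, so each action needs only the match.
RULES = [
    (re.compile(r'(b)'), lambda m: m.string[:m.start()] + m.group(1)),
    (re.compile(r'(a)'), lambda m: m.group(1) + m.string[m.end():]),
    (re.compile(r'(c)'), lambda m: m.group(1)),
]

def apply_first_match(text, rules=RULES):
    """Return the result of the first rule whose pattern matches, else None."""
    for pattern, action in rules:
        match = pattern.search(text)
        if match:
            return action(match)
    return None

print(apply_first_match('abc'))
```

Adding a pattern then means adding one entry to the list, and the order of the list decides which pattern wins when several match.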

--
Gabriel Genellina
Dec 18 '07 #3
On Dec 18, 1:41 pm, tomasz <tmkm...@googlemail.com> wrote:
> [...]
> The task is to check a string against a number of different patterns
> (containing groupings).
> For each pattern, different actions need to be taken.

In the `re.sub` function (and `sub` method of regex object), the
`repl` parameter can be a callback function as well as a string:

http://docs.python.org/lib/node46.html

Does that help?

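E.g.:

    import logging
    import re

    log = logging.getLogger(__name__)

    def multireplace(text, mapping):
        # Build one alternation that matches any key literally.
        rx = re.compile('|'.join(re.escape(key) for key in mapping))
        def callback(match):
            key = match.group(0)
            repl = mapping[key]
            log.info("Replacing '%s' with '%s'", key, repl)
            return repl
        # subn returns a (new_text, number_of_replacements) tuple.
        return rx.subn(callback, text)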

(I'm not sure, but I think I adapted this from: http://effbot.org/zone/python-replace.htm)
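
One detail worth keeping in mind with this approach: alternatives in a regex alternation are tried left to right, so a key that is a prefix of another key can shadow it. The variant below is a sketch that sorts the keys longest first; the sample mapping is invented for illustration, logging is left out for brevity, and a non-empty mapping is assumed.

```python
import re

def multireplace(text, mapping):
    # Longest keys first, so 'abc' cannot shadow 'abcd'.
    # Assumes mapping is non-empty; an empty alternation would match everywhere.
    keys = sorted(mapping, key=len, reverse=True)
    rx = re.compile('|'.join(re.escape(key) for key in keys))
    # subn returns (new_text, number_of_replacements).
    return rx.subn(lambda match: mapping[match.group(0)], text)

result, count = multireplace('abc abcd', {'abc': 'X', 'abcd': 'Y'})
print(result, count)
```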

Gerard
Dec 18 '07 #4
tomasz <tm*****@googlemail.com> writes:
> here is a piece of pseudo-code (taken from Ruby) that illustrates the
> problem I'd like to solve in Python:
> [...]

I asked the very same question in
http://groups.google.com/group/comp....eb5631ade8b393
It seems that people either write more elaborate constructs or learn
to tolerate the nesting.

> Is there an alternative to it?

A simple workaround is to write a trivial function that returns a
boolean, and also stores the match object in either a global storage
or an object. It's not really elegant, especially in smaller scripts,
but it works:

    def search(pattern, s, store):
        match = re.search(pattern, s)
        store.match = match
        return match is not None

    class MatchStore(object):
        pass  # irrelevant, any object with a 'match' attr would do

    where = MatchStore()
    if search(pattern1, s, where):
        ...  # pattern1 matched, match object in where.match
    elif search(pattern2, s, where):
        ...  # pattern2 matched, match object in where.match
    # ....
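
Applied to the three patterns from the question, the workaround might look like the sketch below. The wrapper function `transform` is a hypothetical name added so the example can be run on its own.

```python
import re

class MatchStore:
    """Holds the most recent match so it can be read after the test."""
    match = None

def search(pattern, s, store):
    # Store the match object (or None) and report whether there was a match.
    store.match = re.search(pattern, s)
    return store.match is not None

def transform(s):
    where = MatchStore()
    if search(r'(b)', s, where):
        m = where.match
        return s[:m.start()] + m.group(1)   # pre-match + group 1
    elif search(r'(a)', s, where):
        m = where.match
        return m.group(1) + s[m.end():]     # group 1 + post-match
    elif search(r'(c)', s, where):
        return where.match.group(1)
    return s                                # no pattern matched

print(transform('abc'))
```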
Dec 18 '07 #5
On Dec 18, 4:41 am, tomasz <tmkm...@googlemail.com> wrote:
> Is there an alternative to it? Am I missing something? Python doesn't
> have special variables $1, $2 (right?) so you must assign the result
> of a match to a variable, to be able to access the groups.
>
> I'd appreciate any hints.

Don't use regexes for something as simple as this. Try find().

Most of the time (90%+) when I use regexes in Perl, I am doing something
that could be done much better using the string methods and some simple
operations. Plus, it usually turns out to be faster than Perl.
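
For the literal single-character patterns in the question, a version using only string methods could look like the sketch below. The function name `transform` is hypothetical, and this only works because the patterns are fixed strings, not general regular expressions.

```python
def transform(s):
    """Same three cases as the Ruby example, using plain string methods."""
    i = s.find('b')
    if i != -1:
        return s[:i] + 'b'        # text before 'b', plus 'b' itself
    i = s.find('a')
    if i != -1:
        return 'a' + s[i + 1:]    # 'a', plus the text after it
    if 'c' in s:
        return 'c'
    return s                      # none of the characters found

print(transform('abc'))
```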
Dec 18 '07 #6
