Bytes IT Community

how to handle repetitive regexp match checks

Over the last few years I have converted from Perl and Scheme to
Python. There is one task that I do often that is really slick in Perl
but escapes me in Python. I read in a text line from a file, check
it against several regular expressions, and do something once I find a match.
For example, in Perl:

if ($line =~ /struct {/) {
    do something
} elsif ($line =~ /typedef struct {/) {
    do something else
} elsif ($line =~ /something else/) {
} ...
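When translating, note that Perl's $line =~ /pattern/ finds the pattern anywhere in the line, which corresponds to Python's re.search(); re.match() only succeeds at the start of the string. One consequence: under search semantics every "typedef struct {" line also contains "struct {", so with /struct {/ tested first the typedef branch can never be reached, and the more specific pattern has to come first. A short sketch (the classify() helper and its return values are illustrative only):

```python
import re

typedef_rx = re.compile(r'typedef struct {')
struct_rx = re.compile(r'struct {')
other_rx = re.compile(r'something else')

line = 'typedef struct {'
# search() looks anywhere in the string, like Perl's $line =~ /struct {/
print(bool(struct_rx.search(line)))  # True
# match() only succeeds if the pattern matches at the very start
print(bool(struct_rx.match(line)))   # False

def classify(line):
    # With search(), 'struct {' also fires on 'typedef struct {' lines,
    # so the more specific pattern is tried first.
    if typedef_rx.search(line):
        return 'typedef'
    elif struct_rx.search(line):
        return 'struct'
    elif other_rx.search(line):
        return 'other'
    return None
```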

I am having difficulty doing this cleanly in Python. Can anyone help?

import re

rx1 = re.compile(r'struct {')
rx2 = re.compile(r'typedef struct {')
rx3 = re.compile(r'something else')

m = rx1.match(line)
if m:
    do something
else:
    m = rx2.match(line)
    if m:
        do something
    else:
        m = rx3.match(line)
        if m:
            do something
        else:
            error

(In Scheme I was able to do this cleanly with macros.)

Matt
Jul 18 '05 #1
6 Replies


I usually define a class like this:

class Matcher:
    def __init__(self, text):
        self.m = None
        self.text = text
    def match(self, pat):
        self.m = pat.match(self.text)
        return self.m
    def __getitem__(self, name):
        return self.m.group(name)

Then use it like this:

for line in fo:
    m = Matcher(line)
    if m.match(rx1):
        do something
    elif m.match(rx2):
        do something
    else:
        error
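To see the class working end to end, here is a self-contained sketch. The patterns, the named group "tag", and the describe() helper are made up for illustration; __getitem__ lets a named group be read as m['tag'] right after the test that matched.

```python
import re

class Matcher:
    """Keeps the most recent match so it is still available after the test."""
    def __init__(self, text):
        self.m = None
        self.text = text

    def match(self, pat):
        # Store the match object (or None) and return it for the if-test.
        self.m = pat.match(self.text)
        return self.m

    def __getitem__(self, name):
        # m['tag'] reads a named group from the last successful match.
        return self.m.group(name)

# Hypothetical patterns; each captures the struct tag in a named group.
rx_typedef = re.compile(r'typedef struct (?P<tag>\w*)\s*\{')
rx_struct = re.compile(r'struct (?P<tag>\w+)\s*\{')

def describe(line):
    m = Matcher(line)
    if m.match(rx_typedef):
        return 'typedef of struct ' + m['tag']
    elif m.match(rx_struct):
        return 'struct ' + m['tag']
    else:
        return 'no match'

for line in ['struct point {', 'typedef struct node {', 'int x;']:
    print(describe(line))
```

The object carries state between the test and the body, which is what lets each elif both check and capture in one expression.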

--
|>|\/|<
David M. Cooke
cookedm(at)physics(dot)mcmaster(dot)ca
Jul 18 '05 #2

My preferred way to do this is something like this:

import re

RX = re.compile(r'''
    (?P<rx1> struct\s{ )|
    (?P<rx2> typedef\sstruct\s{ )|
    (?P<rx3> something\selse )
''', re.VERBOSE)

class Matcher:
    def rx1(self, m):
        print("rx1 matched", m.group(0))

    def rx2(self, m):
        print("rx2 matched", m.group(0))

    def rx3(self, m):
        print("rx3 matched", m.group(0))

    def processLine(self, line):
        m = RX.match(line)
        if m:
            getattr(self, m.lastgroup)(m)
        else:
            print("error", repr(line), "did not match")

matcher = Matcher()
matcher.processLine('struct { something')
matcher.processLine('typedef struct { something')
matcher.processLine('something else')
matcher.processLine('will not match')
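A variation on the same idea, sketched under the assumption that a plain table of functions is acceptable instead of methods: a dict maps each group name to a handler, so only the names listed in the table can ever be dispatched to, rather than any attribute getattr() happens to find. The handler bodies are placeholders.

```python
import re

# The combined pattern from the post: one named group per alternative.
RX = re.compile(r'''
    (?P<rx1> struct\s{ )|
    (?P<rx2> typedef\sstruct\s{ )|
    (?P<rx3> something\selse )
''', re.VERBOSE)

# Hypothetical handlers, looked up by group name instead of via getattr().
HANDLERS = {
    'rx1': lambda m: 'struct: ' + m.group(0),
    'rx2': lambda m: 'typedef: ' + m.group(0),
    'rx3': lambda m: 'other: ' + m.group(0),
}

def process_line(line):
    m = RX.match(line)
    if m is None:
        return None
    # Only one top-level alternative can take part in a match, so
    # lastgroup (the name of the last matched group) identifies it.
    return HANDLERS[m.lastgroup](m)

print(process_line('typedef struct { something'))  # typedef: typedef struct {
```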

Jul 18 '05 #3

I had a similar situation along with the requirement that the text to be
scanned was being read in chunks. After looking at the Python re module
and various other regex packages, I eventually wrote my own multiple
pattern scanning matcher.

However, since then I've discovered that the sre Python module has a
Scanner class that does something similar.
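The Scanner class mentioned here still exists in Python 3 as re.Scanner, but it is undocumented, so the following relies on an implementation detail that may change. It combines a list of (pattern, action) pairs into one pattern and walks the string, calling the action for whichever alternative matched at each position. A sketch with made-up token names:

```python
import re

# re.Scanner is undocumented. Each lexicon entry is (pattern, action):
# a callable action is called as action(scanner, matched_text) and its
# return value is collected; an action of None discards the match.
scanner = re.Scanner([
    (r'typedef struct \{', lambda s, tok: ('TYPEDEF', tok)),
    (r'struct \{', lambda s, tok: ('STRUCT', tok)),
    (r'something else', lambda s, tok: ('OTHER', tok)),
    (r'\s+', None),                          # skip whitespace
    (r'\w+', lambda s, tok: ('WORD', tok)),  # any other word
])

def scan(text):
    # Returns (results, remainder); scanning stops at the first position
    # where none of the patterns match, and the rest is returned unscanned.
    return scanner.scan(text)

tokens, rest = scan('typedef struct { int x }')
print(tokens)
print(repr(rest))
```

Entries are tried in order at each position, which is why the typedef pattern is listed before the bare struct pattern and the generic \w+ comes last.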

Anyway, you can see my code at:
http://users.cs.cf.ac.uk/J.P.Giddy/p...respass/2.0.0/

Using it, your code could look like:

# do this once
import Trespass
pattern = Trespass.Pattern()
pattern.addRegExp(r'struct {', 1)
pattern.addRegExp(r'typedef struct {', 2)
pattern.addRegExp(r'something else', 3)

# do this for each line
match = pattern.match(line)
if match:
    value = match.value()
    if value == 1:
        # struct
        do something
    elif value == 2:
        # typedef
        do something
    elif value == 3:
        # something else
        do something
else:
    error
Jul 18 '05 #4

GiddyJP wrote:

# do this once
import Trespass
pattern = Trespass.Pattern()
pattern.addRegExp(r'struct {', 1)
pattern.addRegExp(r'typedef struct {', 2)
pattern.addRegExp(r'something else', 3)


Minor correction: in this module, { must always be escaped unless it
introduces a bounded repeat:
pattern.addRegExp(r'struct \{', 1)
pattern.addRegExp(r'typedef struct \{', 2)
pattern.addRegExp(r'something else', 3)
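Python's own re module is more lenient here: a { that cannot start a {m,n} repeat is treated as a literal, so the unescaped patterns used elsewhere in this thread work with re as written. Escaping it anyway, or building purely literal patterns with re.escape(), keeps them portable to stricter engines. A quick check:

```python
import re

line = 'typedef struct {'

# A '{' that cannot begin a {m,n} repeat is matched literally by re,
# so the unescaped and escaped spellings behave the same here.
loose = re.compile(r'struct {')
strict = re.compile(r'struct \{')

# re.escape() backslash-escapes regex metacharacters in a literal string.
escaped = re.compile(re.escape('typedef struct {'))

print(bool(loose.search(line)))    # True
print(bool(strict.search(line)))   # True
print(bool(escaped.match(line)))   # True
```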
Jul 18 '05 #5

Matt -

Pyparsing may be of interest to you. One of its core features is the
ability to associate an action method with a parsing pattern. During
parsing, the action is called with the original source string, the
location within the string of the match, and the matched tokens.

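Your code would look something like this:

from pyparsing import Literal

lbrace = Literal('{')
typedef = Literal('typedef')
struct = Literal('struct')
rx1 = struct + lbrace
rx2 = typedef + struct + lbrace
rx3 = Literal('something') + Literal('else')

def rx1Action(strg, loc, tokens):
    pass  # put stuff to do here

def rx2Action(strg, loc, tokens):
    pass  # ...and here

def rx3Action(strg, loc, tokens):
    pass  # ...and here

rx1.setParseAction( rx1Action )
rx2.setParseAction( rx2Action )
rx3.setParseAction( rx3Action )

# read code into Python string variable 'code'
patterns = (rx1 | rx2 | rx3)
# scanString() is a generator: the parse actions only fire as it is iterated
for tokens, start, end in patterns.scanString( code ):
    pass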

(I've broken up some of your literals, which allows for intervening
variable whitespace: Literal('struct') + Literal('{') will accept any
amount of whitespace, including none and even line breaks, between the
'struct' and the '{'.)
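For comparison, roughly the same whitespace tolerance can be had with the plain re module by putting \s* between the pieces. This is an approximation with re, not a description of how pyparsing works internally; the sample input is made up:

```python
import re

# \s* permits any amount of whitespace, including none and newlines.
struct_rx = re.compile(r'struct\s*\{')
# \s+ requires at least one whitespace character between the keywords.
typedef_rx = re.compile(r'typedef\s+struct\s*\{')

code = 'typedef struct\n{ int x; } t;\nstruct{ int y; };'

print(bool(typedef_rx.search(code)))  # True
print([m.group(0) for m in struct_rx.finditer(code)])
```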

Get pyparsing at http://pyparsing.sourceforge.net.

-- Paul

Jul 18 '05 #6

If you don't need the match object as part of "do something", you
could do a fairly literal translation of the Perl:

if rx1.match(line):
    do something
elif rx2.match(line):
    do something else
elif rx3.match(line):
    do other thing
else:
    raise ValueError("...")
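If you do need the match object, Python 3.8 and later (well after this thread) support assignment expressions, so the match can be bound inside each test and the chain keeps almost exactly the Perl shape. A minimal sketch; the patterns and their capture groups are made up for illustration:

```python
import re

# Hypothetical patterns; the capture group picks out the struct tag.
rx1 = re.compile(r'typedef struct (\w*)\s*\{')
rx2 = re.compile(r'struct (\w+)\s*\{')
rx3 = re.compile(r'something else')

def handle(line):
    # Requires Python 3.8+: ':=' binds the match object inside the condition.
    if m := rx1.match(line):
        return ('typedef', m.group(1))
    elif m := rx2.match(line):
        return ('struct', m.group(1))
    elif rx3.match(line):
        return ('other', None)
    else:
        raise ValueError('no pattern matched: %r' % line)

print(handle('typedef struct node {'))  # ('typedef', 'node')
print(handle('struct point {'))         # ('struct', 'point')
```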

Alternatively, if each of the "do something" phrases can be easily
reduced to a function call, then you could do something like:

def do_something(line, match): ...
def do_something_else(line, match): ...
def do_other_thing(line, match): ...

table = [ (re.compile(r'struct {'), do_something),
          (re.compile(r'typedef struct {'), do_something_else),
          (re.compile(r'something else'), do_other_thing) ]

for pattern, func in table:
    m = pattern.match(line)
    if m:
        func(line, m)
        break
else:
    raise ValueError("...")

The for/else pattern may look a bit odd, but the key feature here is
that the else clause only runs if the for loop terminates normally --
if you break out of the loop, the else does *not* run.
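Wrapped in a function, the same loop can also hand back the handler's result. In this sketch, dispatch() and the lambda handlers are hypothetical placeholders; the else clause raises only when the loop finishes without a break, i.e. when no pattern matched:

```python
import re

def dispatch(line, table):
    """Call the handler paired with the first pattern that matches line."""
    for pattern, func in table:
        m = pattern.match(line)
        if m:
            result = func(line, m)
            break
    else:
        # Reached only if the loop ran to completion without 'break'.
        raise ValueError('no pattern matched: %r' % line)
    return result

# Placeholder handlers that just report which entry fired.
table = [
    (re.compile(r'struct \{'), lambda line, m: 'struct'),
    (re.compile(r'typedef struct \{'), lambda line, m: 'typedef'),
    (re.compile(r'something else'), lambda line, m: 'other'),
]

print(dispatch('typedef struct {', table))  # typedef
```

Returning func(line, m) directly from inside the loop and raising after it would avoid for/else altogether; which reads better is largely a matter of taste.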

Jeff Shannon

Jul 18 '05 #7
