how to handle repetitive regexp match checks


Over the last few years I have converted from Perl and Scheme to
Python. There is one task that I do often that is really slick in Perl
but escapes me in Python. I read in a text line from a file and check
it against several regular expressions and do something once I find a match.
For example, in Perl ...

if ($line =~ /struct {/) {
    do something
} elsif ($line =~ /typedef struct {/) {
    do something else
} elsif ($line =~ /something else/) {
} ...

I am having difficulty doing this cleanly in python. Can anyone help?

rx1 = re.compile(r'struct {')
rx2 = re.compile(r'typedef struct {')
rx3 = re.compile(r'something else')

m = rx1.match(line)
if m:
    do something
else:
    m = rx2.match(line)
    if m:
        do something
    else:
        m = rx3.match(line)
        if m:
            do something
        else:
            error

(In Scheme I was able to do this cleanly with macros.)

Matt
Jul 18 '05 #1

I usually define a class like this:

class Matcher:
    def __init__(self, text):
        self.m = None
        self.text = text
    def match(self, pat):
        self.m = pat.match(self.text)
        return self.m
    def __getitem__(self, name):
        return self.m.group(name)

Then, use it like

for line in fo:
    m = Matcher(line)
    if m.match(rx1):
        do something
    elif m.match(rx2):
        do something
    else:
        error
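
A runnable sketch of the helper above, for anyone who wants to try it. The two patterns and the `classify` function are hypothetical, added only to show how the stored match can be read back through `__getitem__` inside an if/elif chain.

```python
import re

class Matcher:
    """Remembers the most recent match so it can be tested inside if/elif."""
    def __init__(self, text):
        self.m = None
        self.text = text

    def match(self, pat):
        self.m = pat.match(self.text)
        return self.m

    def __getitem__(self, name):
        # Delegates to the stored match; valid only after a successful match().
        return self.m.group(name)

# Hypothetical patterns with a named group, for illustration only.
rx_struct = re.compile(r'struct\s+(?P<name>\w+)\s*\{')
rx_typedef = re.compile(r'typedef\s+struct\s+(?P<name>\w+)\s*\{')

def classify(line):
    m = Matcher(line)
    if m.match(rx_struct):
        return ('struct', m['name'])
    elif m.match(rx_typedef):
        return ('typedef', m['name'])
    return ('unknown', None)
```

On Python 3.8 and later, an assignment expression such as `if (m := rx_struct.match(line)):` gives the same flat chain without a helper class.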

--
|>|\/|<
David M. Cooke
cookedm(at)physics(dot)mcmaster(dot)ca
Jul 18 '05 #2

My preferred way to do this is something like this:

import re

RX = re.compile(r'''
    (?P<rx1> struct\s{ )|
    (?P<rx2> typedef\sstruct\s{ )|
    (?P<rx3> something\selse )
''', re.VERBOSE)

class Matcher:
    def rx1(self, m):
        print("rx1 matched", m.group(0))

    def rx2(self, m):
        print("rx2 matched", m.group(0))

    def rx3(self, m):
        print("rx3 matched", m.group(0))

    def processLine(self, line):
        m = RX.match(line)
        if m:
            # m.lastgroup names the alternative that matched;
            # call the method of the same name
            getattr(self, m.lastgroup)(m)
        else:
            print("error", repr(line), "did not match")

matcher = Matcher()
matcher.processLine('struct { something')
matcher.processLine('typedef struct { something')
matcher.processLine('something else')
matcher.processLine('will not match')
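
The dispatch above hinges on `lastgroup`. A small sketch of just that piece, with hypothetical group names and a hypothetical `which` helper, shows that when each alternative is wrapped in its own named group, `lastgroup` reports which one matched:

```python
import re

# Hypothetical group names; each alternative gets its own named group.
RX = re.compile(r'''
    (?P<struct>  struct\s\{          )|
    (?P<typedef> typedef\sstruct\s\{ )|
    (?P<other>   something\selse     )
''', re.VERBOSE)

def which(line):
    # Returns the name of the alternative that matched, or None.
    m = RX.match(line)
    return m.lastgroup if m else None
```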

Jul 18 '05 #3

I had a similar situation along with the requirement that the text to be
scanned was being read in chunks. After looking at the Python re module
and various other regex packages, I eventually wrote my own multiple
pattern scanning matcher.

However, since then I've discovered that the sre Python module has a
Scanner class that does something similar.

Anyway, you can see my code at:
http://users.cs.cf.ac.uk/J.P.Giddy/p...respass/2.0.0/

Using it, your code could look like:

# do this once
import Trespass
pattern = Trespass.Pattern()
pattern.addRegExp(r'struct {', 1)
pattern.addRegExp(r'typedef struct {', 2)
pattern.addRegExp(r'something else', 3)

# do this for each line
match = pattern.match(line)
if match:
    value = match.value()
    if value == 1:
        # struct
        do something
    elif value == 2:
        # typedef
        do something
    elif value == 3:
        # something else
        do something
else:
    error
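
For the Scanner class mentioned earlier in this post: in current CPython it is reachable as `re.Scanner`, though it is not covered in the official documentation, so the following is only a sketch. The patterns, the tags, and the `scan` wrapper are assumptions made for illustration.

```python
import re

# re.Scanner ships with CPython's re module but is undocumented,
# so its behaviour could change between releases.
scanner = re.Scanner([
    (r'typedef\s+struct\s*\{', lambda s, tok: ('typedef', tok)),
    (r'struct\s*\{',           lambda s, tok: ('struct', tok)),
    (r'something\s+else',      lambda s, tok: ('other', tok)),
    (r'\s+',                   None),   # skip whitespace between tokens
])

def scan(text):
    # Returns (list of action results, unmatched remainder of the text).
    return scanner.scan(text)
```

Scanning stops at the first position where no pattern matches, and whatever is left comes back as the remainder.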
Jul 18 '05 #4
GiddyJP wrote:

# do this once
import Trespass
pattern = Trespass.Pattern()
pattern.addRegExp(r'struct {', 1)
pattern.addRegExp(r'typedef struct {', 2)
pattern.addRegExp(r'something else', 3)


Minor correction... in this module { always needs to be escaped if not
indicating a bounded repeat:

pattern.addRegExp(r'struct \{', 1)
pattern.addRegExp(r'typedef struct \{', 2)
pattern.addRegExp(r'something else', 3)
Jul 18 '05 #5
Matt -

Pyparsing may be of interest to you. One of its core features is the
ability to associate an action method with a parsing pattern. During
parsing, the action is called with the original source string, the
location within the string of the match, and the matched tokens.

Your code would look something like:

from pyparsing import Literal

lbrace = Literal('{')
typedef = Literal('typedef')
struct = Literal('struct')
rx1 = struct + lbrace
rx2 = typedef + struct + lbrace
rx3 = Literal('something') + Literal('else')

def rx1Action(strg, loc, tokens):
    pass  # ... put stuff to do here ...

# rx2Action and rx3Action are defined the same way

rx1.setParseAction( rx1Action )
rx2.setParseAction( rx2Action )
rx3.setParseAction( rx3Action )

# read code into Python string variable 'code'
patterns = (rx1 | rx2 | rx3)
# scanString is a generator, so loop over it to make the actions run
for tokens, start, end in patterns.scanString( code ):
    pass

(I've broken up some of your literals, which allows for intervening
variable whitespace - that is Literal('struct') + Literal('{') will
accommodate one, two, or more blanks (even line breaks) between the
'struct' and the '{'.)
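
For comparison, a similar whitespace tolerance can be sketched with the standard `re` module by putting `\s*` between the tokens. The pattern and the `is_struct_open` helper are assumptions for illustration, not part of pyparsing.

```python
import re

# \s* between the tokens tolerates zero or more blanks, tabs, or newlines.
rx_struct = re.compile(r'struct\s*\{')

def is_struct_open(text):
    # True when the text starts with 'struct', optional whitespace, then '{'.
    return rx_struct.match(text) is not None
```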

Get pyparsing at http://pyparsing.sourceforge.net.

-- Paul

Jul 18 '05 #6

If you don't need the match object as part of "do something", you
could do a fairly literal translation of the Perl:

if rx1.match(line):
    do something
elif rx2.match(line):
    do something else
elif rx3.match(line):
    do other thing
else:
    raise ValueError("...")
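
One thing to keep in mind with a literal translation: Perl's `=~` looks for the pattern anywhere in the line, while a compiled pattern's `match()` in Python only tries at the start of the string; `search()` is the method that scans the whole line. A short sketch with a made-up sample line:

```python
import re

rx = re.compile(r'struct \{')

line = 'typedef struct { int x; }'   # sample input, for illustration only
anchored = rx.match(line)    # tries only at position 0
anywhere = rx.search(line)   # scans forward through the line
```

Here `anchored` is None because the line does not begin with `struct {`, while `anywhere` is a match object for the later occurrence.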

Alternatively, if each of the "do something" phrases can be easily
reduced to a function call, then you could do something like:

def do_something(line, match): ...
def do_something_else(line, match): ...
def do_other_thing(line, match): ...

table = [ (re.compile(r'struct {'), do_something),
          (re.compile(r'typedef struct {'), do_something_else),
          (re.compile(r'something else'), do_other_thing) ]

for pattern, func in table:
    m = pattern.match(line)
    if m:
        func(line, m)
        break
else:
    raise ValueError("...")

The for/else pattern may look a bit odd, but the key feature here is
that the else clause only runs if the for loop terminates normally --
if you break out of the loop, the else does *not* run.
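
The for/else behaviour can be checked with a small self-contained sketch. The handler bodies, their return values, and the `dispatch` wrapper are placeholders invented for this example.

```python
import re

def do_something(line, match):
    return 'struct'       # placeholder handler

def do_something_else(line, match):
    return 'typedef'      # placeholder handler

# (compiled pattern, handler) pairs, tried in order.
table = [
    (re.compile(r'struct \{'), do_something),
    (re.compile(r'typedef struct \{'), do_something_else),
]

def dispatch(line):
    for pattern, func in table:
        m = pattern.match(line)
        if m:
            result = func(line, m)
            break
    else:
        # Reached only when the loop finished without hitting break.
        raise ValueError('no pattern matched: %r' % line)
    return result
```

A line that matches an entry returns that handler's value; a line that matches nothing falls through to the else clause and raises ValueError.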

Jeff Shannon

Jul 18 '05 #7
