
regexp question

python_charmer2000
#1: Jul 18 '05
I want to match several regexps against a large body of text. What I
have so far is similar to this:

import re

re1 = <some regexp>
re2 = <some regexp>
re3 = <some regexp>

big_re = re.compile(re1 + '|' + re2 + '|' + re3)

matches = big_re.finditer(file_list)
for match in matches:
    span = match.span()
    print "matched text =", file_list[span[0]:span[1]]
    print "matched re =", match.re.pattern

Now the "match.re.pattern" is the entire regexp, big_re. But I want
to print out the portion of the big re that was matched -- was it re1?
re2? or re3? Is it possible to determine this, or do I have to make
a second pass through the collection of re's and compare them against
the "matched text" in order to determine which part of the big_re was
matched?

thanks!!




Ben Finney
#2: Jul 18 '05

re: regexp question


On Fri, 05 Dec 2003 02:26:53 -0000, python_charmer2000 wrote:
> re1 = <some regexp>
> re2 = <some regexp>
> re3 = <some regexp>
>
> big_re = re.compile(re1 + '|' + re2 + '|' + re3)
>
> Now the "match.re.pattern" is the entire regexp, big_re. But I want
> to print out the portion of the big re that was matched -- was it re1?
> re2? or re3? Is it possible to determine this, or do I have to make
> a second pass through the collection of re's and compare them against
> the "matched text" in order to determine which part of the big_re was
> matched?

That will work no matter what your regexes happen to be, and is easily
understood. Implement that, and see if it's fast enough. (Doing
otherwise is known as "premature optimisation" and is a bad practice.)
In fact, it may be better (from a readability standpoint) to simply
compile each of the regexes and match them all each time.
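A minimal sketch of the second-pass approach, assuming the sub-patterns in PATTERNS (hypothetical stand-ins for re1, re2 and re3) don't depend on context outside the matched text; anchors such as ^ or \b and lookahead/lookbehind can be judged differently when a sub-pattern is rerun on the matched text in isolation. which_pattern() and label_matches() are illustrative helper names, not part of the re module. Each sub-pattern is compiled a second time, wrapped in (?:...) and anchored with \Z, so it only counts if it matches the whole matched text.

```python
import re

# Hypothetical stand-ins for the question's re1, re2 and re3.
PATTERNS = [r'ab+c', r'de?f', r'g\w*i']

# The combined pattern, as in the question: plain alternation, no groups.
big_re = re.compile('|'.join(PATTERNS))

# Second-pass copies: each sub-pattern wrapped in (?:...) and anchored with \Z,
# so it only succeeds if it consumes the whole matched text. The (?:...) keeps
# any '|' inside a sub-pattern from escaping the anchor.
anchored = [re.compile(r'(?:%s)\Z' % p) for p in PATTERNS]


def which_pattern(matched_text):
    """Return the index in PATTERNS of the first sub-pattern that matches
    all of matched_text, or None if none does."""
    for index, regex in enumerate(anchored):
        if regex.match(matched_text):
            return index
    return None


def label_matches(text):
    """List (matched text, sub-pattern index) pairs for every match of big_re."""
    return [(m.group(), which_pattern(m.group()))
            for m in big_re.finditer(text)]


for matched, index in label_matches('xx abbbc yy df zz gxyzi'):
    print('%s matched by %s' % (matched, PATTERNS[index]))
```

Checking sub-patterns in the same left-to-right order as the alternation means that, for context-free patterns, the first hit should be the alternative the engine actually chose. The cost is up to one extra match per sub-pattern for every hit.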

An alternative, if it's not fast enough: wrap each regex in its own
capturing group, then inspect the match object with its group() method;
the group that is not None tells you which regex matched.
>>> import re
>>> regex1 = 'abc'
>>> regex2 = 'def'
>>> regex3 = 'ghi'
>>> big_regex = re.compile(
... '(' + regex1 + ')'
... + '|(' + regex2 + ')'
... + '|(' + regex3 + ')'
... )
>>> match = big_regex.match('def')
>>> match.groups()
(None, 'def', None)
>>> match.group(1)
>>> match.group(2)
'def'
>>> match.group(3)
>>>
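Extending the grouping idea, here is a sketch that uses named groups and the match object's lastgroup attribute, so the loop can report which sub-pattern matched without scanning groups() for the non-None entry. The names re1/re2/re3, the sample patterns, and the helper find_with_names() are illustrative assumptions. Because each alternative's named group encloses the whole alternative, it is the last group to close, so lastgroup should name that alternative even if a sub-pattern contains groups of its own.

```python
import re

# Hypothetical sub-patterns, keyed by the name to report back.
# Names must be valid Python identifiers to be used as group names.
PATTERNS = [
    ('re1', r'abc'),
    ('re2', r'def'),
    ('re3', r'ghi'),
]

# Builds (?P<re1>abc)|(?P<re2>def)|(?P<re3>ghi)
big_regex = re.compile('|'.join('(?P<%s>%s)' % (name, pattern)
                                for name, pattern in PATTERNS))


def find_with_names(text):
    """Return (matched text, name of the sub-pattern that matched) pairs."""
    results = []
    for match in big_regex.finditer(text):
        # lastgroup is the name of the last group that closed, i.e. the
        # named group wrapping the alternative that matched.
        results.append((match.group(), match.lastgroup))
    return results
```

With the unnamed groups in the session above, match.lastindex gives the same information as a number: big_regex.match('def').lastindex is 2 there.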


--
\ "As the evening sky faded from a salmon color to a sort of |
`\ flint gray, I thought back to the salmon I caught that morning, |
_o__) and how gray he was, and how I named him Flint." -- Jack Handey |
Ben Finney <http://bignose.squidly.org/>