I found some clues on lexing with the re module in Python in an
article by Martin Löwis.
http://www.python.org/community/sigs...ards-standard/
He writes:
[...]
A scanner based on regular expressions is usually implemented
as an alternative of all token definitions. For XPath, a
fragment of this expression looks like this:
(?P<Number>\\d+(\\.\\d*)?|\\.\\d+)|
(?P<VariableReference>\\$""" + QName + """)|
(?P<NCName>"""+NCName+""")|
(?P<QName>"""+QName+""")|
(?P<LPAREN>\\()|
Here, each alternative in the regular expression defines a
named group. Scanning proceeds in the following steps:
1. Given the complete input, match the regular expression
with the beginning of the input.
2. Find out which alternative matched.
[...]
Item 2 is where I get stuck. There doesn't seem to be an obvious
way to do it, which I understand is a bad thing in Python.
Whatever source code went with the article originally is not
linked from the above page, so I don't know what Martin did.
Here's what I came up with (with a trivial example regex):
import re
r = re.compile(r'(?P<x>x+)|(?P<a>a+)')
m = r.match('aaxaxx')
if m:
    for k in r.groupindex:
        if m.group(k):
            # k is the name of the alternative that matched: the token type.
            token = (k, m.group())
            break
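Turning that group search into a complete scanner mostly means repeating steps 1 and 2 from a moving position in the input. A minimal sketch follows, assuming a hypothetical toy token set (Number, Name, LPAREN, RPAREN, WS); these are simplified stand-ins, not Martin's actual XPath definitions. Step 2 still works by scanning the named groups, here via groupdict().

```python
import re

# Hypothetical toy token set, not the real XPath definitions. Each
# alternative is wrapped in its own named group, as in the quoted design.
TOKEN_RE = re.compile(r'''
    (?P<Number>\d+(?:\.\d*)?|\.\d+)
  | (?P<Name>[A-Za-z_][\w.-]*)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<WS>\s+)
''', re.VERBOSE)


def tokenize(text):
    """Yield (token_type, lexeme) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        # Step 1: match the whole alternation at the current position.
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f'unexpected character {text[pos]!r} at position {pos}')
        # Step 2: find the one named group that took part in the match.
        kind = next(name for name, value in m.groupdict().items()
                    if value is not None)
        if kind != 'WS':
            yield kind, m.group()
        # Continue scanning right after this token.
        pos = m.end()
```

For example, list(tokenize('count(1.5 x)')) gives [('Name', 'count'), ('LPAREN', '('), ('Number', '1.5'), ('Name', 'x'), ('RPAREN', ')')]. None of these alternatives can match the empty string; one that could would stop pos from advancing.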
I wish I could do something obvious instead, like m.name().
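For comparison, match objects do carry a lastgroup attribute: the name of the last matched capturing group, or None if that group has no name. When every alternative is wrapped whole in its own named group, as in Martin's fragment, that should be the alternative that matched. A sketch on the same trivial regex:

```python
import re

r = re.compile(r'(?P<x>x+)|(?P<a>a+)')
m = r.match('aaxaxx')
if m:
    # lastgroup names the last matched capturing group; in a pure
    # alternation of named groups, that is the token type.
    token = (m.lastgroup, m.group())
    print(token)  # ('a', 'aa')
```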
--
Neil Cerutti