On Oct 9, 5:20 pm, Joe Strout <j...@strout.net> wrote:
Wow, this was harder than I thought (at least for a rusty Pythoneer
like myself). Here's my stab at an implementation. Remember, the
goal is to add a "match" method to Template which works like
Template.substitute, but in reverse: given a string, if that string
matches the template, then it should return a dictionary mapping each
template field to the corresponding value in the given string.
Oh, and as one extra feature, I want to support a ".greedy" attribute
on the Template object, which determines whether the matching of
fields should be done in a greedy or non-greedy manner.
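To make the greedy versus non-greedy distinction concrete, here is a small sketch using plain `re`. The sample string and the two hand-written patterns are illustrative assumptions (roughly what a two-field template such as "$x-$y" might turn into), and the snippet is written for Python 3 rather than the Python 2 used elsewhere in this thread.

```python
import re

text = "a-b-c"

# Greedy: the first field grabs as much as it can before the "-".
greedy = re.match(r"(.*)-(.*)$", text).groups()

# Non-greedy: the first field stops at the earliest possible "-".
# The trailing "$" anchor matters here; without it a final (.*?)
# would be satisfied by the empty string.
lazy = re.match(r"(.*?)-(.*?)$", text).groups()

print(greedy)  # ('a-b', 'c')
print(lazy)    # ('a', 'b-c')
```

The two modes only differ when the input is ambiguous, i.e. when the literal text between fields can also occur inside a field value.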
------------------------------------------------------------
#!/usr/bin/python
from string import Template
import re
def templateMatch(self, s):
    # start by finding the fields in our template, and building a map
    # from field position (index) to field name.
    posToName = {}
    pos = 1
    for item in self.pattern.findall(self.template):
        # each item is a tuple where item 1 is the field name
        posToName[pos] = item[1]
        pos += 1
    # determine if we should match greedy or non-greedy
    greedy = False
    if self.__dict__.has_key('greedy'):
        greedy = self.greedy
    # now, build a regex pattern to compare against s
    # (taking care to escape any characters in our template that
    # would have special meaning in regex)
    pat = self.template.replace('.', '\\.')
    pat = pat.replace('(', '\\(')
    pat = pat.replace(')', '\\)') # there must be a better way...
    if greedy:
        pat = self.pattern.sub('(.*)', pat)
    else:
        pat = self.pattern.sub('(.*?)', pat)
    p = re.compile(pat)
    # try to match this to the given string
    match = p.match(s)
    if match is None: return None
    out = {}
    for i in posToName.keys():
        out[posToName[i]] = match.group(i)
    return out
Template.match = templateMatch
t = Template("The $object in $location falls mainly in the $subloc.")
print t.match( "The rain in Spain falls mainly in the train." )
------------------------------------------------------------
This sort-of works, but it won't properly handle $$ in the template,
and I'm not too sure whether it handles the ${fieldname} form,
either. Also, it only escapes '.', '(', and ')' in the template...
there must be a better way of escaping all characters that have
special meaning to RegEx, except for '$' (which is why I can't use
re.escape).
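One possible way around the escaping problem, sketched here for Python 3, is to walk the template with `Template.pattern.finditer` and call `re.escape` only on the literal text between placeholders. The helper name `template_to_regex` and the sample template are hypothetical; the group names `named`, `braced` and `escaped` are the ones the standard `Template.pattern` defines. This is a sketch under those assumptions, not a drop-in replacement for the code above.

```python
import re
from string import Template

def template_to_regex(template, greedy=False):
    """Hypothetical helper: return (regex source, list of field names)."""
    field = "(.*)" if greedy else "(.*?)"
    src = template.template
    parts, names, last = [], [], 0
    for m in template.pattern.finditer(src):
        # Literal text before this placeholder is escaped wholesale.
        parts.append(re.escape(src[last:m.start()]))
        name = m.group("named") or m.group("braced")
        if name is not None:
            # $name or ${name}: one capture group per field.
            names.append(name)
            parts.append(field)
        elif m.group("escaped") is not None:
            # $$ stands for a single literal dollar sign.
            parts.append(re.escape("$"))
        else:
            # A stray "$" that Template itself would reject.
            raise ValueError("invalid placeholder at index %d" % m.start())
        last = m.end()
    parts.append(re.escape(src[last:]))
    return "".join(parts), names

t = Template("Cost: $$${amount} (approx.) for $item")
pattern, names = template_to_regex(t)
# \Z anchors the end so the last non-greedy field is not left empty.
m = re.match(pattern + r"\Z", "Cost: $5 (approx.) for tea")
print(names)       # ['amount', 'item']
print(m.groups())  # ('5', 'tea')
```

Because the placeholders are located before any escaping happens, `re.escape` never sees a '$' that belongs to a field, which sidesteps the reason it could not be applied to the whole template.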
Probably the rest of the code could be improved too. I'm eager to
hear your feedback.
Thanks,
- Joe
How about something like:
import re
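def placeholder(m):
    if m.group(1):
        # $name becomes a named capture group
        return "(?P<%s>.+)" % m.group(1)
    elif m.group(2):
        # $$ becomes a literal dollar sign
        return "\\$"
    else:
        # any other text is matched literally
        return re.escape(m.group(3))
# group 1: $name, group 2: $$, group 3: literal text between fields
regex = re.compile(r"\$(\w+)|(\$\$)|([^$]+)")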
t = "The $object in $location falls mainly in the $subloc."
print regex.sub(placeholder, t)
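The string printed above is only the regex source; to get the dictionary the original post asks for, it still has to be matched against the input. A minimal sketch for Python 3 follows, assuming a pattern of the kind the substitution produces (typed out by hand here, with the final "." escaped) and using `groupdict()` to map field names to values.

```python
import re

# Hand-written stand-in for the generated regex source (an assumption,
# not captured output from the code above).
pattern = (r"The (?P<object>.+) in (?P<location>.+) "
           r"falls mainly in the (?P<subloc>.+)\.")

# \Z requires the whole string to match, not just a prefix.
m = re.match(pattern + r"\Z", "The rain in Spain falls mainly in the train.")
fields = m.groupdict() if m else None
print(fields)
# {'object': 'rain', 'location': 'Spain', 'subloc': 'train'}
```

One caveat with named groups: a template that uses the same field twice would produce a duplicate group name, which `re.compile` rejects, so such templates would need extra handling.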