Re: template strings for matching?

Wow, this was harder than I thought (at least for a rusty Pythoneer
like myself). Here's my stab at an implementation. Remember, the
goal is to add a "match" method to Template which works like
Template.substitute, but in reverse: given a string, if that string
matches the template, then it should return a dictionary mapping each
template field to the corresponding value in the given string.

Oh, and as one extra feature, I want to support a ".greedy" attribute
on the Template object, which determines whether the matching of
fields should be done in a greedy or non-greedy manner.
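
To make the greedy vs. non-greedy distinction concrete, here is a small sketch using plain `re` (not the Template code itself). The two-field pattern and the sample sentence are made up for illustration; the point is only that the two modes can split the same input differently when a separator like " in " occurs more than once.

```python
import re

# The same two-field pattern written twice: once with greedy groups,
# once with non-greedy groups. Both patterns are illustrative only.
greedy = re.compile(r"The (.*) in (.*) falls")
lazy = re.compile(r"The (.*?) in (.*?) falls")

s = "The rain in Spain in June falls"

# Greedy: the first field grabs as much as it can and still lets the rest match.
greedy_fields = greedy.match(s).groups()
# Non-greedy: the first field stops at the earliest " in ".
lazy_fields = lazy.match(s).groups()

print(greedy_fields)  # ('rain in Spain', 'June')
print(lazy_fields)    # ('rain', 'Spain in June')
```

When every separator appears only once in the input, both modes give the same answer, which is why the flag only matters for ambiguous strings.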

------------------------------------------------------------
#!/usr/bin/python

from string import Template
import re

def templateMatch(self, s):
    # start by finding the fields in our template, and building a map
    # from field position (index) to field name.
    posToName = {}
    pos = 1
    for item in self.pattern.findall(self.template):
        # each item is a tuple where item 1 is the field name
        posToName[pos] = item[1]
        pos += 1

    # determine if we should match greedy or non-greedy
    # (defaults to non-greedy when no .greedy attribute has been set)
    greedy = getattr(self, 'greedy', False)

    # now, build a regex pattern to compare against s
    # (taking care to escape any characters in our template that
    # would have special meaning in regex)
    pat = self.template.replace('.', '\\.')
    pat = pat.replace('(', '\\(')
    pat = pat.replace(')', '\\)')  # there must be a better way...

    if greedy:
        pat = self.pattern.sub('(.*)', pat)
    else:
        pat = self.pattern.sub('(.*?)', pat)
    p = re.compile(pat)

    # try to match this to the given string
    match = p.match(s)
    if match is None:
        return None
    out = {}
    for i in posToName.keys():
        out[posToName[i]] = match.group(i)
    return out

Template.match = templateMatch

t = Template("The $object in $location falls mainly in the $subloc.")
print(t.match("The rain in Spain falls mainly in the train."))
------------------------------------------------------------

This sort-of works, but it won't properly handle $$ in the template,
and I'm not too sure whether it handles the ${fieldname} form,
either. Also, it only escapes '.', '(', and ')' in the template...
there must be a better way of escaping all characters that have
special meaning to RegEx, except for '$' (which is why I can't use
re.escape).
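
One possible way around the escaping problem, sketched here under some assumptions: walk the template with `Template.pattern.finditer` and call `re.escape` only on the literal text between placeholders, so `$` never passes through it unintentionally. The helper names `template_to_regex` and `match_template` are hypothetical, and the use of `re.fullmatch` (anchoring at the end of the string) and of a backreference for repeated field names are design choices of this sketch, not something the code above does.

```python
import re
from string import Template

def template_to_regex(template, greedy=False):
    """Return regex source for a string.Template (hypothetical helper)."""
    group = "(?P<%s>.*)" if greedy else "(?P<%s>.*?)"
    parts, seen, last = [], set(), 0
    for m in template.pattern.finditer(template.template):
        # Literal text before this placeholder is escaped wholesale.
        parts.append(re.escape(template.template[last:m.start()]))
        name = m.group("named") or m.group("braced")
        if name and name in seen:
            # A repeated field must match the same text as its first use.
            parts.append("(?P=%s)" % name)
        elif name:
            seen.add(name)
            parts.append(group % name)
        else:
            # "$$" stands for one literal "$"; a stray "$" is also kept literal.
            parts.append(re.escape("$"))
        last = m.end()
    parts.append(re.escape(template.template[last:]))
    return "".join(parts)

def match_template(template, s, greedy=False):
    """Return a dict of field values, or None if s does not match."""
    m = re.fullmatch(template_to_regex(template, greedy), s)
    return m.groupdict() if m else None

t = Template("The $object in ${location} costs $$5 (approx.)")
result = match_template(t, "The rain in Spain costs $5 (approx.)")
print(result)
```

Because the groups are named, the position-to-name map is no longer needed; `groupdict()` returns the dictionary directly.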

Probably the rest of the code could be improved too. I'm eager to
hear your feedback.

Thanks,
- Joe
Oct 9 '08 #1
1 Reply
How about something like:

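import re

def placeholder(m):
    if m.group(1):
        # $name becomes a named capturing group
        return "(?P<%s>.+)" % m.group(1)
    elif m.group(2):
        # $$ becomes a literal dollar sign
        return "\\$"
    else:
        # any other run of text is escaped so it matches literally
        return re.escape(m.group(3))

# The third alternative matches the literal text between placeholders,
# so that placeholder() has a group 3 to escape.
regex = re.compile(r"\$(\w+)|(\$\$)|([^$]+)")

t = "The $object in $location falls mainly in the $subloc."
print(regex.sub(placeholder, t))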
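
As a rough sketch of how the result might then be used: the pattern below is a hand-written equivalent of what the substitution is meant to produce for the sample template (an assumption, written out by hand rather than generated). Since the groups are named, `groupdict()` gives the field dictionary without any extra bookkeeping.

```python
import re

# Hand-written stand-in for the generated pattern (assumed, not generated here).
pattern = re.compile(
    r"The (?P<object>.+) in (?P<location>.+) falls mainly in the (?P<subloc>.+)\."
)

m = pattern.match("The rain in Spain falls mainly in the train.")
# groupdict() maps each named group to the text it captured.
fields = m.groupdict() if m else None
print(fields)
```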
Oct 9 '08 #2
