Re: template strings for matching?

Wow, this was harder than I thought (at least for a rusty Pythoneer
like myself). Here's my stab at an implementation. Remember, the
goal is to add a "match" method to Template which works like
Template.substitute, but in reverse: given a string, if that string
matches the template, then it should return a dictionary mapping each
template field to the corresponding value in the given string.

Oh, and as one extra feature, I want to support a ".greedy" attribute
on the Template object, which determines whether the matching of
fields should be done in a greedy or non-greedy manner.

------------------------------------------------------------
#!/usr/bin/python

from string import Template
import re

def templateMatch(self, s):
    # start by finding the fields in our template, and building a map
    # from field position (index) to field name.
    posToName = {}
    pos = 1
    for item in self.pattern.findall(self.template):
        # each item is a tuple where item 1 is the field name
        posToName[pos] = item[1]
        pos += 1

    # determine if we should match greedy or non-greedy
    # (non-greedy unless a "greedy" attribute says otherwise)
    greedy = getattr(self, 'greedy', False)

    # now, build a regex pattern to compare against s
    # (taking care to escape any characters in our template that
    # would have special meaning in regex)
    pat = self.template.replace('.', '\\.')
    pat = pat.replace('(', '\\(')
    pat = pat.replace(')', '\\)') # there must be a better way...

    # each field becomes one capture group, so group N is field N
    if greedy:
        pat = self.pattern.sub('(.*)', pat)
    else:
        pat = self.pattern.sub('(.*?)', pat)
    p = re.compile(pat)

    # try to match this to the given string
    match = p.match(s)
    if match is None:
        return None
    out = {}
    for i in posToName.keys():
        out[posToName[i]] = match.group(i)
    return out

Template.match = templateMatch

t = Template("The $object in $location falls mainly in the $subloc.")
print(t.match("The rain in Spain falls mainly in the train."))
------------------------------------------------------------

This sort-of works, but it won't properly handle $$ in the template,
and I'm not too sure whether it handles the ${fieldname} form,
either. Also, it only escapes '.', '(', and ')' in the template...
there must be a better way of escaping all characters that have
special meaning to RegEx, except for '$' (which is why I can't use
re.escape).
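
One possible way around the escaping problem, sketched here under some assumptions: instead of escaping the whole template, walk it with Template.pattern.finditer and call re.escape only on the literal text between fields. The group names "escaped", "named", "braced" and "invalid" are the ones Template.pattern defines; the helper names template_to_regex and template_match, the end-of-string anchor, and the back-reference for repeated fields are illustrative choices, not part of the string module.

```python
import re
from string import Template

def template_to_regex(template, greedy=False):
    """Hypothetical helper: compile a regex that mirrors a Template."""
    wildcard = ".*" if greedy else ".*?"
    text = template.template
    parts = []
    seen = set()
    last = 0
    for m in template.pattern.finditer(text):
        # literal text between two fields can be escaped wholesale
        parts.append(re.escape(text[last:m.start()]))
        last = m.end()
        name = m.group("named") or m.group("braced")
        if name is not None:
            if name in seen:
                # a repeated field has to match the same text again
                parts.append("(?P=%s)" % name)
            else:
                seen.add(name)
                parts.append("(?P<%s>%s)" % (name, wildcard))
        elif m.group("escaped") is not None:
            # "$$" in the template stands for one literal "$"
            parts.append(re.escape("$"))
        else:
            raise ValueError("invalid placeholder in template")
    parts.append(re.escape(text[last:]))
    # assumption: anchor at the end, so that a trailing non-greedy
    # field cannot settle for an empty match
    return re.compile("".join(parts) + r"\Z")

def template_match(template, s, greedy=False):
    """Return a dict of field values, or None if s does not match."""
    m = template_to_regex(template, greedy).match(s)
    return m.groupdict() if m else None

t = Template("The $object in ${location} costs $$5 (approx.)")
print(template_match(t, "The rain in Spain costs $5 (approx.)"))
```

Because the named groups carry the field names, the position-to-name map is no longer needed; match.groupdict() gives the result directly.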

Probably the rest of the code could be improved too. I'm eager to
hear your feedback.

Thanks,
- Joe
Oct 9 '08 #1
On Oct 9, 5:20 pm, Joe Strout <j...@strout.net> wrote:
[...]
How about something like:

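import re

def placeholder(m):
    if m.group(1):
        # $name becomes a named capture group
        return "(?P<%s>.+)" % m.group(1)
    elif m.group(2):
        # $$ is a literal dollar sign
        return "\\$"
    else:
        # anything else is literal text and gets escaped
        return re.escape(m.group(3))

# the third alternative picks up literal text (and any stray "$")
# so that placeholder() can escape it via group 3
regex = re.compile(r"\$(\w+)|(\$\$)|([^$]+|\$)")

t = "The $object in $location falls mainly in the $subloc."
print(regex.sub(placeholder, t))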
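
A rough sketch of how a pattern generated this way might then be used for the actual matching. The names FIELD and match_template are made up for illustration, and both the third regex alternative (which captures literal text so it can be escaped) and the end-of-string anchor are assumptions added for this sketch.

```python
import re

# $name | $$ | literal text (or a stray "$")
FIELD = re.compile(r"\$(\w+)|(\$\$)|([^$]+|\$)")

def placeholder(m):
    if m.group(1):
        # $name becomes a named, greedy capture group
        return "(?P<%s>.+)" % m.group(1)
    elif m.group(2):
        # $$ is a literal dollar sign
        return "\\$"
    else:
        # literal text is escaped so it has no special meaning
        return re.escape(m.group(3))

def match_template(t, s):
    """Hypothetical helper: dict of field values, or None if no match."""
    pattern = re.compile(FIELD.sub(placeholder, t) + r"\Z")
    m = pattern.match(s)
    return m.groupdict() if m else None

print(match_template(
    "The $object in $location falls mainly in the $subloc.",
    "The rain in Spain falls mainly in the train."))
```

Note that a field name used twice in the template would make re.compile fail here, since a named group can only be defined once in a pattern.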
Oct 9 '08 #2
