By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,772 Members | 906 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,772 IT Pros & Developers. It's quick & easy.

use a regex or not?

P: n/a
I am looking for a function that takes an input string
and a pattern, and outputs a dictionary.

# @param s str, lowercase letters
# @param p str, lowercase and uppercase letters
# @return dict
def fill(s, p):
d = {}
....
return d

String s has characters from the lowercase letters.
String p is a pattern, a string of characters from the
lowercase and uppercase letters. The idea is to match
s with p, where lowercase letters have to match
exactly, and to fill variables (with an uppercase
letter name) with the rest of s. The variables are
collected in a dictionary with the resulting bindings.
A variable that occurs in more than one place in p must
bind to the same substring of s.

Tests:
fill('ab', p='aA') {'A': 'b'} fill('ab', p='Ab') {'A': 'a'} fill('bb', p='Ab') # no match {} fill('aa', p='Aa') {'A': 'a'} fill('aa', p='Ab') # no match {} fill('abb', p='aA') {'A': 'bb'} fill('aba', p='aAa') {'A': 'b'} fill('abb', p='aAa') # no match {} fill('abab', p='aAaA') # A-matches must be equal {'A': 'b'} fill('abac', p='aAaA') # no match {} fill('abac', p='aAaB')

{'A': 'b', 'B': 'c'}

Can you do it? Is trying a solution with a regex a
good idea?

Jul 19 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
Oops, the 3rd test should be

fill('bb', p='Aa')

resulting in the empty dict {} because no binding for A can be found.

Jul 19 '05 #2

P: n/a
Joost Jacob wrote:
I am looking for a function that takes an input string
and a pattern, and outputs a dictionary.


Here is one way. I won't say it's pretty but it is fairly straightforward and it passes all your tests.

Kent

def fill(s, p):
"""
fill('ab', p='aA') {'A': 'b'}
fill('ab', p='Ab') {'A': 'a'}
fill('bb', p='Aa') # no match {}
fill('aa', p='Aa') {'A': 'a'}
fill('aa', p='Ab') # no match {}
fill('abb', p='aA') {'A': 'bb'}
fill('aba', p='aAa') {'A': 'b'}
fill('abb', p='aAa') # no match {}
fill('abab', p='aAaA') # A-matches must be equal {'A': 'b'}
fill('abac', p='aAaA') # no match {}
fill('abac', p='aAaB')

{'A': 'b', 'B': 'c'}
"""

# If s is longer than p the extra chars of s become part of the
# last item of s
if len(s) > len(p):
ss = list(s[:len(p)])
ss[-1] = ss[-1] + s[len(p):]
s = ss

d = {}
seen = {}

for s1, p1 in zip(s, p):
# Lower case have to map to themselves
if p1.islower() and s1 != p1:
# print 'Non-identity map for %s: %s' % (p1, s1)
return {}

try:
# Check for multiple mappings from p1
old_s = d[p1]

if old_s != s1:
# We saw this s before but the mapping is different
return {}

# We saw this s1, p1 before with the same mapping
continue

except KeyError:
# Check for multiple mappings to p1
if seen.get(s1, p1.lower()) != p1.lower():
return {}

# This is a new mapping
d[p1] = s1
seen[s1] = p1.lower()
# Strip out the lower case mappings
return dict((k, v) for k, v in d.iteritems() if k.isupper())

def _test():
import doctest
doctest.testmod()

if __name__ == "__main__":
_test()
Jul 19 '05 #3

P: n/a
Here's a hybrid solution, using pyparsing to parse your input pattern
string (p), and transforming it into a regexp string. Then uses
re.match using the regexp to process string (s), and then builds a
dictionary from the matched groups.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

===================
import re
from pyparsing import Word,OneOrMore

previousKeys = []
def createUpperKey(s,l,t):
if t[0] not in previousKeys:
ret = "(?P<%s>.*)" % t[0]
previousKeys.append(t[0])
else:
ret = "(?P=%s)" % t[0]
return ret

lowers = "abcdefghijklmnopqrstuvwxyz"
uppers = lowers.upper()
lowerLetter = Word(lowers,max=1)
upperLetter = Word(uppers,max=1).setParseAction(createUpperKey)
pattern = OneOrMore( lowerLetter | upperLetter )

def fill(s,p):
# reset keys found in p-patterns
del previousKeys[:]

# create regexp by running pyparsing transform
regexp = pattern.transformString(p) + "$"

# create dict from re-matched groups
match = re.match(regexp,s)
if match:
ret = dict( [ (k,match.group(k)) for k in previousKeys ] )
else:
ret = dict()
return ret

tests = [
('ab','aA'),
('ab','Ab'),
('bb','Aa'),
('aa','Aa'),
('aa','Ab'),
('abb','aA'),
('aba','aAa'),
('abbba','aAa'),
('abbbaba','aAa'),
('abb','aAa'),
('abab','aAaA'),
('abac','aAaA'),
('abac','aAaB'),
]

for t in tests:
print t,fill( *t )

================
Gives:
('ab', 'aA') {'A': 'b'}
('ab', 'Ab') {'A': 'a'}
('bb', 'Aa') {}
('aa', 'Aa') {'A': 'a'}
('aa', 'Ab') {}
('abb', 'aA') {'A': 'bb'}
('aba', 'aAa') {'A': 'b'}
('abbba', 'aAa') {'A': 'bbb'}
('abbbaba', 'aAa') {'A': 'bbbab'}
('abb', 'aAa') {}
('abab', 'aAaA') {'A': 'b'}
('abac', 'aAaA') {}
('abac', 'aAaB') {'A': 'b', 'B': 'c'}

Jul 19 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.