By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
 429,262 Members | 2,664 Online + Ask a Question
Need help? Post your question and get tips & solutions from a community of 429,262 IT Pros & Developers. It's quick & easy.

# String pattern matching

 P: n/a Anyone have experience with string pattern matching? I need a fast way to match variables to strings. Example: string - variables ============ abcaaab - xyz abca - xy eeabcac - vxw x matches abc y matches a z matches aab w maches ac v maches ee Apr 1 '06 #1
Share this Question
9 Replies

 P: n/a Firstly sort variable expressions by its length xy = 'abca' xyz = 'abcaaab' vxw = 'eeabcac' Expand all expressions by its values except itself xy = 'abca' 'abca' z = 'abcaaab' vxw = 'eeabcac' Cut all left and right matches xy = 'abca' z = 'aab' vxw = 'eeabcac' Repeat until you can. z = 'aab' xy = 'abca' vxw = 'eeabcac' Get first variable expression intersection - variables: x is longest intersection of xy and vxw - value : 'abca' starting at position 2 of vxw (longest intersection of 'abca' and 'eeabcac') 'a' starting at position 5 of vxw Then try matching: x='abca' (at position 2 of vxw) x='abc' (at position 2 of vxw) x='ab' (at position 2 of vxw) x='a' (at position 2 of vxw) x='a' (at position 5 of vxw) x='' (at position arbitrary position of vxw) and calculate the others. Repeat all step until you can. In your example, there are 13 results. x='abca' y='' z='aab' v='ee' w='c' x='abc' y='a' z='aab' v='ee' w='ac' x='ab' y='ca' z='aab' v='ee' w='cac' x='a' y='bca' z='aab' v='ee' w='bcac' x='a' y='bca' z='aab' v='eeabc' w='c' x='' y='abca' z='aab' v='' w='' x='' y='abca' z='aab' v='e' w='eabcac' x='' y='abca' z='aab' v='ee' w='abcac' x='' y='abca' z='aab' v='eea' w='bcac' x='' y='abca' z='aab' v='eeab' w='cac' x='' y='abca' z='aab' v='eeabc' w='ac' x='' y='abca' z='aab' v='eeabca' w='c' x='' y='abca' z='aab' v='eeabcac' w='' with the same original expressions xy = 'abca' xyz = 'abcaaab' vxw = 'eeabcac' Note that maximum matching is best matching in some kind of problems. Cutting off empty values, you can get 4 results: x='abc' y='a' z='aab' v='ee' w='ac' x='ab' y='ca' z='aab' v='ee' w='cac' x='a' y='bca' z='aab' v='ee' w='bcac' x='a' y='bca' z='aab' v='eeabc' w='c' Apr 1 '06 #2

 P: n/a Thanks for the interesting and detailed analysis. In my case I don't need all possible answers by rather the first "greedy" match. Seems like there might be some recursive approach. Apr 2 '06 #3

 P: n/a "Jim Lewis" writes: Anyone have experience with string pattern matching?I need a fast way to match variables to strings. Example: string - variables============abcaaab - xyzabca - xyeeabcac - vxw x matches abcy matches az matches aabw maches acv maches ee Off topic I know but I've been learning snobol pattern matching recently so I thought I'd give this one a bash. Here is my effort: define('foo(str)') &fullscan = 1 '/abcaaab/abca/eeabcac/' '/' arb \$ x arb \$ y arb \$ z '/' *x *y '/' + arb \$ v *x arb \$ w '/' *foo(v '/' w '/' x '/' y '/' z) :F(end) foo output = str :(FRETURN) end Good eh? Here's the output: /eeabcac//abca/aab e/eabcac//abca/aab ee/abcac//abca/aab eea/bcac//abca/aab eeab/cac//abca/aab eeabc/ac//abca/aab eeabca/c//abca/aab eeabcac///abca/aab ee/bcac/a/bca/aab eeabc/c/a/bca/aab ee/cac/ab/ca/aab ee/ac/abc/a/aab ee/c/abca//aab Removing empty entries is quite easy: '/abcaaab/abca/eeabcac/' '/' arb \$ x arb \$ y arb \$ z '/' *x *y '/' + arb \$ v *x arb \$ w '/' *differ(v) *differ(w) *differ(x) + *differ(y) *differ(z) *foo(v '/' w '/' x '/' y '/' z) :F(end) generates: ee/bcac/a/bca/aab eeabc/c/a/bca/aab ee/cac/ab/ca/aab ee/ac/abc/a/aab If I get time I'll try to get this working in Sam Wilmott's Python pattern matching library. What fun! Eddie Apr 3 '06 #4

 P: n/a Eddie Corns wrote: Off topic I know but I've been learning snobol pattern matching recently so I thought I'd give this one a bash. Here is my effort: define('foo(str)') &fullscan = 1 '/abcaaab/abca/eeabcac/' '/' arb \$ x arb \$ y arb \$ z '/' *x *y '/' + arb \$ v *x arb \$ w '/' *foo(v '/' w '/' x '/' y '/' z) :F(end) foo output = str :(FRETURN) end If I get time I'll try to get this working in Sam Wilmott's Python pattern matching library. What fun! Cool! I have to get in on the fun :-) This program uses Sam Wilmott's library to find one solution. It is a simple translation of your program above. I couldn't figure out how to get multiple solutions. http://www.wilmott.ca/python/patternmatching.html import string from patterns_b import * subject = MatchingInput ('/abcaaab/abca/eeabcac/') Letters = AnyOfP(string.letters)[1:] subject ^ IsP('/') & Letters >> 'x' & Letters >> 'y' & Letters >> 'z' & IsP('/') \ & AnotherP('x') & AnotherP('y') & IsP('/') \ & Letters >> 'v' & AnotherP('x') & Letters >> 'w' & IsP('/') print 'v =', subject['v'] print 'w =', subject['w'] print 'x =', subject['x'] print 'y =', subject['y'] print 'z =', subject['z'] Output is v = ee w = ac x = abc y = a z = aab Kent Apr 3 '06 #5

 P: n/a Jim Lewis wrote: Anyone have experience with string pattern matching? I need a fast way to match variables to strings. Example: string - variables ============ abcaaab - xyz abca - xy eeabcac - vxw x matches abc y matches a z matches aab w maches ac v maches ee You can do this with a regular expression, actually: testText = '/abcaaab/abca/eeabcac/' import re m = re.search(r'/(?P.+)(?P.+)(?P.+)/(?P=x)(?P=y)/(?P.+)(?P=x)(?P.+)/', testText) print m.group('v', 'w', 'x', 'y', 'z') Output is: ('ee', 'ac', 'abc', 'a', 'aab') For variety, change the matches to non-greedy (.+?) and get this result: ('ee', 'bcac', 'a', 'bca', 'aab') Kent Apr 3 '06 #6

 P: n/a Kent Johnson writes: Eddie Corns wrote: If I get time I'll try to get this working in Sam Wilmott's Python pattern matching library. What fun! Cool! I have to get in on the fun :-) This program uses Sam Wilmott's library to find one solution. It is asimple translation of your program above. I couldn't figure out how toget multiple solutions. Nice! FWIW, import string from patterns_b import * # investigate captured text and force a failure for backtracking. # A bit clumsy perhaps! passing a lambda would be more elegant. class DumpP (Pattern): def Match (self, subject): print print 'v =', subject['v'] print 'w =', subject['w'] print 'x =', subject['x'] print 'y =', subject['y'] print 'z =', subject['z'] if 1 == 2: yield None subject = MatchingInput ('/abcaaab/abca/eeabcac/') Letters = AnyOfP(string.letters)[1:] while True: subject ^ FenceP() & IsP('/') & Letters >> 'x' & Letters >> 'y' & Letters >> 'z' & IsP('/') \ & AnotherP('x') & AnotherP('y') & IsP('/') \ & Letters >> 'v' & AnotherP('x') & Letters >> 'w' & IsP('/') & DumpP() Not sure if it's supposed to exit in quite the way it does. With a more compact notation, I think this beats regexs for many uses especially when dealing with end users. Note: if anyone else plays with this interesting pattern library, note there are a couple of small bugs in it. Eddie Apr 3 '06 #7

 P: n/a Thanks to all for the various good approaches. Kent's plain RE approach seems the most straightforward - did not know that RE can handle this situation - good to know! Apr 5 '06 #8

 P: n/a ji******@emachineshop.com wrote: Thanks to all for the various good approaches. Kent's plain RE approach seems the most straightforward - did not know that RE can handle this situation - good to know! Thanks to Eddie Corns also who showed how to express the problem as a parsing problem. I am also trying a pyparsing solution but I don't see a way to repeat a previous element in pyparsing. Paul McGuire, are you listening? Kent Apr 5 '06 #9

 P: n/a >You can do this with a regular expression... I tried the plain RE approach and found it no faster than my direct-coded version. Anyone have any ideas on how to code this problem elegantly without RE? My code is long and cumbersome - 200 lines! Speed is my primary concern but low LOC would be nice of course. The sezman approach above seems a bit complex. Apr 16 '06 #10

### This discussion thread is closed

Replies have been disabled for this discussion. 