Connecting Tech Pros Worldwide Help | Site Map

String pattern matching

Jim Lewis
Guest
 
Posts: n/a
#1: Apr 1 '06
Anyone have experience with string pattern matching?
I need a fast way to match variables to strings. Example:

string - variables
============
abcaaab - xyz
abca - xy
eeabcac - vxw

x matches abc
y matches a
z matches aab
w maches ac
v maches ee

forward@seznam.cz
Guest
 
Posts: n/a
#2: Apr 1 '06

re: String pattern matching


Firstly sort variable expressions by its length

xy = 'abca'
xyz = 'abcaaab'
vxw = 'eeabcac'

Expand all expressions by its values except itself

xy = 'abca'
'abca' z = 'abcaaab'
vxw = 'eeabcac'

Cut all left and right matches

xy = 'abca'
z = 'aab'
vxw = 'eeabcac'

Repeat until you can.

z = 'aab'
xy = 'abca'
vxw = 'eeabcac'

Get first variable expression intersection
- variables: x is longest intersection of xy and vxw
- value : 'abca' starting at position 2 of vxw (longest intersection
of 'abca' and 'eeabcac')
'a' starting at position 5 of vxw

Then try matching:
x='abca' (at position 2 of vxw)
x='abc' (at position 2 of vxw)
x='ab' (at position 2 of vxw)
x='a' (at position 2 of vxw)
x='a' (at position 5 of vxw)
x='' (at position arbitrary position of vxw)
and calculate the others.

Repeat all step until you can.


In your example, there are 13 results.
x='abca' y='' z='aab' v='ee' w='c'
x='abc' y='a' z='aab' v='ee' w='ac'
x='ab' y='ca' z='aab' v='ee' w='cac'
x='a' y='bca' z='aab' v='ee' w='bcac'
x='a' y='bca' z='aab' v='eeabc' w='c'
x='' y='abca' z='aab' v='' w=''
x='' y='abca' z='aab' v='e' w='eabcac'
x='' y='abca' z='aab' v='ee' w='abcac'
x='' y='abca' z='aab' v='eea' w='bcac'
x='' y='abca' z='aab' v='eeab' w='cac'
x='' y='abca' z='aab' v='eeabc' w='ac'
x='' y='abca' z='aab' v='eeabca' w='c'
x='' y='abca' z='aab' v='eeabcac' w=''

with the same original expressions

xy = 'abca'
xyz = 'abcaaab'
vxw = 'eeabcac'

Note that maximum matching is best matching in some kind of problems.

Cutting off empty values, you can get 4 results:
x='abc' y='a' z='aab' v='ee' w='ac'
x='ab' y='ca' z='aab' v='ee' w='cac'
x='a' y='bca' z='aab' v='ee' w='bcac'
x='a' y='bca' z='aab' v='eeabc' w='c'

Jim Lewis
Guest
 
Posts: n/a
#3: Apr 2 '06

re: String pattern matching


Thanks for the interesting and detailed analysis. In my case I don't
need all possible answers by rather the first "greedy" match. Seems
like there might be some recursive approach.

Eddie Corns
Guest
 
Posts: n/a
#4: Apr 3 '06

re: String pattern matching


"Jim Lewis" <jimlewis@miclog.com> writes:
[color=blue]
>Anyone have experience with string pattern matching?
>I need a fast way to match variables to strings. Example:[/color]
[color=blue]
>string - variables
>============
>abcaaab - xyz
>abca - xy
>eeabcac - vxw[/color]
[color=blue]
>x matches abc
>y matches a
>z matches aab
>w maches ac
>v maches ee[/color]

Off topic I know but I've been learning snobol pattern matching recently so I
thought I'd give this one a bash. Here is my effort:

define('foo(str)')
&fullscan = 1
'/abcaaab/abca/eeabcac/' '/' arb $ x arb $ y arb $ z '/' *x *y '/'
+ arb $ v *x arb $ w '/' *foo(v '/' w '/' x '/' y '/' z) :F(end)
foo output = str :(FRETURN)

end

Good eh? Here's the output:

/eeabcac//abca/aab
e/eabcac//abca/aab
ee/abcac//abca/aab
eea/bcac//abca/aab
eeab/cac//abca/aab
eeabc/ac//abca/aab
eeabca/c//abca/aab
eeabcac///abca/aab
ee/bcac/a/bca/aab
eeabc/c/a/bca/aab
ee/cac/ab/ca/aab
ee/ac/abc/a/aab
ee/c/abca//aab

Removing empty entries is quite easy:

'/abcaaab/abca/eeabcac/' '/' arb $ x arb $ y arb $ z '/' *x *y '/'
+ arb $ v *x arb $ w '/' *differ(v) *differ(w) *differ(x)
+ *differ(y) *differ(z) *foo(v '/' w '/' x '/' y '/' z) :F(end)

generates:

ee/bcac/a/bca/aab
eeabc/c/a/bca/aab
ee/cac/ab/ca/aab
ee/ac/abc/a/aab

If I get time I'll try to get this working in Sam Wilmott's Python pattern
matching library.

What fun!

Eddie
Kent Johnson
Guest
 
Posts: n/a
#5: Apr 3 '06

re: String pattern matching


Eddie Corns wrote:[color=blue]
> Off topic I know but I've been learning snobol pattern matching recently so I
> thought I'd give this one a bash. Here is my effort:
>
> define('foo(str)')
> &fullscan = 1
> '/abcaaab/abca/eeabcac/' '/' arb $ x arb $ y arb $ z '/' *x *y '/'
> + arb $ v *x arb $ w '/' *foo(v '/' w '/' x '/' y '/' z) :F(end)
> foo output = str :(FRETURN)
>
> end
>[/color]
[color=blue]
> If I get time I'll try to get this working in Sam Wilmott's Python pattern
> matching library.
>
> What fun![/color]

Cool! I have to get in on the fun :-)

This program uses Sam Wilmott's library to find one solution. It is a
simple translation of your program above. I couldn't figure out how to
get multiple solutions.
http://www.wilmott.ca/python/patternmatching.html

import string
from patterns_b import *

subject = MatchingInput ('/abcaaab/abca/eeabcac/')
Letters = AnyOfP(string.letters)[1:]

subject ^ IsP('/') & Letters >> 'x' & Letters >> 'y' & Letters >> 'z' &
IsP('/') \
& AnotherP('x') & AnotherP('y') & IsP('/') \
& Letters >> 'v' & AnotherP('x') & Letters >> 'w' & IsP('/')

print 'v =', subject['v']
print 'w =', subject['w']
print 'x =', subject['x']
print 'y =', subject['y']
print 'z =', subject['z']


Output is
v = ee
w = ac
x = abc
y = a
z = aab


Kent
Kent Johnson
Guest
 
Posts: n/a
#6: Apr 3 '06

re: String pattern matching


Jim Lewis wrote:[color=blue]
> Anyone have experience with string pattern matching?
> I need a fast way to match variables to strings. Example:
>
> string - variables
> ============
> abcaaab - xyz
> abca - xy
> eeabcac - vxw
>
> x matches abc
> y matches a
> z matches aab
> w maches ac
> v maches ee
>[/color]

You can do this with a regular expression, actually:

testText = '/abcaaab/abca/eeabcac/'
import re

m =
re.search(r'/(?P<x>.+)(?P<y>.+)(?P<z>.+)/(?P=x)(?P=y)/(?P<v>.+)(?P=x)(?P<w>.+)/',
testText)
print m.group('v', 'w', 'x', 'y', 'z')


Output is:
('ee', 'ac', 'abc', 'a', 'aab')

For variety, change the matches to non-greedy (.+?) and get this result:
('ee', 'bcac', 'a', 'bca', 'aab')


Kent
Eddie Corns
Guest
 
Posts: n/a
#7: Apr 3 '06

re: String pattern matching


Kent Johnson <kent@kentsjohnson.com> writes:
[color=blue]
>Eddie Corns wrote:[/color]
[color=blue][color=green]
>> If I get time I'll try to get this working in Sam Wilmott's Python pattern
>> matching library.
>>
>> What fun![/color][/color]
[color=blue]
>Cool! I have to get in on the fun :-)[/color]
[color=blue]
>This program uses Sam Wilmott's library to find one solution. It is a
>simple translation of your program above. I couldn't figure out how to
>get multiple solutions.[/color]

Nice!

FWIW,
import string
from patterns_b import *

# investigate captured text and force a failure for backtracking.
# A bit clumsy perhaps! passing a lambda would be more elegant.
class DumpP (Pattern):
def Match (self, subject):
print
print 'v =', subject['v']
print 'w =', subject['w']
print 'x =', subject['x']
print 'y =', subject['y']
print 'z =', subject['z']
if 1 == 2: yield None

subject = MatchingInput ('/abcaaab/abca/eeabcac/')
Letters = AnyOfP(string.letters)[1:]

while True:
subject ^ FenceP() & IsP('/') & Letters >> 'x' & Letters >> 'y' & Letters >> 'z' & IsP('/') \
& AnotherP('x') & AnotherP('y') & IsP('/') \
& Letters >> 'v' & AnotherP('x') & Letters >> 'w' & IsP('/') & DumpP()


Not sure if it's supposed to exit in quite the way it does.

With a more compact notation, I think this beats regexs for many uses
especially when dealing with end users.

Note: if anyone else plays with this interesting pattern library, note there
are a couple of small bugs in it.

Eddie
jimlewis@emachineshop.com
Guest
 
Posts: n/a
#8: Apr 5 '06

re: String pattern matching


Thanks to all for the various good approaches. Kent's plain RE approach
seems the most straightforward - did not know that RE can handle this
situation - good to know!

Kent Johnson
Guest
 
Posts: n/a
#9: Apr 5 '06

re: String pattern matching


jimlewis@emachineshop.com wrote:[color=blue]
> Thanks to all for the various good approaches. Kent's plain RE approach
> seems the most straightforward - did not know that RE can handle this
> situation - good to know![/color]

Thanks to Eddie Corns also who showed how to express the problem as a
parsing problem.

I am also trying a pyparsing solution but I don't see a way to repeat a
previous element in pyparsing. Paul McGuire, are you listening?

Kent
Jim Lewis
Guest
 
Posts: n/a
#10: Apr 16 '06

re: String pattern matching


>You can do this with a regular expression...

I tried the plain RE approach and found it no faster than my
direct-coded version. Anyone have any ideas on how to code this problem
elegantly without RE? My code is long and cumbersome - 200 lines! Speed
is my primary concern but low LOC would be nice of course. The sezman
approach above seems a bit complex.

Closed Thread