472,337 Members | 1,488 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,337 software developers and data experts.

String pattern matching

Anyone have experience with string pattern matching?
I need a fast way to match variables to strings. Example:

string - variables
============
abcaaab - xyz
abca - xy
eeabcac - vxw

x matches abc
y matches a
z matches aab
w maches ac
v maches ee

Apr 1 '06 #1
9 4944
Firstly sort variable expressions by its length

xy = 'abca'
xyz = 'abcaaab'
vxw = 'eeabcac'

Expand all expressions by its values except itself

xy = 'abca'
'abca' z = 'abcaaab'
vxw = 'eeabcac'

Cut all left and right matches

xy = 'abca'
z = 'aab'
vxw = 'eeabcac'

Repeat until you can.

z = 'aab'
xy = 'abca'
vxw = 'eeabcac'

Get first variable expression intersection
- variables: x is longest intersection of xy and vxw
- value : 'abca' starting at position 2 of vxw (longest intersection
of 'abca' and 'eeabcac')
'a' starting at position 5 of vxw

Then try matching:
x='abca' (at position 2 of vxw)
x='abc' (at position 2 of vxw)
x='ab' (at position 2 of vxw)
x='a' (at position 2 of vxw)
x='a' (at position 5 of vxw)
x='' (at position arbitrary position of vxw)
and calculate the others.

Repeat all step until you can.
In your example, there are 13 results.
x='abca' y='' z='aab' v='ee' w='c'
x='abc' y='a' z='aab' v='ee' w='ac'
x='ab' y='ca' z='aab' v='ee' w='cac'
x='a' y='bca' z='aab' v='ee' w='bcac'
x='a' y='bca' z='aab' v='eeabc' w='c'
x='' y='abca' z='aab' v='' w=''
x='' y='abca' z='aab' v='e' w='eabcac'
x='' y='abca' z='aab' v='ee' w='abcac'
x='' y='abca' z='aab' v='eea' w='bcac'
x='' y='abca' z='aab' v='eeab' w='cac'
x='' y='abca' z='aab' v='eeabc' w='ac'
x='' y='abca' z='aab' v='eeabca' w='c'
x='' y='abca' z='aab' v='eeabcac' w=''

with the same original expressions

xy = 'abca'
xyz = 'abcaaab'
vxw = 'eeabcac'

Note that maximum matching is best matching in some kind of problems.

Cutting off empty values, you can get 4 results:
x='abc' y='a' z='aab' v='ee' w='ac'
x='ab' y='ca' z='aab' v='ee' w='cac'
x='a' y='bca' z='aab' v='ee' w='bcac'
x='a' y='bca' z='aab' v='eeabc' w='c'

Apr 1 '06 #2
Thanks for the interesting and detailed analysis. In my case I don't
need all possible answers by rather the first "greedy" match. Seems
like there might be some recursive approach.

Apr 2 '06 #3
"Jim Lewis" <ji******@miclog.com> writes:
Anyone have experience with string pattern matching?
I need a fast way to match variables to strings. Example: string - variables
============
abcaaab - xyz
abca - xy
eeabcac - vxw x matches abc
y matches a
z matches aab
w maches ac
v maches ee


Off topic I know but I've been learning snobol pattern matching recently so I
thought I'd give this one a bash. Here is my effort:

define('foo(str)')
&fullscan = 1
'/abcaaab/abca/eeabcac/' '/' arb $ x arb $ y arb $ z '/' *x *y '/'
+ arb $ v *x arb $ w '/' *foo(v '/' w '/' x '/' y '/' z) :F(end)
foo output = str :(FRETURN)

end

Good eh? Here's the output:

/eeabcac//abca/aab
e/eabcac//abca/aab
ee/abcac//abca/aab
eea/bcac//abca/aab
eeab/cac//abca/aab
eeabc/ac//abca/aab
eeabca/c//abca/aab
eeabcac///abca/aab
ee/bcac/a/bca/aab
eeabc/c/a/bca/aab
ee/cac/ab/ca/aab
ee/ac/abc/a/aab
ee/c/abca//aab

Removing empty entries is quite easy:

'/abcaaab/abca/eeabcac/' '/' arb $ x arb $ y arb $ z '/' *x *y '/'
+ arb $ v *x arb $ w '/' *differ(v) *differ(w) *differ(x)
+ *differ(y) *differ(z) *foo(v '/' w '/' x '/' y '/' z) :F(end)

generates:

ee/bcac/a/bca/aab
eeabc/c/a/bca/aab
ee/cac/ab/ca/aab
ee/ac/abc/a/aab

If I get time I'll try to get this working in Sam Wilmott's Python pattern
matching library.

What fun!

Eddie
Apr 3 '06 #4
Eddie Corns wrote:
Off topic I know but I've been learning snobol pattern matching recently so I
thought I'd give this one a bash. Here is my effort:

define('foo(str)')
&fullscan = 1
'/abcaaab/abca/eeabcac/' '/' arb $ x arb $ y arb $ z '/' *x *y '/'
+ arb $ v *x arb $ w '/' *foo(v '/' w '/' x '/' y '/' z) :F(end)
foo output = str :(FRETURN)

end
If I get time I'll try to get this working in Sam Wilmott's Python pattern
matching library.

What fun!


Cool! I have to get in on the fun :-)

This program uses Sam Wilmott's library to find one solution. It is a
simple translation of your program above. I couldn't figure out how to
get multiple solutions.
http://www.wilmott.ca/python/patternmatching.html

import string
from patterns_b import *

subject = MatchingInput ('/abcaaab/abca/eeabcac/')
Letters = AnyOfP(string.letters)[1:]

subject ^ IsP('/') & Letters >> 'x' & Letters >> 'y' & Letters >> 'z' &
IsP('/') \
& AnotherP('x') & AnotherP('y') & IsP('/') \
& Letters >> 'v' & AnotherP('x') & Letters >> 'w' & IsP('/')

print 'v =', subject['v']
print 'w =', subject['w']
print 'x =', subject['x']
print 'y =', subject['y']
print 'z =', subject['z']
Output is
v = ee
w = ac
x = abc
y = a
z = aab
Kent
Apr 3 '06 #5
Jim Lewis wrote:
Anyone have experience with string pattern matching?
I need a fast way to match variables to strings. Example:

string - variables
============
abcaaab - xyz
abca - xy
eeabcac - vxw

x matches abc
y matches a
z matches aab
w maches ac
v maches ee


You can do this with a regular expression, actually:

testText = '/abcaaab/abca/eeabcac/'
import re

m =
re.search(r'/(?P<x>.+)(?P<y>.+)(?P<z>.+)/(?P=x)(?P=y)/(?P<v>.+)(?P=x)(?P<w>.+)/',
testText)
print m.group('v', 'w', 'x', 'y', 'z')
Output is:
('ee', 'ac', 'abc', 'a', 'aab')

For variety, change the matches to non-greedy (.+?) and get this result:
('ee', 'bcac', 'a', 'bca', 'aab')
Kent
Apr 3 '06 #6
Kent Johnson <ke**@kentsjohnson.com> writes:
Eddie Corns wrote:
If I get time I'll try to get this working in Sam Wilmott's Python pattern
matching library.

What fun!

Cool! I have to get in on the fun :-) This program uses Sam Wilmott's library to find one solution. It is a
simple translation of your program above. I couldn't figure out how to
get multiple solutions.


Nice!

FWIW,
import string
from patterns_b import *

# investigate captured text and force a failure for backtracking.
# A bit clumsy perhaps! passing a lambda would be more elegant.
class DumpP (Pattern):
def Match (self, subject):
print
print 'v =', subject['v']
print 'w =', subject['w']
print 'x =', subject['x']
print 'y =', subject['y']
print 'z =', subject['z']
if 1 == 2: yield None

subject = MatchingInput ('/abcaaab/abca/eeabcac/')
Letters = AnyOfP(string.letters)[1:]

while True:
subject ^ FenceP() & IsP('/') & Letters >> 'x' & Letters >> 'y' & Letters >> 'z' & IsP('/') \
& AnotherP('x') & AnotherP('y') & IsP('/') \
& Letters >> 'v' & AnotherP('x') & Letters >> 'w' & IsP('/') & DumpP()
Not sure if it's supposed to exit in quite the way it does.

With a more compact notation, I think this beats regexs for many uses
especially when dealing with end users.

Note: if anyone else plays with this interesting pattern library, note there
are a couple of small bugs in it.

Eddie
Apr 3 '06 #7
Thanks to all for the various good approaches. Kent's plain RE approach
seems the most straightforward - did not know that RE can handle this
situation - good to know!

Apr 5 '06 #8
ji******@emachineshop.com wrote:
Thanks to all for the various good approaches. Kent's plain RE approach
seems the most straightforward - did not know that RE can handle this
situation - good to know!


Thanks to Eddie Corns also who showed how to express the problem as a
parsing problem.

I am also trying a pyparsing solution but I don't see a way to repeat a
previous element in pyparsing. Paul McGuire, are you listening?

Kent
Apr 5 '06 #9
>You can do this with a regular expression...

I tried the plain RE approach and found it no faster than my
direct-coded version. Anyone have any ideas on how to code this problem
elegantly without RE? My code is long and cumbersome - 200 lines! Speed
is my primary concern but low LOC would be nice of course. The sezman
approach above seems a bit complex.

Apr 16 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Thomas Mlynarczyk | last post by:
Which is the simplest way to remove all whitespace from a string? Is there a simpler method than a regex replace? Or how can I tell a regex pattern...
176
by: Thomas Reichelt | last post by:
Moin, short question: is there any language combining the syntax, flexibility and great programming experience of Python with static typing? Is...
9
by: Xah Lee | last post by:
# -*- coding: utf-8 -*- # Python # Matching string patterns # # Sometimes you want to know if a string is of # particular pattern. Let's say...
29
by: zoro | last post by:
Hi, I am new to C#, coming from Delphi. In Delphi, I am using a 3rd party string handling library that includes some very useful string functions,...
5
by: olaufr | last post by:
Hi, I'd need to perform simple pattern matching within a string using a list of possible patterns. For example, I want to know if the substring...
4
by: Kelly B | last post by:
This is a simple string matching code from K&R2 (modified to match a user entered string from a text file) I tried to code an alternative strindex...
0
NeoPa
by: NeoPa | last post by:
ANSI-89 v ANSI-92 Before we get into all the various types of pattern matching that can be used, there are two ANSI standards used for the main...
11
by: tech | last post by:
Hi, I need a function to specify a match pattern including using wildcard characters as below to find chars in a std::string. The match pattern...
1
by: Ben | last post by:
I apologize in advance for the newbie question. I'm trying to figure out a way to find all of the occurrences of a regular expression in a string...
0
by: Kemmylinns12 | last post by:
Blockchain technology has emerged as a transformative force in the business world, offering unprecedented opportunities for innovation and...
0
by: CD Tom | last post by:
This happens in runtime 2013 and 2016. When a report is run and then closed a toolbar shows up and the only way to get it to go away is to right...
0
by: CD Tom | last post by:
This only shows up in access runtime. When a user select a report from my report menu when they close the report they get a menu I've called Add-ins...
0
by: Naresh1 | last post by:
What is WebLogic Admin Training? WebLogic Admin Training is a specialized program designed to equip individuals with the skills and knowledge...
0
jalbright99669
by: jalbright99669 | last post by:
Am having a bit of a time with URL Rewrite. I need to incorporate http to https redirect with a reverse proxy. I have the URL Rewrite rules made...
0
by: Matthew3360 | last post by:
Hi there. I have been struggling to find out how to use a variable as my location in my header redirect function. Here is my code. ...
0
by: AndyPSV | last post by:
HOW CAN I CREATE AN AI with an .executable file that would suck all files in the folder and on my computerHOW CAN I CREATE AN AI with an .executable...
0
by: Arjunsri | last post by:
I have a Redshift database that I need to use as an import data source. I have configured the DSN connection using the server, port, database, and...
0
hi
by: WisdomUfot | last post by:
It's an interesting question you've got about how Gmail hides the HTTP referrer when a link in an email is clicked. While I don't have the specific...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.