regular expression reverse match?


Is it possible to match a string to a regular expression pattern instead
of the other way around?

For example, instead of finding a match within a string, I want to
find out, (pass or fail), if a string is a partial match to an re.

Given an re of 'abcd and a bunch of other stuff'

This is what I'm looking for:

string / result
'a' / pass
'ab' / pass
'abc' / pass
'abd' / fail
'aaaa' / fail
'abcd and a bunch of other stuff and then some' / fail

Is there a way to form a regular expression that will do this?

I'm hoping to use more complex regular expressions than this. But
the left to right precedence will still be the same.

_Ron Adam


Jul 18 '05 #1

"Ron Adam" <ra****@tampabay.rr.com> wrote in message
news:fq********************************@4ax.com...

> Is it possible to match a string to a regular expression pattern instead
> of the other way around?

You can abuse the implementation details to discover the original search
string.

> For example, instead of finding a match within a string, I want to
> find out, (pass or fail), if a string is a partial match to an re.
>
> Given an re of 'abcd and a bunch of other stuff'

I'll assume you mean something like:

    x = re.compile('abcd and a bunch of other stuff')

> This is what I'm looking for:
>
> string / result
> 'a' / pass
> 'ab' / pass
> 'abc' / pass
> 'abd' / fail
> 'aaaa' / fail
> 'abcd and a bunch of other stuff and then some' / fail
>
> Is there a way to form a regular expression that will do this?

    for k,v in re._cache.items():
        if v == x:
            ss=k[0]
            break

Then it's a normal:

    >>> re.match('a',ss)
    <_sre.SRE_Match object at 0x009828E0>
    >>> re.match('abd',ss)

Not sure that's what you're looking for, but reasonably sure it won't work
in all cases.

HTH,

Emile van Sebille
em***@fenx.com
Jul 18 '05 #2
Ron Adam wrote:
> Is it possible to match a string to a regular expression pattern instead
> of the other way around?
>
> For example, instead of finding a match within a string, I want to
> find out, (pass or fail), if a string is a partial match to an re.
>
> Given an re of 'abcd and a bunch of other stuff'
>
> This is what I'm looking for:
>
> string / result
> 'a' / pass
> 'ab' / pass
> 'abc' / pass
> 'abd' / fail
> 'aaaa' / fail
> 'abcd and a bunch of other stuff and then some' / fail

How about:

    >>> matcher = "abcd and a bunch of other stuff"
    >>> phrases = ["a", "ab", "abc", "abd", "aaaa", "abcd and a bunch of other stuff and then some"]
    >>> for phrase in phrases:
    ...     if phrase == matcher[:len(phrase)]: print "pass"
    ...     else: print "fail"
    ...
    pass
    pass
    pass
    fail
    fail
    fail

Jay
Jul 18 '05 #3
On Tue, 28 Oct 2003 20:09:38 -0800, "Emile van Sebille"
<em***@fenx.com> wrote:

"Ron Adam" <ra****@tampabay.rr.com> wrote in message
news:fq********************************@4ax.com...

>> Is it possible to match a string to a regular expression pattern instead
>> of the other way around?
>
> You can abuse the implementation details to discover the original search
> string.

I already know the search string... Or will once I understand how
to form them to do what I want. <hopefully> What I don't know is
the completed string that will match it beforehand. Or at least
that's the idea.

I'm writing an interactive keyboard input routine and want to use re
to specify the allowed input. Some things are easy like only allowing
digits or characters.

So far I can check for %i and %f first to specify simple ints and
floats. A test cast operation with an exception handles those
perfectly.

As it is, the program checks the input buffer after each keystroke and
determines if the buffer is acceptable against a pattern as the string
is being built. A reverse match. It allows new keystrokes to be
added to the buffer as long as it's within the pattern.

I was hoping to use regular expression patterns as a function argument
to specify a wide range of input patterns.
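
The keystroke-by-keystroke check described above can be sketched roughly as follows. This is only an illustration in Python 3 (the thread itself predates it): `filter_keys` and `is_prefix_of_yes` are hypothetical names, the keys come from a plain string rather than a real keyboard, and the acceptance test is a simple stand-in for whatever pattern check is eventually used.

```python
def filter_keys(keys, is_acceptable):
    """Feed keys one at a time; keep a key only if the buffer stays acceptable."""
    buffer = ""
    for key in keys:
        candidate = buffer + key
        if is_acceptable(candidate):
            buffer = candidate  # key accepted, buffer grows
        # otherwise the key is rejected and the buffer is left unchanged
    return buffer


def is_prefix_of_yes(candidate):
    """Stand-in acceptance test: the buffer must be a prefix of 'yes'."""
    return "yes".startswith(candidate)


print(filter_keys("yxes!", is_prefix_of_yes))
```

Passing the acceptance test in as a function keeps the input loop independent of how the pattern is checked, so a regular-expression-based test could be swapped in later.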

I'm still a little new to Python and a lot new to regular
expressions, although I've used variations of them before.

So how do I say.... accept only 1 character out of the set [YyNn], but
only one character, and do not match Ye, yy, nn etc...

.... accept only 10 of any type characters but not 11

.... accept only the letters of "a particular string" sequentially and
then no more.

.... accept a number in the form of "nnn-nnn-nnn" sequentially with
required dashes.

..... accept any sequence of characters and numbers as long as they
contain at least 1 of each type, cap letter, small letter, and digit
and be a minimum of 6 characters long. ie.... a password check.

It's probably easy to do all of these given a completed string first.
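
For the completed-string case, the rules listed above could be written along these lines. This is a sketch in Python 3 using `re.fullmatch`; the `RULES` table and the `check` function are made-up names, and each pattern is just one possible reading of the corresponding requirement.

```python
import re

# Hypothetical rule table: one pattern per requirement listed above.
RULES = {
    "yes_no": r"[YyNn]",                 # exactly one of Y, y, N, n
    "max_10_chars": r".{1,10}",          # 1 to 10 characters, never 11
    "literal": re.escape("a particular string"),  # this exact text, no more
    "number": r"\d{3}-\d{3}-\d{3}",      # nnn-nnn-nnn with required dashes
    # at least one capital, one small letter and one digit; 6+ letters/digits
    "password": r"(?=.*[A-Z])(?=.*[a-z])(?=.*\d)[A-Za-z\d]{6,}",
}


def check(rule, text):
    """Return True if the completed text satisfies the named rule."""
    return re.fullmatch(RULES[rule], text) is not None


print(check("yes_no", "y"), check("yes_no", "Ye"))
```

These only validate finished input; they do not by themselves say whether a half-typed buffer can still become valid, which is the harder question in this thread.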

This started out as a python learning project.. <grin> but it's sort
of turned into a really interesting coding situation.


Thanks for the reply.

_Ron Adam


>> For example, instead of finding a match within a string, I want to
>> find out, (pass or fail), if a string is a partial match to an re.
>>
>> Given an re of 'abcd and a bunch of other stuff'
>
> I'll assume you mean something like:
>
>     x = re.compile('abcd and a bunch of other stuff')
>
>> This is what I'm looking for:
>>
>> string / result
>> 'a' / pass
>> 'ab' / pass
>> 'abc' / pass
>> 'abd' / fail
>> 'aaaa' / fail
>> 'abcd and a bunch of other stuff and then some' / fail
>>
>> Is there a way to form a regular expression that will do this?
>
>     for k,v in re._cache.items():
>         if v == x:
>             ss=k[0]
>             break
>
> Then it's a normal:
>
>     >>> re.match('a',ss)
>     <_sre.SRE_Match object at 0x009828E0>
>     >>> re.match('abd',ss)

I'll give it a try... not sure either. Like I said, this is kind of
new to me. :)

> Not sure that's what you're looking for, but reasonably sure it won't work
> in all cases.
>
> HTH,
>
> Emile van Sebille
> em***@fenx.com


Jul 18 '05 #4
On Tue, 28 Oct 2003 21:19:56 -0600, Jay Dorsey <ja*@jaydorsey.com>
wrote:

> How about:
>
>     >>> matcher = "abcd and a bunch of other stuff"
>     >>> phrases = ["a", "ab", "abc", "abd", "aaaa", "abcd and a bunch of other stuff and then some"]
>     >>> for phrase in phrases:
>     ...     if phrase == matcher[:len(phrase)]: print "pass"
>     ...     else: print "fail"
>     ...
>     pass
>     pass
>     pass
>     fail
>     fail
>     fail
>
> Jay


Hi Jay, That was an overly simple example I gave I think. The
'pattern' will in most cases not be just a simple string. And I'll
only be testing on one string at a time as it forms.

It's an interactive input routine. So as the person types, I want to
test the buffer and accept or reject each key according to the
pattern. Pass or fail.

This method would work though if it's possible to expand a regular
expression out to a string of single matching re characters I think.
Then it would be a matter of doing re.match(pattern[:len(ibuffer)],
ibuffer).

I just don't know enough yet, but am learning quickly though,
thanks.

_Ron

Jul 18 '05 #5
On Tue, 28 Oct 2003 20:09:38 -0800, "Emile van Sebille"
<em***@fenx.com> wrote:
> I'll assume you mean something like:
>
>     x = re.compile('abcd and a bunch of other stuff')
>
>> This is what I'm looking for:
>>
>> string / result
>> 'a' / pass
>> 'ab' / pass
>> 'abc' / pass
>> 'abd' / fail
>> 'aaaa' / fail
>> 'abcd and a bunch of other stuff and then some' / fail
>>
>> Is there a way to form a regular expression that will do this?
>
>     for k,v in re._cache.items():
>         if v == x:
>             ss=k[0]
>             break
>
> Then it's a normal:
>
>     >>> re.match('a',ss)
>     <_sre.SRE_Match object at 0x009828E0>
>     >>> re.match('abd',ss)

Hi again Emile,

I tried it and get an error.
Here's my result. Am I missing something?

    import re
    x = re.compile('abcd and a bunch of other stuff')
    for k,v in re._cache.items():
        if v==x:
            ss=k[0]
            break

    Traceback (most recent call last):
      File "<pyshell#21>", line 1, in ?
        for k,v in re._cache.items():
    AttributeError: 'module' object has no attribute '_cache'

With a trial and error method, (works for me eventually), this is what
I've been able to get to work so far. I found the '$' is what I
needed to limit the buffer length to the pattern.

e = kb_input('Enter "Y" or "N" please: ', '[YyNn]$')
f = kb_input('Enter "yes" please: ', 'y$|ye$|yes$')
g = kb_input('Enter "yes" or "no": ', '(y$|ye$|yes$)|(n$|no$)')
h = kb_input('Enter a 5 digit number:','\d$|\d{2}$|\d{3}$|\d{4}$|\d{5}$')

New problem: Is there a way to expand an re from:

'yes$' to 'y$|ye$|yes$'

and

'(yes$)|(no$)' to '(y$|ye$|yes$)|(n$|no$)'

and

'\d{30}$' to '\d{1}$|\d{2}$|\d{3}$|\d{4}$|\d{5}$| .......'

Other expressions that I might use would be:

'\d{3}-\d{3}-\d{4}$' to match a phone number

or '.{40}$' to specify a character string 40 characters long.

but if I have to manually expand these to the above formats they can
get pretty long.

'abcd and a bunch of other stuff' becomes...
'a$|ab$|abc$|abcd$|abcd $|abcd a$|abc... etc... etc... ....stuff$'
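
For the simplest cases, this kind of expansion can be generated rather than typed by hand. The sketch below is Python 3 and covers only two shapes: a plain literal string, and a fixed run of digits. `expand_literal` and `expand_digits` are hypothetical helper names; general regular expressions with groups and alternation are not handled.

```python
import re


def expand_literal(text):
    """Build an 'a$|ab$|abc$' style alternation for a plain (non-regex) string."""
    prefixes = [text[:i] for i in range(1, len(text) + 1)]
    return "|".join(re.escape(p) + "$" for p in prefixes)


def expand_digits(count):
    """Pattern accepting 1 up to count digits, instead of count alternatives."""
    return r"\d{1,%d}$" % count


print(expand_literal("yes"))
print(expand_digits(30))
```

For the digit case a bounded repeat such as `\d{1,30}$` accepts the same strings as the long alternation, so no expansion is needed there.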

Well at least I know what the re's look like now. Time to sleep on it
and see what tomorrow brings.

_Ron
Jul 18 '05 #6

"Ron Adam" <ra****@tampabay.rr.com> wrote in message
news:1j********************************@4ax.com...
On Tue, 28 Oct 2003 20:09:38 -0800, "Emile van Sebille"
<em***@fenx.com> wrote:
>> I'll assume you mean something like:
>>
>>     x = re.compile('abcd and a bunch of other stuff')
>>
>>> This is what I'm looking for:
>>>
>>> string / result
>>> 'a' / pass
>>> 'ab' / pass
>>> 'abc' / pass
>>> 'abd' / fail
>>> 'aaaa' / fail
>>> 'abcd and a bunch of other stuff and then some' / fail
>>>
>>> Is there a way to form a regular expression that will do this?
>>
>>     for k,v in re._cache.items():
>>         if v == x:
>>             ss=k[0]
>>             break
>>
>> Then it's a normal:
>>
>>     >>> re.match('a',ss)
>>     <_sre.SRE_Match object at 0x009828E0>
>>     >>> re.match('abd',ss)
>
> Hi again Emile,
>
> I tried it and get an error.
> Here's my result. Am I missing something?


See, I told you it wouldn't work ;-) I abused an implementation detail
that apparently doesn't exist on the version you've got.

[snip]

> New problem: Is there a way to expand an re from:
>
> 'yes$' to 'y$|ye$|yes$'
>
> and
>
> '(yes$)|(no$)' to '(y$|ye$|yes$)|(n$|no$)'
>
> and
>
> '\d{30}$' to '\d{1}$|\d{2}$|\d{3}$|\d{4}$|\d{5}$| .......'
>
> Other expressions that I might use would be:
>
> '\d{3}-\d{3}-\d{4}$' to match a phone number


You'll probably want to build a validation routine. Here's one quick idea:

    import re

    def oksofar(pattern, test, sample):
        # Pad the partial input with the tail of a known-good sample,
        # then test the padded string against the full pattern.
        teststr = "".join([test,sample[len(test):]])
        return not not re.match(pattern, teststr)

    for p,t,s in [
        (r'^(\d{5})$', '123456', '56789'),
        (r'^([A-Z]{2}\d{2})$', 'AB4x', 'XY12'),
        (r'^(\d{5})-(\d{4})$', '55555-1234', '55555-1212')
        ]:
        print p,t,s,not not re.match(p, s)
        for ii in range(len(t)):
            print ii,t[:ii+1], oksofar(p, t[:ii+1], s)

HTH,

Emile van Sebille
em***@fenx.com
Jul 18 '05 #7
Hi,
> As it is, the program checks the input buffer after each keystroke and
> determines if the buffer is acceptable against a pattern as the string
> is being built. A reverse match. It allows new keystrokes to be
> added to the buffer as long as it's within the pattern.

I don't think it's possible with regular expressions, or at least the way you
are using them right now, that is, to start from "in the end, I want it to
look like this, but it shall accept everything that prefixes a valid
result".

That won't work - the reason is simply that regular expressions are
equivalent to finite state automata. I don't know if you are familiar with
these, but they consist of states, which are nodes, and transitions between
these, which are edges/arrows between the states.

Now usually there is one special starting state S, and there should be at
least one reachable end-state. As the names suggest, recognizing a
specified string starts at S, and then every character read advances the
internal state until one of the end-states is reached _and_ there is no
more input.

Now let's look at a simple example:

"abc"

Here every character becomes a state, and the automaton looks like this:

S-a->(a)-b->(b)-c->[c]

The -*-> means that this transition is taken when * is matched by the
current input. So -a-> is taken when an "a" is input.

[c] is an end-state.

Now what you want is that _all_ legal states are also end-states.
That is very easy to accomplish when you implement the automaton directly
like this:

S-a->[a]-b->[b]-c->[c]
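
A direct implementation of this small automaton might look like the sketch below (Python 3). The state names "S", "A", "B", "C" and the function names are made up for illustration; the table only covers the "abc" example. The prefix test works here because every state in the table can still reach the end-state; an automaton with dead-end states would need those excluded.

```python
# Transition table for S -a-> (a) -b-> (b) -c-> [c]; "C" is the end-state.
TRANSITIONS = {("S", "a"): "A", ("A", "b"): "B", ("B", "c"): "C"}
END_STATES = {"C"}


def run(text):
    """Return the state reached after reading text, or None if stuck."""
    state = "S"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return None  # no transition for this character
    return state


def full_match(text):
    """Ordinary matching: input must end in an end-state."""
    return run(text) in END_STATES


def prefix_ok(text):
    """Prefix matching: every reachable state counts as an end-state."""
    return run(text) is not None


print(full_match("ab"), prefix_ok("ab"), prefix_ok("abd"))
```

Note that `prefix_ok("")` is also true in this sketch, since the start state is reachable with no input; whether an empty buffer should count is a design choice.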

However, it's way more complicated to create this as a rex:

"a(b(c)?)?"

This looks straightforward, so you might be successful in creating rexes like
this using a preprocessing step - but the more complicated your rex gets,
the harder this approach will be to follow. Actually, I currently have no idea.
Which doesn't mean it's not doable, I just don't have enough time to think
about it right now :)
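
For plain literal strings, the preprocessing step can be sketched in a few lines of Python 3. `nested_optional` is a hypothetical helper name, and it uses non-capturing groups `(?:...)` where the example above uses plain parentheses; it does not attempt to handle patterns that already contain regex operators.

```python
import re


def nested_optional(literal):
    """Turn 'abc' into 'a(?:b(?:c)?)?' so every non-empty prefix matches."""
    pattern = ""
    # Build from the last character inwards, wrapping the tail as optional.
    for ch in reversed(literal):
        tail = "(?:%s)?" % pattern if pattern else ""
        pattern = re.escape(ch) + tail
    return pattern


rex = nested_optional("abc")
print(rex)
print(bool(re.fullmatch(rex, "ab")), bool(re.fullmatch(rex, "abd")))
```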

It looks to me that if you need this feature not on a per-case basis where
you can think about your rexes more thoroughly, you have to sort of roll
out your own rex implementation. Which isn't too hard, but definitely
something that's more than just a couple of lines.

Regards,

Diez
Jul 18 '05 #8
On Wed, 29 Oct 2003 00:48:46 -0800, "Emile van Sebille"
<em***@fenx.com> wrote:
[snip]

> See, I told you it wouldn't work ;-) I abused an implementation detail
> that apparently doesn't exist on the version you've got.

It's not the first time something doesn't work on a winxp system. :-)
[snip]

> You'll probably want to build a validation routine. Here's one quick idea:
>
>     import re
>
>     def oksofar(pattern, test, sample):
>         teststr = "".join([test,sample[len(test):]])
>         return not not re.match(pattern, teststr)
>
>     for p,t,s in [
>         (r'^(\d{5})$', '123456', '56789'),
>         (r'^([A-Z]{2}\d{2})$', 'AB4x', 'XY12'),
>         (r'^(\d{5})-(\d{4})$', '55555-1234', '55555-1212')
>         ]:
>         print p,t,s,not not re.match(p, s)
>         for ii in range(len(t)):
>             print ii,t[:ii+1], oksofar(p, t[:ii+1], s)
>
> HTH,
>
> Emile van Sebille
> em***@fenx.com

Ok, it took me a while to see what you did. Using a sample to
complete the partial input is a good idea. Thanks! :-)

This works in the case of regular expressions that are linear and only
have one possible path.

Doesn't work with expressions that have more than one possible path,
such as r'yes$|no$' . I think that's what Diez was trying to explain
to me. That this could become rather complex when grouping and
alternate cases are present.

It seems there are three possible routes to take on this.

1. Process the input so it will match the re. This will require
generating a set of samples, 1 for each re branch. Then matching with
each sample. Using the method above.

So now the problem becomes.... <starting to feel as though I fell in
a rabbit hole> .... is there a way to generate samples where there
is one of each for each re branch?

2. Process the re so it will match a subset of all possible final end
points. ie... a less than operation.

if this were a math problem it might look something like this.

string <= yes$ | no$
string = (y | ye | yes)$ | (n | no)$
string = (y$ | ye$ | yes$) | (n$ | no$)
string = y$ | ye$ | yes$ | n$ | no$

I was hoping there was a way to do this already. Since there isn't,
it would require building a parser to convert strings of 'abc' to
'a|ab|abc' and handle distributed operations with ^,$ and other
special characters. And there are probably exceptions that will
complicate the process. :-/

This method is probably only worth doing if it had wider
applications. <shrug> I don't know enough (yet) to know.

3. Process both the input and the re.

1. string <= yes$|no$

2. string <= yes$
string <= no$

3. generate a sample list, 1 sample for each re
4. test the input using each sample for a possible match

This might be doable.... will have to think on it and do a little
experimenting.
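
The sample-per-branch idea can be sketched as follows in Python 3. `ok_so_far`, `SAMPLES` and `PATTERN` are illustrative names; the samples are written by hand here, since generating them automatically from an arbitrary expression is the unsolved part. The sketch uses `re.fullmatch` with `yes|no` rather than `re.match` with `yes$|no$`, which is an assumption about how the final check would be written.

```python
import re


def ok_so_far(pattern, partial, samples):
    """True if some sample can complete the partial input into a full match."""
    for sample in samples:
        # Keep what was typed, borrow the remainder from the sample.
        candidate = partial + sample[len(partial):]
        if re.fullmatch(pattern, candidate):
            return True
    return False


SAMPLES = ["yes", "no"]  # hand-written: one sample per branch of the pattern
PATTERN = r"yes|no"

for typed in ["y", "ye", "n", "ya", "yess"]:
    print(typed, ok_so_far(PATTERN, typed, SAMPLES))
```

This relies on each branch having a completion of fixed shape; branches with repeats or optional parts may need more than one sample each.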


Jul 18 '05 #9
On Wed, 29 Oct 2003 11:32:45 +0100, "Diez B. Roggisch"
<no**********@web.de> wrote:
> Hi,
>
>> As it is, the program checks the input buffer after each keystroke and
>> determines if the buffer is acceptable against a pattern as the string
>> is being built. A reverse match. It allows new keystrokes to be
>> added to the buffer as long as it's within the pattern.
>
> I don't think it's possible with regular expressions, or at least the way you
> are using them right now, that is, to start from "in the end, I want it to
> look like this, but it shall accept everything that prefixes a valid
> result".
>
> That won't work - the reason is simply that regular expressions are
> equivalent to finite state automata. I don't know if you are familiar with
> these, but they consist of states, which are nodes, and transitions between
> these, which are edges/arrows between the states.


I'm not familiar with the terms, but I am familiar with tree data
structures.
[clipped good explanation]

> However, it's way more complicated to create this as a rex:
>
> "a(b(c)?)?"
>
> This looks straightforward, so you might be successful in creating rexes like
> this using a preprocessing step - but the more complicated your rex gets,
> the harder this approach will be to follow. Actually, I currently have no idea.
> Which doesn't mean it's not doable, I just don't have enough time to think
> about it right now :)

I'm still definitely learning this, and appreciate the help.

In reply to Emile, I compared rex to simplifying a math problem like
this. I'm hoping this will lead to a method that will work.

> if this were a math problem it might look something like this.
>
> string <= yes$ | no$
> string = (y | ye | yes)$ | (n | no)$
> string = (y$ | ye$ | yes$) | (n$ | no$)
> string = y$ | ye$ | yes$ | n$ | no$

Using the same approach for the example above might be something like:

s <= a( b (c)? )?
s <= a( b | bc )?
s <= a | ab | abc
s = a | (a | ab) | ( a | ab | abc)
s = a | a | ab | a | ab | abc
s = a | ab | abc

I have no idea if what I'm doing here is actually valid. I'm sort of
thinking it out and learning as I go. Are there rules and methods to
manipulating regular expressions in this manner, rex algebra? Or is
this a subset of set theory? I think my statistics instructor did
something similar to this. It was a few years ago.
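
One way to sanity-check a manipulation like this is to compare the sets of strings each form accepts. The sketch below (Python 3) does so by brute force over short strings; `language` is a hypothetical helper, and agreement on strings up to a fixed length is evidence rather than proof. Regular expressions do obey algebraic laws of this kind, for example alternation is idempotent (`a | a` is `a`) and concatenation distributes over alternation, which is what the simplification steps above rely on.

```python
import itertools
import re


def language(pattern, alphabet="abcd", max_len=4):
    """Set of all strings up to max_len over alphabet that fully match pattern."""
    matched = set()
    for length in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=length):
            text = "".join(letters)
            if re.fullmatch(pattern, text):
                matched.add(text)
    return matched


nested = language("a(b(c)?)?")
flat = language("a|ab|abc")
print(sorted(nested), nested == flat)
```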

> It looks to me that if you need this feature not on a per-case basis where
> you can think about your rexes more thoroughly, you have to sort of roll
> out your own rex implementation. Which isn't too hard, but definitely
> something that's more than just a couple of lines.
>
> Regards,
>
> Diez

Looks like I only <obvious understatement> need to create a few
functions to manipulate rex expressions. Simpler than a full
rex implementation, but still definitely more than a few lines of
code.

_Ron Adam

Jul 18 '05 #10
