473,513 Members | 2,494 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Generating all permutations from a regexp

With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?

--
mvh Björn
Dec 22 '06 #1
9 6836
BJörn Lindqvist <bj*****@gmail.comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
A regular expression matcher uses a state machine to match strings.

What you want to do is do a traversal through that state machine
generating all possible matches at each point.

Luckily the python regexp matcher is written in python, so you have
access to the state machine directly if you want.

Take a look at sre*.py in the python library and you might be able to
work out what to do! I had a brief look myself, and it
looked... complicated!

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Dec 22 '06 #2
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.

</F>

Dec 22 '06 #3

On Dec 22, 8:30 am, "BJörn Lindqvist" <bjou...@gmail.comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do?
Here is a first cut at your problem
(http://pyparsing-public.wikispaces.c...e/invRegex.py).
I used pyparsing to identify repeatable ranges within a regex, then
attached generator-generating classes to parse actions for each type of
regex element. Some limitations:
- unbounded '*' and '+' repetition is not allowed
- only supports \d, \w, and \s macros

Here are the test cases in the program that generate the expected list
of permutations:
[A-E]
[A-D]{3}
X[A-C]{3}Y
X[A-C]{3}\(
X\d
[A-C]\s[A-C]\s[A-C]
[A-C]\s?[A-C][A-C]
[A-C]\s([A-C][A-C])
[A-C]\s([A-C][A-C])?
[A-C]{2}\d{2}

Dec 23 '06 #4
Fredrik Lundh <fr*****@pythonware.comwrote:
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.

unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
Ah! Well that is outside of my experience then... I've only really
studied the matchers produced by flex before. Apologies for the
mis-information!

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Dec 23 '06 #5

BJörn Lindqvist wrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
For a very small number of characters, it would be feasible. For any
finite number of characters, it would be possible (though it wouldn't
take much to take longer than the age of the universe). For reference,
in your simple example, you have 17,576,000 matching strings.

I'm curious as to why you would wish to do this. I certainly understand
considering hard problems for their own sake, but when I formulate
them, there's always some impetus that makes me say "Huh. Now I
wonder..."

Dec 23 '06 #6
Fredrik Lundh wrote:
Nick Craig-Wood wrote:
>A regular expression matcher uses a state machine to match strings.

unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.

</F>
How is the matching engine implemented then?

I thought regular languages can be described by deterministic /
non-deterministic finite state machines. I am just curious ...

Thomas
Dec 23 '06 #7

Thomas Ploch wrote:
Fredrik Lundh wrote:
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.

</F>

How is the matching engine implemented then?

I thought regular languages can be described by deterministic /
non-deterministic finite state machines. I am just curious ...
Regular expressions are described by N?DFAs, but Perl-compatible
regular expressions (which pretty much every language implements now)
are not true regular expressions. They are actually Turing complete,
and thus have features that cannot be implemented in N?DFAs.

Dec 23 '06 #8
"Paul McGuire" <pt***@austin.rr.comwrote in message
news:11*********************@79g2000cws.googlegrou ps.com...

On Dec 22, 8:30 am, "BJörn Lindqvist" <bjou...@gmail.comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do?
Here is a first cut at your problem
(http://pyparsing-public.wikispaces.c...e/invRegex.py).
I used pyparsing to identify repeatable ranges within a regex, then
attached generator-generating classes to parse actions for each type of
regex element. Some limitations:
- unbounded '*' and '+' repetition is not allowed
- only supports \d, \w, and \s macros

=====================

Download the latest version of this file. It is now importable as its own
module, with the invert method that takes a regexp string and returns a
generator that yields all the possible matching strings. This file also
includes a simple count method, which returns the number of elements
returned by a generator (as opposed to calling len(list(invert("..."))),
which generates an intermediate list just to invoke len on it).

The reg exp features that have been added are:
- alternation using '|'
- min-max repetition using {min,max} format
- '.' wildcard character

Also fixed some repetition bugs, where "foobar{2}" was treated like
"(foobar){2}" - now both cases are handled correctly.

-- Paul

Dec 26 '06 #9
On 23 Dec 2006 04:23:09 -0800, Chris Johnson <ef******@gmail.comwrote:
>
BJörn Lindqvist wrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?

For a very small number of characters, it would be feasible. For any
finite number of characters, it would be possible (though it wouldn't
take much to take longer than the age of the universe). For reference,
in your simple example, you have 17,576,000 matching strings.

I'm curious as to why you would wish to do this. I certainly understand
considering hard problems for their own sake, but when I formulate
them, there's always some impetus that makes me say "Huh. Now I
wonder..."
I have a thousand use cases in mind. For example:

1. Generate sentences for a bot:

ipermute("(I|You|He|She|It|They) do( not)? (dis)?like (you|him|her|me|they)"):

Generates many grammatically incorrect sentences but you get the point.

2. Generate URL:s to web resources:

The following should generate URL:s to all c.l.p digests from the mail archive:

def download_clp():
year_re = "(199\d|200[0-6])"
months = ["January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December"]
month_re = '(' + '|'.join(months) + ')'
fmt = "http://mail\.python\.org/pipermail/python-list/%s-%s\.txt"
url_re = fmt % (year_re, month_re)
for x in ipermute(url_re):
print "Downloading", x
code to download here

The same approach could be used to download all threads in a forum for
example, or all articles on Slashdot.

3. "Visualising" regular expressions. I think it would be easier to
write regular expressions if you could see what kind of data they
would match.

4. Port scanning:

ip_tuple = "(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])"
for x in ipermute("192\.168\." + "\.".join([ip_tuple] * 2)):
scan_ip(x)

--
mvh Björn
Dec 26 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

10
5652
by: Steve Goldman | last post by:
Hi, I am trying to come up with a way to develop all n-length permutations of a given list of values. The short function below seems to work, but I can't help thinking there's a better way. ...
20
2259
by: John Trunek | last post by:
I have a set of X items, but want permutations of length Y (X > Y). I am aware of the permutation functions in <algorithm>, but I don't believe this will do what I want. Is there a way, either...
2
2834
by: Alex Vinokur | last post by:
Does STL contain algorithms which generate (enable to generate) exponents, permutations, arrangements, and combinations for any numbers and words? -- Alex Vinokur mailto:alexvn@connect.to...
0
1227
by: Alex Vinokur | last post by:
C++ program for generating combinatorial objects (exponents, permutations, arrangements, combinations for any numbers and words) can be downloaded at...
8
5027
by: girish | last post by:
Hi, I want to generate all non-empty substrings of a string of length >=2. Also, each substring is to be paired with 'string - substring' part and vice versa. Thus, gives me , , , , , ] etc....
8
5622
by: Mir Nazim | last post by:
Hello, I need to write scripts in which I need to generate all posible unique combinations of an integer list. Lists are a minimum 12 elements in size with very large number of possible...
6
3419
by: py_genetic | last post by:
Hi, I'm looking to generate x alphabetic strings in a list size x. This is exactly the same output that the unix command "split" generates as default file name output when splitting large...
0
1351
by: jonKushner | last post by:
id like to create a script that runs through a form, and autogenerates every single permutation for every single value, as well as option. The task is because I am currently working on a webservice...
14
1988
by: bjorklund.emil | last post by:
Hello pythonistas. I'm a newbie to pretty much both programming and Python. I have a task that involves writing a test script for every possible combination of preference settings for a software...
0
7260
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
7384
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
1
7101
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
7527
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
1
5090
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
4746
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
3234
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
1
803
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
456
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.