473,671 Members | 2,572 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Generating all permutations from a regexp

With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?

--
mvh Björn
Dec 22 '06 #1
9 6867
BJörn Lindqvist <bj*****@gmail. comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
A regular expression matcher uses a state machine to match strings.

What you want to do is do a traversal through that state machine
generating all possible matches at each point.

Luckily the python regexp matcher is written in python, so you have
access to the state machine directly if you want.

Take a look at sre*.py in the python library and you might be able to
work out what to do! I had a brief look myself, and it
looked... complicated!

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Dec 22 '06 #2
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.

</F>

Dec 22 '06 #3

On Dec 22, 8:30 am, "BJörn Lindqvist" <bjou...@gmail. comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do?
Here is a first cut at your problem
(http://pyparsing-public.wikispaces.c...e/invRegex.py).
I used pyparsing to identify repeatable ranges within a regex, then
attached generator-generating classes to parse actions for each type of
regex element. Some limitations:
- unbounded '*' and '+' repetition is not allowed
- only supports \d, \w, and \s macros

Here are the test cases in the program that generate the expected list
of permutations:
[A-E]
[A-D]{3}
X[A-C]{3}Y
X[A-C]{3}\(
X\d
[A-C]\s[A-C]\s[A-C]
[A-C]\s?[A-C][A-C]
[A-C]\s([A-C][A-C])
[A-C]\s([A-C][A-C])?
[A-C]{2}\d{2}

Dec 23 '06 #4
Fredrik Lundh <fr*****@python ware.comwrote:
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.

unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
Ah! Well that is outside of my experience then... I've only really
studied the matchers produced by flex before. Apologies for the
mis-information!

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Dec 23 '06 #5

BJörn Lindqvist wrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
For a very small number of characters, it would be feasible. For any
finite number of characters, it would be possible (though it wouldn't
take much to take longer than the age of the universe). For reference,
in your simple example, you have 17,576,000 matching strings.

I'm curious as to why you would wish to do this. I certainly understand
considering hard problems for their own sake, but when I formulate
them, there's always some impetus that makes me say "Huh. Now I
wonder..."

Dec 23 '06 #6
Fredrik Lundh wrote:
Nick Craig-Wood wrote:
>A regular expression matcher uses a state machine to match strings.

unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.

</F>
How is the matching engine implemented then?

I thought regular languages can be described by deterministic /
non-deterministic finite state machines. I am just curious ...

Thomas
Dec 23 '06 #7

Thomas Ploch wrote:
Fredrik Lundh wrote:
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.

</F>

How is the matching engine implemented then?

I thought regular languages can be described by deterministic /
non-deterministic finite state machines. I am just curious ...
Regular expressions are described by N?DFAs, but Perl-compatible
regular expressions (which pretty much every language implements now)
are not true regular expressions. They are actually Turing complete,
and thus have features that cannot be implemented in N?DFAs.

Dec 23 '06 #8
"Paul McGuire" <pt***@austin.r r.comwrote in message
news:11******** *************@7 9g2000cws.googl egroups.com...

On Dec 22, 8:30 am, "BJörn Lindqvist" <bjou...@gmail. comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do?
Here is a first cut at your problem
(http://pyparsing-public.wikispaces.c...e/invRegex.py).
I used pyparsing to identify repeatable ranges within a regex, then
attached generator-generating classes to parse actions for each type of
regex element. Some limitations:
- unbounded '*' and '+' repetition is not allowed
- only supports \d, \w, and \s macros

=============== ======

Download the latest version of this file. It is now importable as its own
module, with the invert method that takes a regexp string and returns a
generator that yields all the possible matching strings. This file also
includes a simple count method, which returns the number of elements
returned by a generator (as opposed to calling len(list(invert ("..."))),
which generates an intermediate list just to invoke len on it).

The reg exp features that have been added are:
- alternation using '|'
- min-max repetition using {min,max} format
- '.' wildcard character

Also fixed some repetition bugs, where "foobar{2}" was treated like
"(foobar){2 }" - now both cases are handled correctly.

-- Paul

Dec 26 '06 #9
On 23 Dec 2006 04:23:09 -0800, Chris Johnson <ef******@gmail .comwrote:
>
BJörn Lindqvist wrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.

The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".

Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?

For a very small number of characters, it would be feasible. For any
finite number of characters, it would be possible (though it wouldn't
take much to take longer than the age of the universe). For reference,
in your simple example, you have 17,576,000 matching strings.

I'm curious as to why you would wish to do this. I certainly understand
considering hard problems for their own sake, but when I formulate
them, there's always some impetus that makes me say "Huh. Now I
wonder..."
I have a thousand use cases in mind. For example:

1. Generate sentences for a bot:

ipermute("(I|Yo u|He|She|It|The y) do( not)? (dis)?like (you|him|her|me |they)"):

Generates many grammatically incorrect sentences but you get the point.

2. Generate URL:s to web resources:

The following should generate URL:s to all c.l.p digests from the mail archive:

def download_clp():
year_re = "(199\d|200[0-6])"
months = ["January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September" ,
"October",
"November",
"December"]
month_re = '(' + '|'.join(months ) + ')'
fmt = "http://mail\.python\.o rg/pipermail/python-list/%s-%s\.txt"
url_re = fmt % (year_re, month_re)
for x in ipermute(url_re ):
print "Downloadin g", x
code to download here

The same approach could be used to download all threads in a forum for
example, or all articles on Slashdot.

3. "Visualisin g" regular expressions. I think it would be easier to
write regular expressions if you could see what kind of data they
would match.

4. Port scanning:

ip_tuple = "(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])"
for x in ipermute("192\. 168\." + "\.".join([ip_tuple] * 2)):
scan_ip(x)

--
mvh Björn
Dec 26 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

10
5664
by: Steve Goldman | last post by:
Hi, I am trying to come up with a way to develop all n-length permutations of a given list of values. The short function below seems to work, but I can't help thinking there's a better way. Not being a computer scientist, I find recursive functions to be frightening and unnatural. I'd appreciate if anyone can tell me the pythonic idiom to accomplish this. Thanks for your help,
20
2285
by: John Trunek | last post by:
I have a set of X items, but want permutations of length Y (X > Y). I am aware of the permutation functions in <algorithm>, but I don't believe this will do what I want. Is there a way, either through the STL or some other library to do this, or do I need to write my own code?
2
2844
by: Alex Vinokur | last post by:
Does STL contain algorithms which generate (enable to generate) exponents, permutations, arrangements, and combinations for any numbers and words? -- Alex Vinokur mailto:alexvn@connect.to http://mathforum.org/library/view/10978.html
0
1238
by: Alex Vinokur | last post by:
C++ program for generating combinatorial objects (exponents, permutations, arrangements, combinations for any numbers and words) can be downloaded at https://sourceforge.net/projects/comb-objects/ -- Alex Vinokur http://mathforum.org/library/view/10978.html http://sourceforge.net/users/alexvn
8
5035
by: girish | last post by:
Hi, I want to generate all non-empty substrings of a string of length >=2. Also, each substring is to be paired with 'string - substring' part and vice versa. Thus, gives me , , , , , ] etc. Similarly, 'abcd' should give me , , , , , ,, , , , , , ,] I've tried the following but i cant prevent duplicates and i'm missing
8
5630
by: Mir Nazim | last post by:
Hello, I need to write scripts in which I need to generate all posible unique combinations of an integer list. Lists are a minimum 12 elements in size with very large number of possible combination(12!) I hacked a few lines of code and tried a few things from Python CookBook (http://aspn.activestate.com/ASPN/Cookbook/), but they are hell slow.
6
3434
by: py_genetic | last post by:
Hi, I'm looking to generate x alphabetic strings in a list size x. This is exactly the same output that the unix command "split" generates as default file name output when splitting large files. Example: produce x original, but not random strings from english alphabet, all lowercase. The length of each string and possible combinations is
0
1371
by: jonKushner | last post by:
id like to create a script that runs through a form, and autogenerates every single permutation for every single value, as well as option. The task is because I am currently working on a webservice project where I am feeding in nightly a set of values from an external data source, and populating my form with those values. In the end, I end up resending a new query to this datasource with a custom url, including as parameters the values...
14
1998
by: bjorklund.emil | last post by:
Hello pythonistas. I'm a newbie to pretty much both programming and Python. I have a task that involves writing a test script for every possible combination of preference settings for a software I'm testing. I figured that this was something that a script could probably do pretty easily, given all the various possibilites. I started creating a dictionary of all the settings, where each key has a value that is a list of the possible...
0
8483
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8927
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
8825
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8605
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8676
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
5703
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4227
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
1
2819
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
1816
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.