With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
--
mvh Björn 9 6836
BJörn Lindqvist <bj*****@gmail.comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
A regular expression matcher uses a state machine to match strings.
What you want to do is do a traversal through that state machine
generating all possible matches at each point.
Luckily the python regexp matcher is written in python, so you have
access to the state machine directly if you want.
Take a look at sre*.py in the python library and you might be able to
work out what to do! I had a brief look myself, and it
looked... complicated!
--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
</F>
On Dec 22, 8:30 am, "BJörn Lindqvist" <bjou...@gmail.comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do?
Here is a first cut at your problem
( http://pyparsing-public.wikispaces.c...e/invRegex.py).
I used pyparsing to identify repeatable ranges within a regex, then
attached generator-generating classes to parse actions for each type of
regex element. Some limitations:
- unbounded '*' and '+' repetition is not allowed
- only supports \d, \w, and \s macros
Here are the test cases in the program that generate the expected list
of permutations:
[A-E]
[A-D]{3}
X[A-C]{3}Y
X[A-C]{3}\(
X\d
[A-C]\s[A-C]\s[A-C]
[A-C]\s?[A-C][A-C]
[A-C]\s([A-C][A-C])
[A-C]\s([A-C][A-C])?
[A-C]{2}\d{2}
Fredrik Lundh <fr*****@pythonware.comwrote:
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
Ah! Well that is outside of my experience then... I've only really
studied the matchers produced by flex before. Apologies for the
mis-information!
--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
BJörn Lindqvist wrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
For a very small number of characters, it would be feasible. For any
finite number of characters, it would be possible (though it wouldn't
take much to take longer than the age of the universe). For reference,
in your simple example, you have 17,576,000 matching strings.
I'm curious as to why you would wish to do this. I certainly understand
considering hard problems for their own sake, but when I formulate
them, there's always some impetus that makes me say "Huh. Now I
wonder..."
Fredrik Lundh wrote:
Nick Craig-Wood wrote:
>A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
</F>
How is the matching engine implemented then?
I thought regular languages can be described by deterministic /
non-deterministic finite state machines. I am just curious ...
Thomas
Thomas Ploch wrote:
Fredrik Lundh wrote:
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
</F>
How is the matching engine implemented then?
I thought regular languages can be described by deterministic /
non-deterministic finite state machines. I am just curious ...
Regular expressions are described by N?DFAs, but Perl-compatible
regular expressions (which pretty much every language implements now)
are not true regular expressions. They are actually Turing complete,
and thus have features that cannot be implemented in N?DFAs.
"Paul McGuire" <pt***@austin.rr.comwrote in message
news:11*********************@79g2000cws.googlegrou ps.com...
On Dec 22, 8:30 am, "BJörn Lindqvist" <bjou...@gmail.comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do?
Here is a first cut at your problem
( http://pyparsing-public.wikispaces.c...e/invRegex.py).
I used pyparsing to identify repeatable ranges within a regex, then
attached generator-generating classes to parse actions for each type of
regex element. Some limitations:
- unbounded '*' and '+' repetition is not allowed
- only supports \d, \w, and \s macros
=====================
Download the latest version of this file. It is now importable as its own
module, with the invert method that takes a regexp string and returns a
generator that yields all the possible matching strings. This file also
includes a simple count method, which returns the number of elements
returned by a generator (as opposed to calling len(list(invert("..."))),
which generates an intermediate list just to invoke len on it).
The reg exp features that have been added are:
- alternation using '|'
- min-max repetition using {min,max} format
- '.' wildcard character
Also fixed some repetition bugs, where "foobar{2}" was treated like
"(foobar){2}" - now both cases are handled correctly.
-- Paul
On 23 Dec 2006 04:23:09 -0800, Chris Johnson <ef******@gmail.comwrote:
>
BJörn Lindqvist wrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d\d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
For a very small number of characters, it would be feasible. For any
finite number of characters, it would be possible (though it wouldn't
take much to take longer than the age of the universe). For reference,
in your simple example, you have 17,576,000 matching strings.
I'm curious as to why you would wish to do this. I certainly understand
considering hard problems for their own sake, but when I formulate
them, there's always some impetus that makes me say "Huh. Now I
wonder..."
I have a thousand use cases in mind. For example:
1. Generate sentences for a bot:
ipermute("(I|You|He|She|It|They) do( not)? (dis)?like (you|him|her|me|they)"):
Generates many grammatically incorrect sentences but you get the point.
2. Generate URL:s to web resources:
The following should generate URL:s to all c.l.p digests from the mail archive:
def download_clp():
year_re = "(199\d|200[0-6])"
months = ["January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December"]
month_re = '(' + '|'.join(months) + ')'
fmt = "http://mail\.python\.org/pipermail/python-list/%s-%s\.txt"
url_re = fmt % (year_re, month_re)
for x in ipermute(url_re):
print "Downloading", x
code to download here
The same approach could be used to download all threads in a forum for
example, or all articles on Slashdot.
3. "Visualising" regular expressions. I think it would be easier to
write regular expressions if you could see what kind of data they
would match.
4. Port scanning:
ip_tuple = "(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])"
for x in ipermute("192\.168\." + "\.".join([ip_tuple] * 2)):
scan_ip(x)
--
mvh Björn This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Steve Goldman |
last post by:
Hi,
I am trying to come up with a way to develop all n-length permutations of a
given list of values. The short function below seems to work, but I can't
help thinking there's a better way. ...
|
by: John Trunek |
last post by:
I have a set of X items, but want permutations of length Y (X > Y). I
am aware of the permutation functions in <algorithm>, but I don't
believe this will do what I want. Is there a way, either...
|
by: Alex Vinokur |
last post by:
Does STL contain algorithms which generate (enable to generate) exponents, permutations, arrangements, and combinations for any
numbers and words?
--
Alex Vinokur
mailto:alexvn@connect.to...
|
by: Alex Vinokur |
last post by:
C++ program for generating combinatorial objects
(exponents, permutations, arrangements, combinations for any numbers and words)
can be downloaded at...
|
by: girish |
last post by:
Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, gives me , , , , , ] etc....
| |
by: Mir Nazim |
last post by:
Hello,
I need to write scripts in which I need to generate all posible unique
combinations of an integer list. Lists are a minimum 12 elements in
size with very large number of possible...
|
by: py_genetic |
last post by:
Hi,
I'm looking to generate x alphabetic strings in a list size x. This
is exactly the same output that the unix command "split" generates as
default file name output when splitting large...
|
by: jonKushner |
last post by:
id like to create a script that runs through a form, and autogenerates
every single permutation for every single value, as well as option.
The task is because I am currently working on a webservice...
|
by: bjorklund.emil |
last post by:
Hello pythonistas.
I'm a newbie to pretty much both programming and Python. I have a task
that involves writing a test script for every possible combination of
preference settings for a software...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers,...
| |
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The...
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...
| |