With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
--
mvh Björn 9 6867
BJörn Lindqvist <bj*****@gmail. comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
A regular expression matcher uses a state machine to match strings.
What you want to do is do a traversal through that state machine
generating all possible matches at each point.
Luckily the python regexp matcher is written in python, so you have
access to the state machine directly if you want.
Take a look at sre*.py in the python library and you might be able to
work out what to do! I had a brief look myself, and it
looked... complicated!
--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
</F>
On Dec 22, 8:30 am, "BJörn Lindqvist" <bjou...@gmail. comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do?
Here is a first cut at your problem
( http://pyparsing-public.wikispaces.c...e/invRegex.py).
I used pyparsing to identify repeatable ranges within a regex, then
attached generator-generating classes to parse actions for each type of
regex element. Some limitations:
- unbounded '*' and '+' repetition is not allowed
- only supports \d, \w, and \s macros
Here are the test cases in the program that generate the expected list
of permutations:
[A-E]
[A-D]{3}
X[A-C]{3}Y
X[A-C]{3}\(
X\d
[A-C]\s[A-C]\s[A-C]
[A-C]\s?[A-C][A-C]
[A-C]\s([A-C][A-C])
[A-C]\s([A-C][A-C])?
[A-C]{2}\d{2}
Fredrik Lundh <fr*****@python ware.comwrote:
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
Ah! Well that is outside of my experience then... I've only really
studied the matchers produced by flex before. Apologies for the
mis-information!
--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
BJörn Lindqvist wrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
For a very small number of characters, it would be feasible. For any
finite number of characters, it would be possible (though it wouldn't
take much to take longer than the age of the universe). For reference,
in your simple example, you have 17,576,000 matching strings.
I'm curious as to why you would wish to do this. I certainly understand
considering hard problems for their own sake, but when I formulate
them, there's always some impetus that makes me say "Huh. Now I
wonder..."
Fredrik Lundh wrote:
Nick Craig-Wood wrote:
>A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
</F>
How is the matching engine implemented then?
I thought regular languages can be described by deterministic /
non-deterministic finite state machines. I am just curious ...
Thomas
Thomas Ploch wrote:
Fredrik Lundh wrote:
Nick Craig-Wood wrote:
A regular expression matcher uses a state machine to match strings.
unless it's the kind of regular expression matcher that doesn't use a
state machine, like the one in Python.
</F>
How is the matching engine implemented then?
I thought regular languages can be described by deterministic /
non-deterministic finite state machines. I am just curious ...
Regular expressions are described by N?DFAs, but Perl-compatible
regular expressions (which pretty much every language implements now)
are not true regular expressions. They are actually Turing complete,
and thus have features that cannot be implemented in N?DFAs.
"Paul McGuire" <pt***@austin.r r.comwrote in message
news:11******** *************@7 9g2000cws.googl egroups.com...
On Dec 22, 8:30 am, "BJörn Lindqvist" <bjou...@gmail. comwrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do?
Here is a first cut at your problem
( http://pyparsing-public.wikispaces.c...e/invRegex.py).
I used pyparsing to identify repeatable ranges within a regex, then
attached generator-generating classes to parse actions for each type of
regex element. Some limitations:
- unbounded '*' and '+' repetition is not allowed
- only supports \d, \w, and \s macros
=============== ======
Download the latest version of this file. It is now importable as its own
module, with the invert method that takes a regexp string and returns a
generator that yields all the possible matching strings. This file also
includes a simple count method, which returns the number of elements
returned by a generator (as opposed to calling len(list(invert ("..."))),
which generates an intermediate list just to invoke len on it).
The reg exp features that have been added are:
- alternation using '|'
- min-max repetition using {min,max} format
- '.' wildcard character
Also fixed some repetition bugs, where "foobar{2}" was treated like
"(foobar){2 }" - now both cases are handled correctly.
-- Paul
On 23 Dec 2006 04:23:09 -0800, Chris Johnson <ef******@gmail .comwrote:
>
BJörn Lindqvist wrote:
With regexps you can search for strings matching it. For example,
given the regexp: "foobar\d\d \d". "foobar123" would match. I want to
do the reverse, from a regexp generate all strings that could match
it.
The regexp: "[A-Z]{3}\d{3}" should generate the strings "AAA000",
"AAA001", "AAA002" ... "AAB000", "AAB001" ... "ZZZ999".
Is this possible to do? Obviously, for some regexps the set of matches
is unbounded (a list of everything that matches "*" would be very
unpractical), but how would you do it for simple regexps like the one
above?
For a very small number of characters, it would be feasible. For any
finite number of characters, it would be possible (though it wouldn't
take much to take longer than the age of the universe). For reference,
in your simple example, you have 17,576,000 matching strings.
I'm curious as to why you would wish to do this. I certainly understand
considering hard problems for their own sake, but when I formulate
them, there's always some impetus that makes me say "Huh. Now I
wonder..."
I have a thousand use cases in mind. For example:
1. Generate sentences for a bot:
ipermute("(I|Yo u|He|She|It|The y) do( not)? (dis)?like (you|him|her|me |they)"):
Generates many grammatically incorrect sentences but you get the point.
2. Generate URL:s to web resources:
The following should generate URL:s to all c.l.p digests from the mail archive:
def download_clp():
year_re = "(199\d|200[0-6])"
months = ["January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September" ,
"October",
"November",
"December"]
month_re = '(' + '|'.join(months ) + ')'
fmt = "http://mail\.python\.o rg/pipermail/python-list/%s-%s\.txt"
url_re = fmt % (year_re, month_re)
for x in ipermute(url_re ):
print "Downloadin g", x
code to download here
The same approach could be used to download all threads in a forum for
example, or all articles on Slashdot.
3. "Visualisin g" regular expressions. I think it would be easier to
write regular expressions if you could see what kind of data they
would match.
4. Port scanning:
ip_tuple = "(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])"
for x in ipermute("192\. 168\." + "\.".join([ip_tuple] * 2)):
scan_ip(x)
--
mvh Björn This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Steve Goldman |
last post by:
Hi,
I am trying to come up with a way to develop all n-length permutations of a
given list of values. The short function below seems to work, but I can't
help thinking there's a better way. Not being a computer scientist, I find
recursive functions to be frightening and unnatural. I'd appreciate if
anyone can tell me the pythonic idiom to accomplish this.
Thanks for your help,
|
by: John Trunek |
last post by:
I have a set of X items, but want permutations of length Y (X > Y). I
am aware of the permutation functions in <algorithm>, but I don't
believe this will do what I want. Is there a way, either through the
STL or some other library to do this, or do I need to write my own
code?
|
by: Alex Vinokur |
last post by:
Does STL contain algorithms which generate (enable to generate) exponents, permutations, arrangements, and combinations for any
numbers and words?
--
Alex Vinokur
mailto:alexvn@connect.to
http://mathforum.org/library/view/10978.html
|
by: Alex Vinokur |
last post by:
C++ program for generating combinatorial objects
(exponents, permutations, arrangements, combinations for any numbers and words)
can be downloaded at
https://sourceforge.net/projects/comb-objects/
--
Alex Vinokur
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn
|
by: girish |
last post by:
Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, gives me , , , , , ] etc.
Similarly, 'abcd' should give me , , , , , ,, ,
, , , , ,]
I've tried the following but i cant prevent duplicates and i'm missing
| |
by: Mir Nazim |
last post by:
Hello,
I need to write scripts in which I need to generate all posible unique
combinations of an integer list. Lists are a minimum 12 elements in
size with very large number of possible combination(12!)
I hacked a few lines of code and tried a few things from Python
CookBook (http://aspn.activestate.com/ASPN/Cookbook/), but they are
hell slow.
|
by: py_genetic |
last post by:
Hi,
I'm looking to generate x alphabetic strings in a list size x. This
is exactly the same output that the unix command "split" generates as
default file name output when splitting large files.
Example:
produce x original, but not random strings from english alphabet, all
lowercase. The length of each string and possible combinations is
|
by: jonKushner |
last post by:
id like to create a script that runs through a form, and autogenerates
every single permutation for every single value, as well as option.
The task is because I am currently working on a webservice project
where I am feeding in nightly a set of values from an external data
source, and populating my form with those values. In the end, I end up
resending a new query to this datasource with a custom url, including
as parameters the values...
|
by: bjorklund.emil |
last post by:
Hello pythonistas.
I'm a newbie to pretty much both programming and Python. I have a task
that involves writing a test script for every possible combination of
preference settings for a software I'm testing. I figured that this
was something that a script could probably do pretty easily, given all
the various possibilites.
I started creating a dictionary of all the settings, where each key
has a value that is a list of the possible...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
| |
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |