469,964 Members | 1,772 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,964 developers. It's quick & easy.

creating an (inefficent) alternating regular expression from a listof options

Pyparsing has a really nice feature that I want in PLY. I want to
specify a list of strings and have them converted to a regular
expression.

A Perl module which does an aggressively optimizing job of this is
Regexp::List -
http://search.cpan.org/~dankogai/Reg...Regexp/List.pm

I really dont care if the expression is optimal. So the goal is
something like:

vowel_regexp = oneOf("a aa i ii u uu".split()) # yielding r'(aa|a|uu|
u|ii|i)'

Is there a public module available for this purpose?
Sep 9 '08 #1
8 1205
metaperl.com wrote:
Pyparsing has a really nice feature that I want in PLY. I want to
specify a list of strings and have them converted to a regular
expression.

A Perl module which does an aggressively optimizing job of this is
Regexp::List -
http://search.cpan.org/~dankogai/Reg...Regexp/List.pm

I really dont care if the expression is optimal. So the goal is
something like:

vowel_regexp = oneOf("a aa i ii u uu".split()) # yielding r'(aa|a|uu|
u|ii|i)'

Is there a public module available for this purpose?

Perhaps I'm missing something but your function call oneOf(...) is longer than
than actually specifying the result.

You can certainly write quite easily:

def oneOf(s):
return "|".join(s.split())

-Larry
Sep 9 '08 #2
>I really dont care if the expression is optimal. So the goal is
something like:
>vowel_regexp = oneOf("a aa i ii u uu".split()) # yielding r'(aa|a|uu|
u|ii|i)'
>Is there a public module available for this purpose?
Check Ka-Ping Yee's rxb module:

http://lfw.org/python/

Needs translating from the old regex module to the current re module, but it
shouldn't take a lot of effort. You can find regex docs in old
distributions (probably through 2.1 or 2.2?). Also, check PyPI to see if
someone has already updated rxb for use with re.

Skip

Sep 9 '08 #3
On Tue, 09 Sep 2008 08:19:04 -0500, Larry Bates wrote:
>I really dont care if the expression is optimal. So the goal is
something like:

vowel_regexp = oneOf("a aa i ii u uu".split()) # yielding r'(aa|a|uu|
u|ii|i)'

Is there a public module available for this purpose?
Perhaps I'm missing something but your function call oneOf(...) is
longer than than actually specifying the result.

You can certainly write quite easily:

def oneOf(s):
return "|".join(s.split())
I'd throw `re.escape()` into that function just in case `s` contains
something with special meaning in regular expressions.

Ciao,
Marc 'BlackJack' Rintsch
Sep 9 '08 #4
metaperl.com <me******@gmail.comwrote:
Pyparsing has a really nice feature that I want in PLY. I want to
specify a list of strings and have them converted to a regular
expression.

A Perl module which does an aggressively optimizing job of this is
Regexp::List -
http://search.cpan.org/~dankogai/Reg...Regexp/List.pm

I really dont care if the expression is optimal. So the goal is
something like:

vowel_regexp = oneOf("a aa i ii u uu".split()) # yielding r'(aa|a|uu|
u|ii|i)'

Is there a public module available for this purpose?
I wrote one of these in perl a while ago

http://www.craig-wood.com/nick/pub/words-to-regexp.pl

It transforms the regular expression recursively into more efficient
ones. It uses regular expressions to do that. I considered porting
it to python but looking at the regular expressions made me feel weak
at the knees ;-)

--
Nick Craig-Wood <ni**@craig-wood.com-- http://www.craig-wood.com/nick
Sep 9 '08 #5
On Sep 9, 9:12 am, "metaperl.com" <metap...@gmail.comwrote:
Pyparsing has a really nice feature that I want in PLY. I want to
specify a list of strings and have them converted to a regular
expression.

A Perl module which does an aggressively optimizing job of this is
Regexp::List -http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/Lis...

I really dont care if the expression is optimal. So the goal is
something like:

vowel_regexp = oneOf("a aa i ii u uu".split()) # yielding r'(aa|a|uu|
u|ii|i)'

Is there a public module available for this purpose?
I was about to start a thread about a very similar problem; hope I'm
not hijacking this thread. However I am interested in a solution that
(1) scales to a potentially large number of alternate strings
(hundreds or thousands), which will hit, sooner or later, some max
limit in the builtin regexps and (2) is not limited to strings but can
be used for arbitrary objects. Therefore I am looking for a different
approach, perhaps a specialization of the general regexp search
algorithm.

More formally, given (a) an input sequence I and (b) a set of
'censored' sequences S, find and remove all the sequences in S that
appear in I. For instance, if
S = set([(a,), (a,b), (a,c), (c,), (d,a)])
and
I = (x, a, b, c, y, d, a, b), the filtered sequence would be:
O = (x, y, b)
i.e. the censored subsequences are (a,b), (c,), and (d,a).

As with regexps, if sequence `x` is a prefix of `y`, the longer
sequence `y` wins when both match (e.g. if (a,b) matches, it
supersedes (a,)).

For extra bonus, the algorithm should remove overlapping subsequences
too, i.e. for the input I above, the output would be (x, y) since both
(d,a) and (a,b) would match for (d,a,b).

With respect to complexity, I am mainly interested in len(S); len(I)
is small for my application, typically no more than 10. Of course, an
algorithm that scales decently in both len(S) and len(I) would be even
better. Any ideas or relevant links ?

George

PS: This is not a homework ;)
Sep 9 '08 #6
Larry Bates wrote:
>vowel_regexp = oneOf("a aa i ii u uu".split()) # yielding r'(aa|a|uu|
u|ii|i)'

Is there a public module available for this purpose?

Perhaps I'm missing something but your function call oneOf(...) is
longer than than actually specifying the result.

You can certainly write quite easily:

def oneOf(s):
return "|".join(s.split())
"|" works strictly from left to right, so that doesn't quite work since
it doesn't place "aa" before "a". a simple reverse sort will take care
of that:

return "|".join(sorted(s.split(), reverse=True))

you may also want to do re.escape on all the words, to avoid surprises
when the choices contain special characters.

</F>

Sep 9 '08 #7
On Sep 9, 9:23*am, s...@pobox.com wrote:
* * >I really dont care if theexpressionis optimal. So the goal is
* * >something like:

* * >vowel_regexp = oneOf("a aa i ii u uu".split()) *# yieldingr'(aa|a|uu|
* * >u|ii|i)'

* * >Is there a public module available for this purpose?

Check Ka-Ping Yee's rxb module:

* *http://lfw.org/python/
Ok <http://lfw.org/python/rxb.py>
suffers from the possibility of putting shorter match before longer
one:

def either(*alternatives):
options = []
for option in alternatives:
options.append(makepat(option).regex)
return Pattern('\(' + string.join(options, '|') + '\)')

>*Also, check PyPI to see if
someone has already updated rxb for use with re.
No one has - http://pypi.python.org/pypi?%3Aactio...&submit=search

no results returned

Sep 18 '08 #8
On Sep 9, 12:42*pm, Fredrik Lundh <fred...@pythonware.comwrote:
>
you may also want to do re.escape on all the words, to avoid surprises
when the choices contain special characters.
yes, thank you very much:

import re

def oneOf(s):
alts = sorted(s.split(), reverse=True)
alts = [re.escape(s) for s in alts]
return "|".join(alts)

Sep 18 '08 #9

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

4 posts views Thread by Neri | last post: by
1 post views Thread by | last post: by
14 posts views Thread by olekristianvillabo | last post: by
9 posts views Thread by Pete Davis | last post: by
25 posts views Thread by Mike | last post: by
4 posts views Thread by Clodoaldo Pinto | last post: by
4 posts views Thread by mike | last post: by
3 posts views Thread by shapper | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.