Regexp parser and generator

Is there any package that parses regular expressions and returns an
AST ? Something like:
>>> parse_rx(r'i (love|hate) h(is|er) (cat|dog)s?\s*!+')
Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ', Or('cat',
'dog'), Optional('s'), ZeroOrMore(r'\s'), OneOrMore('!'))

Given such a structure, I want to create a generator that can generate
all strings matched by this regexp. Obviously if the regexp contains a
'*' or '+' the generator is infinite, and although it can be
artificially constrained by, say, a maxdepth parameter, for now I'm
interested in finite regexps only. It shouldn't be too hard to write
one from scratch but just in case someone has already done it, so much
the better.

George
Nov 4 '08 #1
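
A minimal sketch of the kind of generator described above, assuming the AST already exists. The node names Regex, Or, Optional, ZeroOrMore and OneOrMore come from the post; the Repeat class, the generate() function and the max_repeat cap (standing in for the "maxdepth" idea) are hypothetical helpers introduced only for illustration. Character classes such as \s are not handled: every string is treated as a plain literal.

```python
import itertools


def generate(node, max_repeat=2):
    """Yield every string matched by node; bare strings are literals."""
    if isinstance(node, str):
        yield node
    else:
        yield from node.generate(max_repeat)


class Regex:
    """Concatenation of sub-nodes."""

    def __init__(self, *parts):
        self.parts = parts

    def generate(self, max_repeat):
        # Expand each part, then take the cartesian product of the expansions.
        choices = [list(generate(p, max_repeat)) for p in self.parts]
        for combo in itertools.product(*choices):
            yield "".join(combo)


class Or:
    """Alternation: any one of the alternatives."""

    def __init__(self, *alternatives):
        self.alternatives = alternatives

    def generate(self, max_repeat):
        for alt in self.alternatives:
            yield from generate(alt, max_repeat)


class Repeat:
    """node repeated lo..hi times (hypothetical helper); hi=None means unbounded."""

    def __init__(self, node, lo, hi=None):
        self.node, self.lo, self.hi = node, lo, hi

    def generate(self, max_repeat):
        # Unbounded repeats are cut off at max_repeat so the output stays finite.
        hi = max_repeat if self.hi is None else self.hi
        choices = list(generate(self.node, max_repeat))
        for n in range(self.lo, hi + 1):
            for combo in itertools.product(choices, repeat=n):
                yield "".join(combo)


def Optional(node):
    return Repeat(node, 0, 1)


def ZeroOrMore(node):
    return Repeat(node, 0)


def OneOrMore(node):
    return Repeat(node, 1)


pattern = Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ',
                Or('cat', 'dog'), Optional('s'))
strings = list(generate(pattern))
print(len(strings))
print(strings[0])
```

For a finite pattern the cap is never used; with '*' or '+' it simply truncates the otherwise infinite output, and a repeated node that can match the empty string may produce duplicate strings.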

George> Is there any package that parses regular expressions and returns
George> an AST ?

Maybe not directly, but this might provide a starting point for building
such a beast:
>>> import re
>>> re.compile("[ab]", 128)
in
  literal 97
  literal 98
<_sre.SRE_Pattern object at 0x47b7a0>
>>> re.compile("ab*c[xyz]", 128)
literal 97
max_repeat 0 65535
  literal 98
literal 99
in
  literal 120
  literal 121
  literal 122
<_sre.SRE_Pattern object at 0x371f90>

Skip
Nov 4 '08 #2
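
For readers trying this today: the 128 passed to re.compile above is the numeric value of the re.DEBUG flag. The sketch below captures the dump as a string instead of letting it print, assuming the dump is written to standard output (as it is in current CPython). The debug_dump name is a hypothetical helper, and the exact text of the dump differs between Python versions, so treat it as something to read rather than parse.

```python
import contextlib
import io
import re


def debug_dump(pattern):
    """Return the parse-tree dump that re.DEBUG prints for pattern."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # re.DEBUG is the named form of the 128 used in the post.
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


dump = debug_dump("ab*c[xyz]")
print(dump)
```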
George Sakkis wrote:
> Is there any package that parses regular expressions and returns an
> AST ? Something like:
>
> >>> parse_rx(r'i (love|hate) h(is|er) (cat|dog)s?\s*!+')
> Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ', Or('cat',
> 'dog'), Optional('s'), ZeroOrMore(r'\s'), OneOrMore('!'))

Seen today, on planet python:

>>> import sre_parse
>>> sre_parse.parse("a|b")
[('in', [('literal', 97), ('literal', 98)])]

Peter
Nov 4 '08 #3
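
A rough sketch of how the sre_parse output could drive the string generator asked for in the first post. It rests on several assumptions: the parser is an undocumented internal (in Python 3.11 and later it lives in re._parser and sre_parse is deprecated), opcodes are now named constants rather than the strings shown above, and the tuple layouts used here are those of current CPython. The expand and expand_item names are hypothetical, and only literals, simple character sets, alternation, groups and bounded repeats are handled.

```python
import itertools

try:  # Python 3.11+ keeps the parser in private submodules of re
    from re import _parser as sre_parse, _constants as sre_constants
except ImportError:  # older versions expose them as top-level modules
    import sre_parse
    import sre_constants


def expand(items):
    """Yield every string matched by a sequence of parsed (op, arg) items."""
    parts = [list(expand_item(op, arg)) for op, arg in items]
    for combo in itertools.product(*parts):
        yield "".join(combo)


def expand_item(op, arg):
    """Yield every string matched by a single parsed item."""
    if op == sre_constants.LITERAL:
        yield chr(arg)
    elif op == sre_constants.IN:
        # Character set: literals and ranges only; negation is rejected.
        for sub_op, sub_arg in arg:
            if sub_op == sre_constants.LITERAL:
                yield chr(sub_arg)
            elif sub_op == sre_constants.RANGE:
                lo, hi = sub_arg
                for code in range(lo, hi + 1):
                    yield chr(code)
            else:
                raise ValueError("unsupported set item: %r" % (sub_op,))
    elif op == sre_constants.BRANCH:
        _, alternatives = arg
        for alternative in alternatives:
            yield from expand(alternative)
    elif op == sre_constants.SUBPATTERN:
        # The sub-pattern is the last element of the argument tuple.
        yield from expand(arg[-1])
    elif op == sre_constants.MAX_REPEAT:
        lo, hi, sub = arg
        if hi == sre_constants.MAXREPEAT:
            raise ValueError("unbounded repeat: the set of matches is infinite")
        choices = list(expand(sub))
        for n in range(lo, hi + 1):
            for combo in itertools.product(choices, repeat=n):
                yield "".join(combo)
    else:
        raise ValueError("unsupported construct: %r" % (op,))


results = list(expand(sre_parse.parse("i (love|hate) h(is|er) (cat|dog)s?")))
print(len(results))
for s in results:
    print(s)
```

Unbounded '*' and '+' are rejected rather than capped, which matches the "finite regexps only" scope of the question.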
On Nov 4, 1:34 pm, George Sakkis <george.sak...@gmail.com> wrote:
> Is there any package that parses regular expressions and returns an
> AST ? Something like:
>
> >>> parse_rx(r'i (love|hate) h(is|er) (cat|dog)s?\s*!+')
> Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ', Or('cat',
> 'dog'), Optional('s'), ZeroOrMore(r'\s'), OneOrMore('!'))
>
> Given such a structure, I want to create a generator that can generate
> all strings matched by this regexp. Obviously if the regexp contains a
> '*' or '+' the generator is infinite, and although it can be
> artificially constrained by, say, a maxdepth parameter, for now I'm
> interested in finite regexps only. It shouldn't be too hard to write
> one from scratch but just in case someone has already done it, so much
> the better.
>
> George

Check out this pyparsing regex inverter: http://pyparsing.wikispaces.com/file/view/invRegex.py

Here is what your example generates:
i (love|hate) h(is|er) (cat|dog)s?
Parse time: 0.17 seconds
16
i love his cat
i love his cats
i love his dog
i love his dogs
i love her cat
i love her cats
i love her dog
i love her dogs
i hate his cat
i hate his cats
i hate his dog
i hate his dogs
i hate her cat
i hate her cats
i hate her dog
i hate her dogs

-- Paul
Nov 5 '08 #4
On Nov 4, 9:56 pm, Paul McGuire <pt...@austin.rr.com> wrote:
> On Nov 4, 1:34 pm, George Sakkis <george.sak...@gmail.com> wrote:
> > Is there any package that parses regular expressions and returns an
> > AST ? Something like:
> >
> > >>> parse_rx(r'i (love|hate) h(is|er) (cat|dog)s?\s*!+')
> > Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ', Or('cat',
> > 'dog'), Optional('s'), ZeroOrMore(r'\s'), OneOrMore('!'))
> >
> > Given such a structure, I want to create a generator that can generate
> > all strings matched by this regexp. Obviously if the regexp contains a
> > '*' or '+' the generator is infinite, and although it can be
> > artificially constrained by, say, a maxdepth parameter, for now I'm
> > interested in finite regexps only. It shouldn't be too hard to write
> > one from scratch but just in case someone has already done it, so much
> > the better.
> >
> > George
>
> Check out this pyparsing regex inverter: http://pyparsing.wikispaces.com/file/view/invRegex.py

Neat, seems like a good excuse to look into pyparsing :)

Best,
George
Nov 6 '08 #5
On Nov 4, 3:30 pm, Peter Otten <__pete...@web.de> wrote:
> George Sakkis wrote:
> > Is there any package that parses regular expressions and returns an
> > AST ? Something like:
> >
> > >>> parse_rx(r'i (love|hate) h(is|er) (cat|dog)s?\s*!+')
> > Regex('i ', Or('love', 'hate'), ' h', Or('is', 'er'), ' ', Or('cat',
> > 'dog'), Optional('s'), ZeroOrMore(r'\s'), OneOrMore('!'))
>
> Seen today, on planet python:
>
> >>> import sre_parse
> >>> sre_parse.parse("a|b")
> [('in', [('literal', 97), ('literal', 98)])]
>
> Peter

Thanks, that's rather low level and undocumented but it does the work.

Best,
George
Nov 6 '08 #6
