PyParsing and Headaches

Hi,

I'm trying to construct a parser, but I'm stuck with some basic
stuff... For example, I want to match the following:

letter = "A"..."Z" | "a"..."z"
literal = letter+
include_bool = "+" | "-"
term = [include_bool] literal

So I defined this as:

literal = Word(alphas)
include_bool = Optional(oneOf("+ -"))
term = include_bool + literal

The problem is that:

term.parseString("+a") -> (['+', 'a'], {}) # OK
term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
recognize any token, since I didn't say that a SPACE was allowed between
include_bool and literal.

Can anyone give me a hand here?

Cheers!

Hugo Ferreira

BTW, the following is the complete grammar I'm trying to implement with
pyparsing:

## L ::= expr | expr L
## expr ::= term | binary_expr
## binary_expr ::= term " " binary_op " " term
## binary_op ::= "*" | "OR" | "AND"
## include_bool ::= "+" | "-"
## term ::= ([include_bool] [modifier ":"] (literal | range)) | ("~" literal)
## modifier ::= (letter | "_")+
## literal ::= word | quoted_words
## quoted_words ::= '"' word (" " word)* '"'
## word ::= (letter | digit | "_")+
## number ::= digit+
## range ::= number (".." | "...") number
## letter ::= "A"..."Z" | "a"..."z"
## digit ::= "0"..."9"

And this is where I got so far:

word = Word(nums + alphas + "_")
binary_op = oneOf("* and or", caseless=True).setResultsName("operator")
include_bool = oneOf("+ -")
literal = (word | quotedString).setResultsName("literal")
modifier = Word(alphas + "_")
rng = Word(nums) + (Literal("..") | Literal("...")) + Word(nums)
term = ((Optional(include_bool) + Optional(modifier + ":") + (literal |
rng)) | ("~" + literal)).setResultsName("Term")
binary_expr = (term + binary_op + term).setResultsName("binary")
expr = (binary_expr | term).setResultsName("Expr")
L = OneOrMore(expr)
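
A side note on the ordering of alternatives in the draft above, offered as a minimal sketch rather than a fix for the whole grammar. pyparsing's `|` operator builds a MatchFirst, which takes the first alternative that matches. With `Literal("..")` listed before `Literal("...")`, the three-dot form cannot match, and because `word` accepts digits, `literal | rng` would consume the leading number before `rng` is ever tried. The names `rng_draft`, `value` and `draft_ok` are made up for this illustration.

```python
from pyparsing import Literal, ParseException, Word, alphas, nums

word = Word(nums + alphas + "_")

# Ordering as written in the draft: ".." is tried before "...".
rng_draft = Word(nums) + (Literal("..") | Literal("...")) + Word(nums)

# Longer alternative first, and the range tried before the plain word.
rng = Word(nums) + (Literal("...") | Literal("..")) + Word(nums)
value = rng | word  # hypothetical stand-in for (literal | range)

try:
    rng_draft.parseString("1...5")
    draft_ok = True
except ParseException:
    # ".." matched, leaving ".5", which is not a number.
    draft_ok = False

print(draft_ok)
print(value.parseString("1...5").asList())
print(value.parseString("1..5").asList())
print(value.parseString("abc").asList())
```

The same reasoning would apply to any other pair of alternatives where one is a prefix of the other.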
--
GPG Fingerprint: B0D7 1249 447D F5BB 22C5 5B9B 078C 2615 504B 7B85

Nov 22 '06 #1
On Wed, Nov 22, 2006 at 11:17:52AM -0800, Bytter wrote:
literal = Word(alphas)
include_bool = Optional(oneOf("+ -"))
term = include_bool + literal
The + operator here means that whitespace is allowed between the two
expressions. You need to explicitly override this.
Try:

term = Combine(include_bool + literal)
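
A small runnable sketch of this suggestion, using the definitions from the question. It uses the camelCase method names from the thread; newer pyparsing releases also provide snake_case equivalents. The variable `spaced_ok` is only there for the demonstration.

```python
from pyparsing import Combine, Optional, ParseException, Word, alphas, oneOf

literal = Word(alphas)
include_bool = Optional(oneOf("+ -"))

# Combine requires its parts to be adjacent and joins them into one token.
term = Combine(include_bool + literal)

print(term.parseString("+a").asList())  # ['+a']

try:
    term.parseString("+ a")
    spaced_ok = True
except ParseException:
    # The space between "+" and "a" is no longer skipped.
    spaced_ok = False

print(spaced_ok)  # False
```

Note that the sign and the word come back joined as a single token, which may or may not be what the rest of the grammar expects.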
Nov 22 '06 #2
"Bytter" <by****@gmail.com> wrote in message
news:11**********************@j44g2000cwa.googlegroups.com...
literal = Word(alphas)
include_bool = Optional(oneOf("+ -"))
term = include_bool + literal

The problem is that:

term.parseString("+a") -> (['+', 'a'], {}) # OK
term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
recognize any token, since I didn't say that a SPACE was allowed between
include_bool and literal.
As Chris pointed out in his post, the most direct way to fix this is to use
Combine. Note that Combine does two things: it requires the expressions to
be adjacent, and it combines the results into a single token. For instance,
when defining the expression for a real number, something like:

realnum = Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)

Pyparsing would parse "3.14159" into the separate tokens ['3', '.',
'14159'] (the unmatched optional sign contributes no token). For this
grammar, pyparsing would also accept "2. 23" as ['2', '.', '23'], even
though there is a space between the decimal point and "23". But by
wrapping it inside Combine, as in:

realnum = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums))

we accomplish two things: pyparsing only matches if all the elements are
adjacent, with no whitespace or comments; and the matched token is returned
as ['3.14159']. (Yes, I left off scientific notation, but it is an
extension of the same issue.)
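
The two realnum definitions can be compared side by side. This is a small sketch using the expressions from this post; `realnum_loose` and `combined_rejects_space` are names introduced here for the demonstration. In current pyparsing releases an Optional that does not match adds no token to the results, so the unsigned input yields three tokens in the loose version.

```python
from pyparsing import Combine, Optional, ParseException, Word, nums, oneOf

# Without Combine: separate tokens, whitespace skipped between the parts.
realnum_loose = Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)

# With Combine: parts must be adjacent and are returned as one token.
realnum = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums))

print(realnum_loose.parseString("3.14159").asList())  # ['3', '.', '14159']
print(realnum_loose.parseString("2. 23").asList())    # ['2', '.', '23']
print(realnum.parseString("3.14159").asList())        # ['3.14159']

try:
    realnum.parseString("2. 23")
    combined_rejects_space = False
except ParseException:
    combined_rejects_space = True

print(combined_rejects_space)  # True
```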

Pyparsing in general does implicit whitespace skipping; it is part of the
zen of pyparsing, and distinguishes it from conventional regexps (although I
think there is a new '?' switch for re's that puts '\s*'s between re terms
for you). This is to simplify the grammar definition, so that it doesn't
need to be littered with "optional whitespace or comments could go here"
expressions; instead, whitespace and comments (or "ignorables" in pyparsing
terminology) are parsed over before every grammar expression. I instituted
this in reaction to a previous project, in which a co-developer
implemented a boolean parser by first tokenizing by whitespace, then parsing
out the tokens. Unfortunately, this meant that "color=='blue' &&
size=='medium'" would not parse successfully, instead requiring "color ==
'blue' && size == 'medium'". It doesn't seem like much, but our support
guys got many calls asking why the boolean clauses weren't matching. I
decided that when I wrote a parser, "y=m*x+b" would be just as parseable as
"y = m * x + b". For that matter, you'd be surprised where whitespace and
comments sneak into people's source code: spaces after left parentheses and
comments after semicolons, for example, are easily forgotten when spec'ing
out the syntax for a C "for" statement; whitespace inside HTML tags is
another unanticipated surprise.
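
To make the implicit whitespace skipping concrete, here is a minimal sketch. The toy grammar (`ident`, `assignment`) is invented for this illustration and is not from the post; it only shows that the same expression accepts both spellings of the formula and produces the same tokens.

```python
from pyparsing import Word, ZeroOrMore, alphas, oneOf

# Hypothetical toy grammar: name = name (operator name)*
ident = Word(alphas)
assignment = ident + "=" + ident + ZeroOrMore(oneOf("+ - * /") + ident)

tight = assignment.parseString("y=m*x+b").asList()
loose = assignment.parseString("y = m * x + b").asList()

print(tight)           # ['y', '=', 'm', '*', 'x', '+', 'b']
print(tight == loose)  # True
```

Nothing in the grammar mentions whitespace; it is skipped before each expression by default.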

So looking at your grammar, you say you don't want to have this be a
successful parse:
term.parseString("+ a") -> (['+', 'a'], {})

because, "It shouldn't recognize any token, since I didn't say that a SPACE
was allowed between include_bool and literal." In fact, pyparsing allows
spaces by default; that's why the given parse succeeds. I would turn this
question
around, and ask you in terms of your grammar - what SHOULD be allowed
between include_bool and literal? If spaces are not a problem, then your
grammar as-is is sufficient. If spaces are absolutely verboten, then there
are 2 or 3 different techniques in pyparsing to disable the
whitespace-skipping behavior, depending on whether you want all whitespace
skipping disabled, just for literals of a certain type, or just for literals
when following a leading include_bool sign.
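
As a rough sketch of two of these scopes, assuming the leaveWhitespace() method is the mechanism meant here: it can be applied to the literal expression itself, which affects every place that expression is used, or to a copy used only after the sign. The helper `accepts` and the names `strict_literal`, `term_a`, `term_b`, `pair_a` and `pair_b` are made up for this example.

```python
from pyparsing import Optional, ParseException, Word, alphas, oneOf


def accepts(parser, text):
    """Return True if parser matches the start of text."""
    try:
        parser.parseString(text)
        return True
    except ParseException:
        return False


include_bool = oneOf("+ -")

# Scope A: this literal never skips leading whitespace, wherever it is used.
strict_literal = Word(alphas).leaveWhitespace()
term_a = Optional(include_bool) + strict_literal
pair_a = strict_literal + "," + strict_literal

# Scope B: only the copy that follows a sign refuses leading whitespace.
literal = Word(alphas)
term_b = (include_bool + literal.copy().leaveWhitespace()) | literal
pair_b = literal + "," + literal

print(accepts(term_a, "+a"), accepts(term_a, "+ a"))  # True False
print(accepts(term_b, "+a"), accepts(term_b, "+ a"))  # True False
print(accepts(pair_a, "a, b"), accepts(pair_b, "a, b"))  # False True
```

Both variants reject "+ a"; the difference only shows up in other parts of a grammar, where scope A also rejects a space before the literal.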

Thanks for giving pyparsing a try; if you want further help, you can post
here, or on the pyparsing wiki - the discussion threads on the Home page are
a pretty good support and message log.

-- Paul
Nov 22 '06 #3
(This message has already been sent to the mailing list, but I'm not
sure it is arriving properly, since it isn't showing up on Usenet,
so I'm posting it through here as well.)

Chris,

Thanks for your quick answer. That changes a lot of stuff, and now I'm
able to do my parsing as I intended to.

Still, there's a remaining problem. By using Combine(), everything is
interpreted as a single token. What I need is for 'include_bool' and
'literal' to be parsed as separate tokens, but without a space between
them...

Paul,

Thanks for your detailed explanation. One of the things I think is
missing from the documentation (or that I couldn't find easily) is the
kind of explanation you give about 'The Way of PyParsing'. For example,
it took me a while to understand that I could easily implement simple
recursions using OneOrMore(Group()). Or maybe things were out there and
I didn't search enough...

Still, fwiw, congratulations on the library. PyParsing allowed me to
do in just a couple of hours, including learning about its API (minus
this small inconvenience), what would have taken me a couple of days
with, for example, ANTLR (in fact, I've already put aside ANTLR more
than once in the past for a built-from-scratch parser).

Cheers,

Hugo Ferreira

Nov 23 '06 #4
Heya there,

OK, found the solution. I just needed to use leaveWhitespace() in the
places where I want pyparsing to take the spaces into account.
Thanks for the help.
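
For anyone landing on this thread later, a minimal sketch of how this might look for the small grammar from the first post, next to the Combine version for contrast. The exact placement of leaveWhitespace() in the real grammar is not shown in the thread, so this is an assumption; the results names and the variables `combined`, `separate` and `spaced_ok` are chosen here for illustration.

```python
from pyparsing import Combine, Optional, ParseException, Word, alphas, oneOf

include_bool = oneOf("+ -")("include_bool")
literal = Word(alphas)("literal")

# Combine: adjacency enforced, but sign and word are joined into one token.
combined = Combine(Optional(include_bool) + literal)

# leaveWhitespace on a copy of literal: adjacency enforced, tokens kept apart.
separate = Optional(include_bool) + literal.copy().leaveWhitespace()

print(combined.parseString("+a").asList())  # ['+a']

result = separate.parseString("+a")
print(result.asList())                            # ['+', 'a']
print(result["include_bool"], result["literal"])  # + a

try:
    separate.parseString("+ a")
    spaced_ok = True
except ParseException:
    spaced_ok = False

print(spaced_ok)  # False
```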

Cheers!

Hugo Ferreira

Nov 23 '06 #5