Bytes IT Community

Is pyparsing really a recursive descent parser?

P: n/a
Is pyparsing really a recursive descent parser? I ask this because
there are grammars it can't parse that my recursive descent parser would
parse, should I have written one. For instance:
from pyparsing import *

grammar = OneOrMore(Word(alphas)) + Literal('end')
grammar.parseString('First Second Third end')
Amazingly enough, this will fail to parse!
Now, maybe I have no idea what I'm talking about but shouldn't a
recursive descent parser recursively descend through all possible rule
combinations to discover one that works? Why does pyparsing fail to parse
this? Is there a way to get pyparsing to parse a grammar like this?
Thank you...
Nov 2 '07 #1
18 Replies


On Fri, 02 Nov 2007 06:05:13 +0000, Just Another Victim of the Ambient
Morality wrote:
Is pyparsing really a recursive descent parser? I ask this because
there are grammars it can't parse that my recursive descent parser would
parse, should I have written one. For instance:
from pyparsing import *

grammar = OneOrMore(Word(alphas)) + Literal('end')
grammar.parseString('First Second Third end')
Amazingly enough, this will fail to parse!
Now, maybe I have no idea what I'm talking about but shouldn't a
recursive descent parser recursively descend through all possible rule
combinations to discover one that works? Why does pyparsing fail to parse
this?

Pyparsing is not a recursive descent parser: it doesn't go back in the input
stream. The ``OneOrMore(Word(alphas))`` part "eats" the 'end', and when it
can't get any more, the parser moves on to the ``Literal('end')`` part, which
fails because the 'end' is already gone.
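To make that failure concrete, here is a rough stdlib-only sketch (my own illustration, not pyparsing's actual implementation) of what a greedy, non-backtracking repetition does with this input: the repetition consumes 'end' as just another word, so the trailing literal finds nothing left.

```python
import re

def greedy_parse(text):
    """Sketch of a greedy, non-backtracking match for
    OneOrMore(Word(alphas)) + Literal('end')."""
    pos = 0
    words = []
    while True:
        m = re.match(r'\s*([A-Za-z]+)', text[pos:])
        if m is None:
            break
        words.append(m.group(1))   # 'end' is swallowed here too
        pos += m.end()
    # the trailing Literal('end') now has nothing left to match
    found_end = re.match(r'\s*end\b', text[pos:]) is not None
    return words, found_end

print(greedy_parse('First Second Third end'))
# (['First', 'Second', 'Third', 'end'], False)
```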
Is there a way to get pyparsing to parse a grammar like this?
Negative lookahead maybe:

grammar = (OneOrMore(NotAny(Literal('end')) + Word(alphas))
           + Literal('end'))
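The same negative-lookahead idea can be tried with a plain regular expression, for readers without pyparsing at hand (the pattern below is my analogue of the fix, not pyparsing's output): a word only counts toward the repetition if it is not 'end'.

```python
import re

# negative lookahead: a word joins the repetition only if it isn't 'end'
pat = re.compile(r'(?:(?!end\b)[A-Za-z]+\s+)+end')

print(bool(pat.fullmatch('First Second Third end')))  # True
print(bool(pat.fullmatch('hello end hello end')))     # False: only a prefix matches
```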

Ciao,
Marc 'BlackJack' Rintsch
Nov 2 '07 #2

On 2007-11-04, Kay Schluehr <ka**********@gmx.net> wrote:
On 4 Nov., 03:07, Neil Cerutti <horp...@yahoo.com> wrote:
>I wouldn't characterize it as pretending. How would you parse:

hello end hello end

"WORD END WORD END" and "WORD WORD WORD END" are both valid
interpretations, according to the grammar.

As soon as you remove the ambiguity from the grammar, PyParsing
starts to work correctly.

I think you are correct about this. But I'm not sure how much
it shall matter. Just take a look at Pythons Grammar

http://svn.python.org/view/python/tr...46&view=markup

Without special keyword treatment this grammar would be
ambiguous and couldn't be parsed using an LL(1) parser.
I agree. I don't know how easy it is to create keywords using
PyParsing, or if the grammar in question would still be
considered correct by the author.
The "grammar compiler" which builds the parser tables creates a
special label for each keyword. This label is filtered when a
NAME token is fed into the parser. With the label that
belongs to e.g. 'if' or 'while', the correct statement can be
selected in constant time. The same happens when I use the parser
generator with your EBNF grammar. With a little more adaptation,
NUMBER tokens could also be filtered. But this would be
overdesign.
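A toy illustration of that keyword filtering (hypothetical names, not CPython's actual tokenizer): classify each word as its own keyword label, or as a generic NAME, before the parser ever sees it. The WORD-versus-'end' ambiguity then disappears at the token level.

```python
KEYWORDS = {'if', 'while', 'end'}

def tokenize(source):
    """Label keywords specially so a parser can select the right rule
    in constant time; every other word becomes a generic NAME token."""
    return [(w.upper() if w in KEYWORDS else 'NAME', w)
            for w in source.split()]

print(tokenize('First Second Third end'))
# [('NAME', 'First'), ('NAME', 'Second'), ('NAME', 'Third'), ('END', 'end')]
```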

Theoretical beauty is compromised here using reasonable default
assumptions for keeping the grammar simple ( "convention over
configuration" to borrow a Rails slogan ).
Keywords are practically ubiquitous. I never thought of them as
unbeautiful before.
Tokenization is another issue in Python. It is indeed somewhat
special due to significant whitespace and line continuation but
tokenization is conceptually much simpler and one would likely
throw all kinds of options and special case handling in the
lexical analysis phase.
It might be a quick fix in PyParsing, which includes a Keyword
type, but without the semantics that are needed in this case. You
have to use (as suggested earlier) negative lookahead in either a
Regex or with the NotAny type.
>>> goal = OneOrMore(NotAny(Literal('end')) + Word(alphas)) + Literal('end')
>>> goal.parseString('hello hello hello end')
(['hello', 'hello', 'hello', 'end'], {})
>>> goal.parseString('hello end hello end')
(['hello', 'end'], {})

No scanner/tokenizer needed! ;)

--
Neil Cerutti
Nov 4 '07 #3


"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:wP*******************@newsreading01.news.tds.net...
On 2007-11-04, Just Another Victim of the Ambient Morality
<ih*******@hotmail.com> wrote:
>>
"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:oz*******************@newsreading01.news.tds.net...
>>>
Is there not an ambiguity in the grammar?

In EBNF:

goal -> WORD { WORD } END

WORD is '[a-zA-Z]+'
END is 'end'

I think it is fine that PyParsing can't guess what the composer
of that grammar meant.

One interpretation conforms to the grammar while the other
doesn't. You would assume that the interpretation that agrees
with the grammar would be the preferable choice and so should
the program. Secondly, even if it is an ambiguity... so what?
pyparsing's current behaviour is to return a parse error,
pretending that the string can't be parsed. Ideally, perhaps
it should alert you to the ambiguity but, surely, it's better
to return _a_ valid parsing than to pretend that the string
can't be parsed at all...

I wouldn't characterize it as pretending. How would you parse:

hello end hello end

"WORD END WORD END" and "WORD WORD WORD END" are both valid
interpretations, according to the grammar.
...and it would be nice if the parser were to parse one of them since
they are both right. Having more than one right answer is not the same as
having no answer, which is what pyparsing claims...

As soon as you remove the ambiguity from the grammar, PyParsing
starts to work correctly.
This is simply not true. Try this:
grammar = OneOrMore(Word(alphas)) + Literal('end') + Literal('.')
grammar.parseString('First Second Third end.')
...again, this will fail to parse. Where's the ambiguity?
Besides, parsing ambiguous grammars is a useful feature. Not all
grammars being parsed are designed by those doing the parsing...

Consider writing a recursive descent parser by hand to parse the
language '[ab]+b'.

goal -> ab_list 'b'
ab_list -> 'a' list_tail
ab_list -> 'b' list_tail
list_tail -> 'a' list_tail
list_tail -> 'b' list_tail
list_tail -> null
The above has the exact same bug (and probably some others; I'm
sorry, I'm unable to test it just now) as the PyParsing solution.

The error is in the grammar. It might be fixed by specifying that
'b' must be followed by EOF, and then it could be coded by using
more than one character of lookahead.
I don't exactly understand the syntax you used to describe the
productions of your recursive descent parser so not only did I not follow it
but I couldn't make out the rest of your post. Could you explain in a
little more detail? The last part that points to 'null' is especially
confusing...
As demonstrated earlier, it's not just the grammar. There are
situations that are unambiguous that pyparsing can't parse simply and
there's no reason for it.
Besides, ambiguous grammars are a fact of life and some of us need to
parse them. It's usually okay, too. Consider a previous example:
grammar = OneOrMore(Word(alphas)) + Literal('end')
While you may consider this inherently ambiguous, it's usually not.
That is to say, as long as 'end' rarely appears anywhere but at the end of
the string, this will parse just fine and, yet, pyparsing will consistently
fail to parse it...

Nov 4 '07 #4

On 2007-11-04, Just Another Victim of the Ambient Morality
<ih*******@hotmail.com> wrote:
>Consider writing a recursive descent parser by hand to parse
the language '[ab]+b'.

goal -> ab_list 'b'
ab_list -> 'a' list_tail
ab_list -> 'b' list_tail
list_tail -> 'a' list_tail
list_tail -> 'b' list_tail
list_tail -> null
The above has the exact same bug (and probably some others; I'm
sorry, I'm unable to test it just now) as the PyParsing solution.

The error is in the grammar. It might be fixed by specifying that
'b' must be followed by EOF, and then it could be coded by using
more than one character of lookahead.

I don't exactly understand the syntax you used to describe the
productions of your recursive descent parser so not only did I not follow it
but I couldn't make out the rest of your post. Could you explain in a
little more detail? The last part that points to 'null' is especially
confusing...
It's the BNF spelling of

goal -> ab_list 'b'
ab_list -> ab { ab }
ab -> 'a' | 'b'

The null is to say that list_tail can match nothing, i.e., an
empty string.

Then, in the Parser class, every method (except for match, which
is used as a central place to consume characters) corresponds to
one of the productions in the BNF. Breaking things down into
BNF-based productions often makes implementation, debugging and
code generation easier.

PyParsing saves me that step, since I can often directly
implement the EBNF using PyParsing.
As demonstrated earlier, it's not just the grammar. There are
situations that are unambiguous that pyparsing can't parse
simply and there's no reason for it.
Yes, many parser generators have many more limitations than just
the requirement of an unambiguous grammar.
Besides, ambiguous grammars are a fact of life and some of us
need to parse them. It's usually okay, too. Consider a
previous example:

grammar = OneOrMore(Word(alphas)) + Literal('end')

While you may consider this inherently ambiguous, it's usually
not. That is to say, as long as 'end' rarely appears anywhere
but at the end of the string, this will parse just fine and, yet,
pyparsing will consistently fail to parse it...
I believe there's no cure for the confusion you're having except
for implementing a parser for your proposed grammar.
Alternatively, try implementing your grammar in one of your other
favorite parser generators.

--
Neil Cerutti
Nov 4 '07 #5


"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:nP*******************@newsreading01.news.tds.net...
On 2007-11-04, Just Another Victim of the Ambient Morality
<ih*******@hotmail.com> wrote:
>>Consider writing a recursive descent parser by hand to parse
the language '[ab]+b'.

goal -> ab_list 'b'
ab_list -> 'a' list_tail
ab_list -> 'b' list_tail
list_tail -> 'a' list_tail
list_tail -> 'b' list_tail
list_tail -> null
The above has the exact same bug (and probably some others; I'm
sorry, I'm unable to test it just now) as the PyParsing solution.

The error is in the grammar. It might be fixed by specifying that
'b' must be followed by EOF, and then it could be coded by using
more than one character of lookahead.

I don't exactly understand the syntax you used to describe the
productions of your recursive descent parser so not only did I not follow
it
but I couldn't make out the rest of your post. Could you explain in a
little more detail? The last part that points to 'null' is especially
confusing...

It's the BNF spelling of

goal -> ab_list 'b'
ab_list -> ab { ab }
ab -> 'a' | 'b'

The null is to say that list_tail can match nothing, i.e., an
empty string.

Then, in the Parser class, every method (except for match, which
is used as a central place to consume characters) corresponds to
one of the productions in the BNF. Breaking things down into
BNF-based productions often makes implementation, debugging and
code generation easier.

PyParsing saves me that step, since I can often directly
implement the EBNF using PyParsing.
Okay, I see that now, thank you.
Your statement from the previous post:

>Consider writing a recursive descent parser by hand to parse
the language '[ab]+b'.

goal -> ab_list 'b'
ab_list -> 'a' list_tail
ab_list -> 'b' list_tail
list_tail -> 'a' list_tail
list_tail -> 'b' list_tail
list_tail -> null
The above has the exact same bug (and probably some others; I'm
sorry, I'm unable to test it just now) as the PyParsing solution.

...merely demonstrates that this grammar is similarly ambiguous. There
are many ways to parse this correctly and pyparsing chooses none of these!
Instead, it returns the same error it does when the string has no
solutions...

>As demonstrated earlier, it's not just the grammar. There are
situations that are unambiguous that pyparsing can't parse
simply and there's no reason for it.

Yes, many parser generators have many more limitations than just
the requirement of an unambiguous grammar.
Yes, but a recursive descent parser? I expect such things from LALR and
others, but I expect a recursive descent parser not only to correctly
parse grammars but even to parse ambiguous ones, since it is
the only technique prepared to find more than one solution...

>Besides, ambiguous grammars are a fact of life and some of us
need to parse them. It's usually okay, too. Consider a
previous example:

grammar = OneOrMore(Word(alphas)) + Literal('end')

While you may consider this inherently ambiguous, it's usually
not. That is to say, as long as 'end' rarely appears anywhere
but at the end of the string, this will parse just fine and, yet,
pyparsing will consistently fail to parse it...

I believe there's no cure for the confusion you're having except
for implementing a parser for your proposed grammar.
Alternatively, try implementing your grammar in one of your other
favorite parser generators.
I believe there is a cure and it's called recursive descent parsing.
It's slow, obviously, but it's correct and, sometimes (arguably, often),
that's more important than execution speed.

I spent this morning whipping up a proof-of-concept parser whose
interface greatly resembles pyparsing but, barring unknown bugs, works and
works as I'd expect a recursive descent parser to work. I don't know Python
very well, so the parser is pretty simple. It only lexes single characters
as tokens. It only supports And, Or, Optional, OneOrMore and ZeroOrMore
rules, but I already think this is a rich set of rules. I'm sure others can
be added. Finally, I'm not sure it's safely copying all its parameter input
the same way pyparsing does, but surely those bugs can be worked out. It's
merely a proof of concept to demonstrate a point.
Everyone, please look it over and tell me what you think.
Unfortunately, my news client is kind of poor, so I can't simply cut and
paste the code in here. All the tabs get turned into single spaces, so I
will post this link, instead:
http://theorem.ca/~dlkong/new_pyparsing.zip
I hope you can all deal with .zip files. Let me know if this is a
problem.
Thank you...

Nov 4 '07 #6

On 2007-11-04, Just Another Victim of the Ambient Morality
<ih*******@hotmail.com> wrote:
"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:nP*******************@newsreading01.news.tds.net...
>I believe there's no cure for the confusion you're having except
for implementing a parser for your proposed grammar.
Alternatively, try implementing your grammar in one of your other
favorite parser generators.

I believe there is a cure and it's called recursive descent
parsing. It's slow, obviously, but it's correct and, sometimes
(arguably, often), that's more important than execution speed.

I spent this morning whipping up a proof of concept parser
whose interface greatly resembles pyparsing but, barring unknown
bugs, works and works as I'd expect a recursive descent parser
to work. I don't know Python very well so the parser is pretty
simple. It only lexes single characters as tokens. It only
supports And, Or, Optional, OneOrMore and ZeroOrMore rules but
I already think this is a rich set of rules. I'm sure others
can be added. Finally, I'm not sure it's safely copying all
its parameter input the same way pyparsing does but surely
those bugs can be worked out. It's merely a proof of concept
to demonstrate a point.
Everyone, please look it over and tell me what you think.
Unfortunately, my news client is kind of poor, so I can't
simply cut and paste the code into here. All the tabs get
turned into single spacing, so I will post this link, instead:

http://theorem.ca/~dlkong/new_pyparsing.zip
Your program doesn't necessarily address the ambiguity in the
grammar in question, since right now it is only a recognizer.
Will it be hard to get it to return a parse tree?

The grammar in your implementation is:
>>> goal = OneOrMore(RuleAnd('a') | RuleAnd('b')) + RuleAnd('b')
>>> goal.parse(0, 'ab')
True
>>> goal.parse(0, 'ba')
False
>>> goal.parse(0, 'b')
False
>>> goal.parse(0, 'aaab')
True
>>> goal.parse(0, 'abc')
True

So far so good. :)

--
Neil Cerutti
Nov 4 '07 #7


"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:qz******************@newsreading01.news.tds.net...
On 2007-11-04, Just Another Victim of the Ambient Morality
<ih*******@hotmail.com> wrote:
>"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:nP*******************@newsreading01.news.tds.net...
>>I believe there's no cure for the confusion you're having except
for implementing a parser for your proposed grammar.
Alternatively, try implementing your grammar in one of your other
favorite parser generators.

I believe there is a cure and it's called recursive descent
parsing. It's slow, obviously, but it's correct and, sometimes
(arguably, often), that's more important than execution speed.

I spent this morning whipping up a proof of concept parser
whose interface greatly resembles pyparsing but, barring unknown
bugs, works and works as I'd expect a recursive descent parser
to work. I don't know Python very well so the parser is pretty
simple. It only lexes single characters as tokens. It only
supports And, Or, Optional, OneOrMore and ZeroOrMore rules but
I already think this is a rich set of rules. I'm sure others
can be added. Finally, I'm not sure it's safely copying all
its parameter input the same way pyparsing does but surely
those bugs can be worked out. It's merely a proof of concept
to demonstrate a point.
Everyone, please look it over and tell me what you think.
Unfortunately, my news client is kind of poor, so I can't
simply cut and paste the code into here. All the tabs get
turned into single spacing, so I will post this link, instead:

http://theorem.ca/~dlkong/new_pyparsing.zip

Your program doesn't necessarily address the ambiguity in the
grammar in question, since right now it is only a recognizer.
Will it be hard to get it to return a parse tree?
Hey, it's only a proof of concept. If you can parse the tree, surely
you can record what you parsed, right?
Did you notice that the parse() functions have the rather serious bug of
not returning how much of the string they could parse? It just so happens
that the constructions that I made only ever had to increment the matches
by one, so they just happen to work. That's an easy bug to fix but a pretty
major one to have overlooked. Hence, my enthusiasm for input...

The grammar in your implementation is:
>>> goal = OneOrMore(RuleAnd('a') | RuleAnd('b')) + RuleAnd('b')
>>> goal.parse(0, 'ab')
True
>>> goal.parse(0, 'ba')
False
>>> goal.parse(0, 'b')
False
>>> goal.parse(0, 'aaab')
True
>>> goal.parse(0, 'abc')
True

So far so good. :)
Good! Keep hammering at it!
More importantly, study it to understand the idea I'm trying to convey.
This is what I thought a recursive descent parser would do...

Nov 5 '07 #8

On Nov 4, 10:44 pm, "Just Another Victim of the Ambient Morality"
<ihates...@hotmail.com> wrote:
I believe there is a cure and it's called recursive descent parsing.
It's slow, obviously, but it's correct and, sometimes (arguably, often),
that's more important than execution speed.
Recursive descent parsing is not necessarily slow, but from your
remarks above I infer you want a general RD parser with backtracking:
when one rule doesn't match, try another one to derive the current
symbol in the input stream.
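That "try another one" behaviour can be sketched with generator-based combinators (hypothetical names of my own, unrelated to the poster's zip file): each rule yields every end position it can reach, so a failure later in a sequence simply resumes the earlier alternatives. Note that a rule matching the empty string would make this naive OneOrMore loop forever.

```python
def Lit(ch):
    # match one literal character
    def rule(text, pos):
        if text[pos:pos + 1] == ch:
            yield pos + 1
    return rule

def Alt(*rules):
    # try each alternative in turn
    def rule(text, pos):
        for r in rules:
            yield from r(text, pos)
    return rule

def Seq(*rules):
    # thread every possible end of each rule into the next
    def rule(text, pos):
        if not rules:
            yield pos
        else:
            for mid in rules[0](text, pos):
                yield from Seq(*rules[1:])(text, mid)
    return rule

def OneOrMore(r):
    def rule(text, pos):
        for mid in r(text, pos):
            yield mid                      # stop here...
            yield from rule(text, mid)     # ...or keep going
    return rule

# the '[ab]+b' grammar from earlier in the thread
goal = Seq(OneOrMore(Alt(Lit('a'), Lit('b'))), Lit('b'))
print(any(end == len('aaab') for end in goal('aaab', 0)))  # True
print(any(end == len('ba') for end in goal('ba', 0)))      # False
```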

I'm not sure one needs to start again with a naive approach just to
avoid any parser theory. For a user of a parser it is quite important
whether she has to wait 50 seconds for a parse to run or 50
milliseconds. I don't like to compromise speed for implementation
simplicity here.
Nov 5 '07 #9

On 2007-11-05, Just Another Victim of the Ambient Morality <ih*******@hotmail.com> wrote:
>
"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:qz******************@newsreading01.news.tds.net...
>On 2007-11-04, Just Another Victim of the Ambient Morality
<ih*******@hotmail.com> wrote:
>>"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:nP*******************@newsreading01.news.tds.net...
I believe there's no cure for the confusion you're having except
for implementing a parser for your proposed grammar.
Alternatively, try implementing your grammar in one of your other
favorite parser generators.

I believe there is a cure and it's called recursive descent
parsing. It's slow, obviously, but it's correct and, sometimes
(arguably, often), that's more important than execution speed.

I spent this morning whipping up a proof of concept parser
whose interface greatly resembles pyparsing but, barring unknown
bugs, works and works as I'd expect a recursive descent parser
to work. I don't know Python very well so the parser is pretty
simple. It only lexes single characters as tokens. It only
supports And, Or, Optional, OneOrMore and ZeroOrMore rules but
I already think this is a rich set of rules. I'm sure others
can be added. Finally, I'm not sure it's safely copying all
its parameter input the same way pyparsing does but surely
those bugs can be worked out. It's merely a proof of concept
to demonstrate a point.
Everyone, please look it over and tell me what you think.
Unfortunately, my news client is kind of poor, so I can't
simply cut and paste the code into here. All the tabs get
turned into single spacing, so I will post this link, instead:

http://theorem.ca/~dlkong/new_pyparsing.zip

Your program doesn't necessarily address the ambiguity in the
grammar in question, since right now it is only a recognizer.
Will it be hard to get it to return a parse tree?

Hey, it's only a proof of concept. If you can parse the tree, surely
you can record what you parsed, right?
Did you notice that the parse() functions have the rather serious bug of
not returning how much of the string they could parse?
Unfortunately I haven't had much time to play with it today; just
barely enough to put it through a very few paces.
It just so happens that the constructions that I made only
ever had to increment the matches by one, so they just happen
to work. That's an easy bug to fix but a pretty major one to
have overlooked. Hence, my enthusiasm for input...
>The grammar in your implementation is:
>>> goal = OneOrMore(RuleAnd('a') | RuleAnd('b')) + RuleAnd('b')
>>> goal.parse(0, 'ab')
True
>>> goal.parse(0, 'ba')
False
>>> goal.parse(0, 'b')
False
>>> goal.parse(0, 'aaab')
True
>>> goal.parse(0, 'abc')
True

So far so good. :)

Good! Keep hammering at it!
More importantly, study it to understand the idea I'm
trying to convey. This is what I thought a recursive descent
parser would do...
Kay has pointed out how it works. Strangely enough, I've never
studied a backtracking RDP before (trying to teach yourself a
subject like parsing can be tricky--I've had to somehow avoid all
the texts that overuse Greek letters--those incomprehensible
symbols confuse the hell out of me). It does simplify the job of
the grammar designer, but Kay's message makes it sound like it
won't scale very well.

It might, perhaps, be an interesting feature for PyParsing to
entertain by setting a 'backtracking' option, for when you're
writing a quick script and don't want to fuss too much with a
non-conformant grammar.

I'll have more time to look at it tomorrow.

--
Neil Cerutti
Nov 5 '07 #10

On 2007-11-05, Neil Cerutti <ho*****@yahoo.com> wrote:
def Ack(x, y):
    """The Ackermann function. Creates a humongous mess even
    with quite tiny numbers."""
    if x < 0 or y < 0:
        raise ValueError('non-negative integer')
    elif x == 0:
        return y + 1
    elif y == 0:
        return foo3(x - 1, 1)
    else:
        return foo3(x - 1, foo3(x, y - 1))
Urk! Of course those foo3 calls should have been Ack calls.

--
Neil Cerutti
Nov 5 '07 #11

On Nov 2, 6:47 am, Paul McGuire <pt...@austin.rr.com> wrote:
Well I'll be darned! All this time, I thought "recursive descent"
described the recursive behavior of the parser, which pyparsing
definitely has. I never knew that backing up in the face of parse
mismatches was a required part of the picture.
I looked at pyparsing about a year ago for some project and realized
that it wouldn't quite do what I needed it to. Maddeningly enough, I
cannot remember exactly what the problem was, but I think that it was
some combination of lack of backtracking and insufficient control over
whitespace skipping.

Mostly off-topic, what I could *really* use is something like this
folded into the Python standard library. I cannot count the number of
times that this would have saved me 30 lines of code.

http://mail.python.org/pipermail/pyt...st/047042.html

Mike

Nov 6 '07 #12

On 2007-11-07, Just Another Victim of the Ambient Morality
<ih*******@hotmail.com> wrote:
"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:sl********************@FIAD06.norwich.edu...
>You might be interested in the Early parsing algorithm. It is
more efficient than the naive approach used in your prototype,
and still handles ambiguous grammars.
I'll take this opportunity to correct my misspelling. It's
"Earley".
I think I might be interested in this algorithm, thank you!
>http://www.cpsc.ucalgary.ca/~aycock/spark/

You know, I tried this thing but, for the life of me, I
can't figure out how to use it and the few tutorials out there
are less than illuminating...
I'll send you the code I composed.

The tricky part of Spark is the Token and AST classes you have to
use. A type used as a token class is required to provide a
__cmp__ function that behaves in the following confounding
manner:

class Token(object):
    def __cmp__(self, o):
        return cmp(self.type, o)

If you attempt to use a list, a string or a tuple as a token, it just
barfs. ASTs are required to provide an even weirder interface.

In effect, you have to write badly designed wrappers around
tuples and lists, respectively to take advantage of the generic
classes.

Go to the examples directory of the distribution to find working
versions of these stupid classes.

Once you get over that hurdle it becomes easier. Be sure to
provide your Parser and Scanner classes with an error method to
prevent the library from raising SystemExit(!) on errors. Scanner
classes are also required to override the t_default method to
prevent this mishap.

In short, it hasn't really evolved into a user-friendly package
yet.

--
Neil Cerutti
Nov 7 '07 #13


"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:IL*******************@newsreading01.news.tds.net...
On 2007-11-07, Just Another Victim of the Ambient Morality
<ih*******@hotmail.com> wrote:
>"Neil Cerutti" <ho*****@yahoo.com> wrote in message
news:sl********************@FIAD06.norwich.edu...
>>You might be interested in the Early parsing algorithm. It is
more efficient than the naive approach used in your prototype,
and still handles ambiguous grammars.

I'll take this opportunity to correct my misspelling. It's
"Earley".
>I think I might be interested in this algorithm, thank you!
>>http://www.cpsc.ucalgary.ca/~aycock/spark/

You know, I tried this thing but, for the life of me, I
can't figure out how to use it and the few tutorials out there
are less than illuminating...

I'll send you the code I composed.

The tricky part of Spark is the Token and AST classes you have to
use. A type used as a token class is required to provide a
__cmp__ function that behaves in the following confounding
manner:

class Token(object):
    def __cmp__(self, o):
        return cmp(self.type, o)

If you attempt to use a list, a string or a tuple as a token, it just
barfs. ASTs are required to provide an even weirder interface.

In effect, you have to write badly designed wrappers around
tuples and lists, respectively to take advantage of the generic
classes.

Go to the examples directory of the distribution to find working
versions of these stupid classes.

Once you get over that hurdle it becomes easier. Be sure to
provide your Parser and Scanner classes with an error method to
prevent the library from raising SystemExit(!) on errors. Scanner
classes are also required to override the t_default method to
prevent this mishap.

In short, it hasn't really evolved into a user-friendly package
yet.
Thank you.
How is it that I seem to be the only one in the market for a correct
parser? Earley has a runtime of O(n^3) in the worst case and O(n^2)
typically. I have trouble believing that everyone else in the world has
such intense run-time requirements that they're willing to forgo
correctness. Why can't I find a pyparsing-esque library with this
implementation? I'm tempted to roll my own, except that it's a fairly
complicated algorithm and I don't really understand how it's any more
efficient than the naive approach...
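For what it's worth, the core of an Earley recognizer fits on a page; sharing items in a chart, instead of re-deriving them on every backtrack, is what beats the naive exponential approach. The sketch below is my own (it handles epsilon-free grammars only) and accepts the thread's ambiguous WORD+ 'end' grammar, where 'end' also qualifies as a WORD.

```python
def earley_recognize(grammar, start, tokens, matches):
    """Earley recognizer sketch (no epsilon rules). `grammar` maps a
    nonterminal to a list of right-hand-side tuples; `matches(t, tok)`
    decides whether terminal t matches a token."""
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))          # (lhs, rhs, dot, origin)
    for i in range(len(tokens) + 1):
        queue = list(chart[i])
        while queue:
            lhs, rhs, dot, origin = queue.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                 # predict
                    for alt in grammar[sym]:
                        item = (sym, alt, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            queue.append(item)
                elif i < len(tokens) and matches(sym, tokens[i]):
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))  # scan
            else:                                  # complete
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            queue.append(item)
    return any((start, rhs, len(rhs), 0) in chart[len(tokens)]
               for rhs in grammar[start])

# the ambiguous grammar from this thread: one or more WORDs, then 'end'
grammar = {
    'goal':  [('words', 'END')],
    'words': [('WORD',), ('words', 'WORD')],
}

def matches(symbol, token):
    return token == 'end' if symbol == 'END' else token.isalpha()

print(earley_recognize(grammar, 'goal',
                       'First Second Third end'.split(), matches))  # True
```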


Nov 7 '07 #14

On Wed, 07 Nov 2007 21:15:50 +0000, Just Another Victim of the Ambient
Morality wrote:
Why can't I find a pyparsing-esque library with this implementation?
I'm tempted to roll my own except that it's a fairly complicated
algorithm and I don't really understand how it's any more efficient than
the naive approach...
I think you may have just answered your own question :)
--
Steven.
Nov 7 '07 #15


"Steven D'Aprano" <st***@REMOVE-THIS-cybersource.com.au> wrote in message
news:13*************@corp.supernews.com...
On Wed, 07 Nov 2007 21:15:50 +0000, Just Another Victim of the Ambient
Morality wrote:
>Why can't I find a pyparsing-esque library with this implementation?
I'm tempted to roll my own except that it's a fairly complicated
algorithm and I don't really understand how it's any more efficient than
the naive approach...

I think you may have just answered your own question :)
Yes, but my own answer lacks the detail that I was hoping you could
provide...

Nov 7 '07 #16

What do you mean by "disambiguate?" Do you mean disambiguate the
grammar? One of the conditions of the problem is that you have no control
over the grammar, so that's really not an option. Also, an implicit
condition of solving a problem is that the problem be... solved, so not
accepting the input is not an option, either.
Could you please learn some parser theory 101 and then come back when
you have complaints about one or the other implementation of a
particular Python parser? When some guy enters a forum with a "newbie
question", most of the time people are willing to give a fair and
comprehensive answer, but they don't want to mess around with him
endlessly when he is not willing to learn and to listen.
Nov 8 '07 #17

Kay Schluehr wrote:
>
Could you please learn some parser theory 101 and then come back when
you have complaints about one or the other implementation of a
particular Python parser? When some guy enters a forum with a "newbie
question" most of the time people are willing to give a fair and
comprehensive answer but they don't want to mess around with him
endlessly when he is not willing to learn and to listen.

Immaturity is a mark of this group, isn't it? When all else fails,
resort to insults.

My suggestion is, if you can't be polite, just go back to your own room.

(david)

Nov 8 '07 #18

On Nov 9, 12:28 am, "[david]" <da...@nospam.spam> wrote:
Kay Schluehr wrote:
Could you please learn some parser theory 101 and then come back when
you have complaints about one or the other implementation of a
particular Python parser? When some guy enters a forum with a "newbie
question" most of the time people are willing to give a fair and
comprehensive answer but they don't want to mess around with him
endlessly when he is not willing to learn and to listen.

Immaturity is a mark of this group, isn't it? When all else fails,
resort to insults.
Dear [david], when all else fails, be rigorous. Tell someone that it
is not o.k. to keep people busy with his complaints when he doesn't
know what he is talking about. This is *my* rejection of impoliteness
and has nothing to do with "this group" in the first place. But I'm
not entirely sure I want to discuss polite behaviour with
someone not using his full name.

Kay

Nov 9 '07 #19

This discussion thread is closed
