pyparsing

Hello !

I am trying to understand pyparsing. Here is a little test program
to try out the Optional class:

from pyparsing import Word,nums,Literal,Optional

lbrack=Literal("[").suppress()
rbrack=Literal("]").suppress()
ddot=Literal(":").suppress()
start = Word(nums+".")
step = Word(nums+".")
end = Word(nums+".")

sequence=lbrack+start+Optional(ddot+step)+ddot+end +rbrack

tokens = sequence.parseString("[0:0.1:1]")
print tokens

tokens1 = sequence.parseString("[1:2]")
print tokens1

It works for the first string, but parsing the second string ("[1:2]")
raises an error. I don't get it. I wrapped ddot and step in Optional,
so I assumed they were optional.

Any hints on what I am doing wrong?

The versions are pyparsing 1.1.2 and Python 2.3.3.

Thanks,

B.
Jul 18 '05 #1

On Thu, 13 May 2004 08:05:32 +0200, bo***********@mf.uni-lj.si
(Boštjan Jerko) wrote:
> sequence=lbrack+start+Optional(ddot+step)+ddot+end +rbrack

I don't see anything "obviously" wrong to me, but changing it thusly
seems to resolve the problem (I added a few intermediate rules to
make it more obvious):

pref = lbrack + start
midf = ddot + step
suff = ddot + end + rbrack
sequence = pref + midf + suff | pref + suff

I've run into "this kind of thing" now and again, and have always
been able to resolve it by reorganizing my rules.
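
The reorganized grammar above works because '|' restarts the whole second alternative from the same position when the first one fails. The sketch below models that in plain Python rather than pyparsing: regex, seq and first_of are hypothetical toy combinators, and first_of is only assumed to behave like the MatchFirst object that '|' builds.

```python
import re

# Hypothetical toy combinators, not pyparsing's API: each parser takes
# (text, pos) and returns (new_pos, tokens) on success, or None on failure.
def regex(pattern, keep=True):
    rx = re.compile(pattern)
    def parse(text, pos):
        m = rx.match(text, pos)
        if m is None:
            return None
        # keep=False mimics .suppress(): match the text but return no token.
        return m.end(), ([m.group()] if keep else [])
    return parse

def seq(*parsers):
    # Like '+': every parser must succeed, one after the other.
    def parse(text, pos):
        tokens = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            pos, new_tokens = result
            tokens += new_tokens
        return pos, tokens
    return parse

def first_of(*alternatives):
    # Assumed to behave like '|' (MatchFirst): try each alternative from the
    # same starting position and return the first one that succeeds.
    def parse(text, pos):
        for alt in alternatives:
            result = alt(text, pos)
            if result is not None:
                return result
        return None
    return parse

lbrack = regex(r"\[", keep=False)
rbrack = regex(r"\]", keep=False)
ddot = regex(":", keep=False)
start = step = end = regex(r"[0-9.]+")

pref = seq(lbrack, start)
midf = seq(ddot, step)
suff = seq(ddot, end, rbrack)
sequence = first_of(seq(pref, midf, suff), seq(pref, suff))

def parse_all(parser, text):
    result = parser(text, 0)
    if result is None or result[0] != len(text):
        raise ValueError("no match for %r" % text)
    return result[1]

print(parse_all(sequence, "[0:0.1:1]"))  # ['0', '0.1', '1']
print(parse_all(sequence, "[1:2]"))      # ['1', '2']
```

For "[1:2]", pref is matched twice: once inside the failed first alternative and again in the second. That retry from the same starting position is exactly what a bare Optional does not do.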

--dang
Jul 18 '05 #2

"Boštjan Jerko" <bo***********@mf.uni-lj.si> wrote in message
news:87************@bostjan-pc.mf.uni-lj.si...
> sequence=lbrack+start+Optional(ddot+step)+ddot+end +rbrack

Bostjan -

Here's how pyparsing is processing your input strings:

[0:0.1:1]
[ = lbrack
0 = start
:0.1 = ddot + step (Optional match)
: = ddot
1 = end
] = rbrack

[1:2]
[ = lbrack
1 = start
:2 = ddot + step (Optional match)
] = oops! expected ddot -> failure
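
The trace above can be reproduced with a minimal pure-Python sketch that does not use pyparsing at all. The helpers lit, number, seq and optional are hypothetical stand-ins, written only to mimic the behaviour described here, under the assumption that an Optional element keeps its match even when a later element of the same sequence then fails.

```python
import re

# Hypothetical toy matchers, not pyparsing's API: each one takes (text, pos)
# and returns the position just after its match, or None if it fails.
NUMBER = re.compile(r"[0-9.]+")  # plays the role of start, step and end

def lit(s):
    def match(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return match

def number(text, pos):
    m = NUMBER.match(text, pos)
    return m.end() if m else None

def seq(*parts):
    # Match each part in turn; fail as soon as one part fails.
    def match(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return match

def optional(part):
    # Assumed Optional behaviour: keep part's match if there is one, else
    # match nothing. The choice is never revisited if a later part fails.
    def match(text, pos):
        new_pos = part(text, pos)
        return pos if new_pos is None else new_pos
    return match

lbrack, rbrack, ddot = lit("["), lit("]"), lit(":")
sequence = seq(lbrack, number, optional(seq(ddot, number)), ddot, number, rbrack)

def parses(expr, text):
    return expr(text, 0) == len(text)

print(parses(sequence, "[0:0.1:1]"))  # True
print(parses(sequence, "[1:2]"))      # False: ':2' went to the optional part
```

In the second string the optional part consumes ":2", so the mandatory ddot that follows finds "]" instead, which is the same failure traced above.
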
Dang Griffith proposed one alternative construct, here's another, perhaps
more explicit:
lbrack + ( ( ddot + step + ddot + end ) | (ddot + end) ) + rbrack

Note that the order of the inner construct is important, so as to not match
ddot+end before trying ddot+step+ddot+end; '|' is a greedy matching
operator, creating a MatchFirst object from pyparsing's class library. You
could avoid this confusion by using '^', which generates an Or object:
lbrack + ( (ddot + end) ^ ( ddot + step + ddot + end ) ) + rbrack
This will evaluate both subconstructs, and choose the longer of the two.
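
The difference between '|' and '^' can be sketched with toy matchers in plain Python (not pyparsing). Here first_of stands in for MatchFirst and longest_of for Or; both names are hypothetical. The sketch puts start between lbrack and the alternatives, which the two expressions above leave out.

```python
import re

# Hypothetical toy matchers, not pyparsing's API: each takes (text, pos)
# and returns the position after its match, or None on failure.
def token(pattern):
    rx = re.compile(pattern)
    def match(text, pos):
        m = rx.match(text, pos)
        return m.end() if m else None
    return match

def seq(*parts):
    def match(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return match

def first_of(*alts):
    # Modelled on '|' (MatchFirst): the first alternative that matches wins.
    def match(text, pos):
        for alt in alts:
            new_pos = alt(text, pos)
            if new_pos is not None:
                return new_pos
        return None
    return match

def longest_of(*alts):
    # Modelled on '^' (Or): try every alternative, keep the longest match.
    def match(text, pos):
        ends = [p for p in (alt(text, pos) for alt in alts) if p is not None]
        return max(ends) if ends else None
    return match

lbrack, rbrack, ddot = token(r"\["), token(r"\]"), token(":")
start = step = end = token(r"[0-9.]+")
short = seq(ddot, end)
long_ = seq(ddot, step, ddot, end)

def parses(inner, text):
    return seq(lbrack, start, inner, rbrack)(text, 0) == len(text)

text = "[0:0.1:1]"
print(parses(first_of(long_, short), text))    # True: longer branch tried first
print(parses(first_of(short, long_), text))    # False: ':0.1' taken, then ']' expected
print(parses(longest_of(short, long_), text))  # True: order no longer matters
```

With first_of the order of the alternatives decides the outcome; longest_of evaluates both and so is insensitive to order, at the cost of always trying every alternative.
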

Or you can use another pyparsing helper, the delimited list
lbrack + delimitedlist( Word(nums+"."), delim=":") + rbrack
This implicitly suppresses delimiters, so that all you will get back are
["0","0.1","1"] in the first case and ["1","2"] in the second.
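
For comparison, here is a regex-based stand-in for the delimited-list approach, using only Python's re module rather than pyparsing's delimitedList: one or more numbers separated by ':', with the brackets and delimiters dropped from the result. parse_bracketed is a hypothetical name.

```python
import re

# Hypothetical stand-in for lbrack + delimitedList(...) + rbrack: a bracketed
# run of one or more numbers separated by ':'.
BRACKETED = re.compile(r"\[([0-9.]+(?::[0-9.]+)*)\]")

def parse_bracketed(text):
    m = BRACKETED.fullmatch(text)
    if m is None:
        raise ValueError("not a bracketed list: %r" % text)
    # Splitting on ':' drops the delimiters, much as delimitedList suppresses them.
    return m.group(1).split(":")

print(parse_bracketed("[0:0.1:1]"))  # ['0', '0.1', '1']
print(parse_bracketed("[1:2]"))      # ['1', '2']
```

Like delimitedList, this accepts any number of items, so "[5]" or "[1:2:3:4]" also parse; if only two or three values are allowed, check the length of the result afterwards.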

Happy pyparsing!
-- Paul
Jul 18 '05 #3

> Dang Griffith proposed one alternative construct, here's another, perhaps
> more explicit:
> lbrack + ( ( ddot + step + ddot + end ) | (ddot + end) ) + rbrack

should be:
lbrack + start + ( ( ddot + step + ddot + end ) | (ddot + end) ) + rbrack

> Note that the order of the inner construct is important, so as to not match
> ddot+end before trying ddot+step+ddot+end; '|' is a greedy matching
> operator, creating a MatchFirst object from pyparsing's class library. You
> could avoid this confusion by using '^', which generates an Or object:
> lbrack + ( (ddot + end) ^ ( ddot + step + ddot + end ) ) + rbrack

should be:
lbrack + start + ( (ddot + end) ^ ( ddot + step + ddot + end ) ) + rbrack

> This will evaluate both subconstructs, and choose the longer of the two.
>
> Or you can use another pyparsing helper, the delimited list
> lbrack + delimitedlist( Word(nums+"."), delim=":") + rbrack

At least this one is correct! No, wait, I mis-cased delimitedList!
should be:
lbrack + delimitedList( Word(nums+"."), delim=":") + rbrack

> This implicitly suppresses delimiters, so that all you will get back are
> ["0","0.1","1"] in the first case and ["1","2"] in the second.
>
> Happy pyparsing!
> -- Paul

Sorry for the sloppiness,
-- Paul
Jul 18 '05 #4

Paul,

thanks for the explanation.

Boštjan

Jul 18 '05 #5
