trouble pyparsing

P: n/a
Hey, I'm trying my hand at pyparsing a log file (named l.log):
FIRSTLINE

PROPERTY1 DATA1
PROPERTY2 DATA2

PROPERTYS LIST
ID1 data1
ID2 data2

ID1 data11
ID2 data12

SECTION

So I wrote up a small bit of code (named p.py):
from pyparsing import *
import sys

toplevel = Forward()

firstLine = Word('FIRSTLINE')
property = (Word('PROPERTY1') + Word(alphanums)) ^ (Word('PROPERTY2')
+ Word(alphanums))

id = (Word('ID1') + Word(alphanums)) ^ (Word('ID2') +
Word(alphanums))
plist = Word('PROPERTYS LIST') + ZeroOrMore( id )

toplevel << firstLine
toplevel << OneOrMore( property )
toplevel << plist

par = toplevel

print toplevel.parseFile(sys.argv[1])

The problem is that I get the following error:
Traceback (most recent call last):
  File "./p.py", line 23, in ?
    print toplevel.parseFile(sys.argv[1])
  File "/home/erich/tap/lib/python/pyparsing.py", line 833, in parseFile
    return self.parseString(file_contents)
  File "/home/erich/tap/lib/python/pyparsing.py", line 622, in parseString
    loc, tokens = self.parse( instring.expandtabs(), 0 )
  File "/home/erich/tap/lib/python/pyparsing.py", line 564, in parse
    loc,tokens = self.parseImpl( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 1743, in parseImpl
    return self.expr.parse( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 564, in parse
    loc,tokens = self.parseImpl( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 1511, in parseImpl
    loc, resultlist = self.exprs[0].parse( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 568, in parse
    loc,tokens = self.parseImpl( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 1068, in parseImpl
    raise exc
pyparsing.ParseException: Expected W:(PROP...) (at char 0), (line:1, col:1)

I fiddled around with this for quite a while, and it looks like, because
"PROPERTYS LIST" follows one of ['PROPERTY1', 'PROPERTY2'], pyparsing
grabs the overlapping text 'PROPERTY', so that it only has 'S', 'LIST'
left when it goes looking for the next thing to parse.

Is this a fundamental error, or is it just me? (I haven't yet tried
simpleparse)

Jan 5 '06 #1
4 Replies


P: n/a
"the.theorist" <th**********@gmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...
The problem is that I get the following error: <snip> Is this a fundamental error, or is it just me? (I haven't yet tried
simpleparse)


It's you.

Well, let's focus on the behavior and not the individual. There are two
major misconceptions that you have here:
1. Confusing "Word" for "Literal"
2. Confusing "<<" Forward assignment for some sort of C++ streaming
operator.

What puzzles me is that in some places, you correctly use the Word class, as
in Word(alphanums), to indicate a "word" as a contiguous set of characters
found in the string alphanums. You also correctly use '+' to build up id
and plist expressions, but then you use "<<" successively in what looks like
streaming into the toplevel variable.
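To see why the successive "<<" lines do not accumulate, here is a minimal sketch (the names top, "A" and "B" are made up for illustration). As far as I can tell, each "<<" simply replaces whatever expression the Forward held before, so in the original program only plist was left in toplevel, which would explain the "Expected W:(PROP...)" at char 0:

```python
from pyparsing import Forward, Literal, ParseException

top = Forward()
top << Literal("A")   # top now stands for "A"
top << Literal("B")   # this replaces "A"; it does not append to it

# Only the last assignment is in effect.
matched_b = top.parseString("B").asList()

try:
    top.parseString("A")
    matched_a = True
except ParseException:
    matched_a = False

print(matched_b, matched_a)
```

Building the sequence with '+' is what actually chains expressions together.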

When your grammar includes Word("FIRSTLINE"), you are actually saying you
want to match a "word" composed of one or more letters found in the string
"FIRSTLINE" - this would match not only FIRSTLINE, but also FIRST, LINE,
LIRST, FINE, LIST, FIST, FLINTSTRINE, well, you get the idea. Just the way
Word(alphanums) matches DATA1, DATA2, data1, data2, data11, and data12.

What you really want here is the class Literal, as in Literal("FIRSTLINE").
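A small sketch of the difference, assuming a helper named matches() that exists only for this illustration: Word treats its argument as a set of allowed characters, while Literal insists on the exact string.

```python
from pyparsing import Literal, ParseException, Word

word_expr = Word("FIRSTLINE")        # any run of the characters F, I, R, S, T, L, N, E
literal_expr = Literal("FIRSTLINE")  # exactly the string "FIRSTLINE"

def matches(expr, text):
    """Return True if expr matches at the start of text (hypothetical helper)."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False

for sample in ("FIRSTLINE", "LIST", "FLINTSTRINE"):
    print(sample, matches(word_expr, sample), matches(literal_expr, sample))
```

The Word version accepts all three samples; the Literal version accepts only FIRSTLINE.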

As for toplevel, there is no reason here to use Forward() - reserve use of
this class for recursive structures, such as lists composed of lists, etc.
toplevel is simply the sequence of a firstline, OneOrMore properties, and a
plist, which is just the plain old:

toplevel = firstLine + OneOrMore(property) + plist
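For contrast, here is a sketch of the kind of recursive structure where Forward() does earn its keep: a parenthesized list that may contain further lists. The grammar and the names item and nested are invented for this example and are not part of the log-file parser.

```python
from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, alphanums

item = Word(alphanums)

# Declare the name first, because its definition has to refer to itself.
nested = Forward()

# A list is "(" followed by any mix of items and nested lists, then ")".
nested << Group(Suppress("(") + ZeroOrMore(item | nested) + Suppress(")"))

result = nested.parseString("(a (b c) d)").asList()
print(result)  # [['a', ['b', 'c'], 'd']]
```

Here a single "<<" supplies the definition, and the recursion comes from nested appearing inside its own right-hand side.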

Lastly, if you'll peruse the documentation that comes with pyparsing, you'll
also find the Group class. This class is very helpful in imparting some
structure to the returned set of tokens.

Here is a before/after version of your program, that has some more
successful results.

-- Paul
data = """FIRSTLINE

PROPERTY1 DATA1
PROPERTY2 DATA2

PROPERTYS LIST
ID1 data1
ID2 data2

ID1 data11
ID2 data12

SECTION
"""

from pyparsing import *
import sys

#~ toplevel = Forward()

#~ firstLine = Word('FIRSTLINE')
firstLine = Literal('FIRSTLINE')

#~ property = (Word('PROPERTY1') + Word(alphanums)) ^ (Word('PROPERTY2') + Word(alphanums))
property = (Literal('PROPERTY1') + Word(alphanums)) ^ (Literal('PROPERTY2')
+ Word(alphanums))

#~ id = (Word('ID1') + Word(alphanums)) ^ (Word('ID2') + Word(alphanums))
id = (Literal('ID1') + Word(alphanums)) ^ (Literal('ID2') +
Word(alphanums))

#~ plist = Word('PROPERTYS LIST') + ZeroOrMore( id )
plist = Literal('PROPERTYS LIST') + ZeroOrMore( id )

#~ toplevel << firstLine
#~ toplevel << OneOrMore( property )
#~ toplevel << plist
toplevel = firstLine + OneOrMore( property ) + plist

par = toplevel

print(par.parseString(data))

# add Groups, to give structure to results, rather than just returning a
# flat list of strings
plist = Literal('PROPERTYS LIST') + ZeroOrMore( Group(id) )
toplevel = firstLine + Group(OneOrMore(Group(property))) + Group(plist)

par = toplevel

print(par.parseString(data))
Jan 5 '06 #2

P: n/a
Looks like the fundamental error was in my understanding.
Boy, do I feel sheepish. Yes, what I wanted were Literals.
That clarifies things greatly. Thank you.

Also, I went browsing around further, and found on O'Reilly's CodeZoo:
Most of the methods in the pyparsing module are very easy to figure
out. Forward() might not be as obvious. To get going with it, think of
the Forward() method as declaring a recursive match. Start out by
defining the match variable with the Forward() method (with no
arguments). Then use the '<<' operator to define the actual match --
and use the match variable itself in the definition to show where
recursion would occur. See the code sample below for an example.
# Parser definition for simple
# SGML snippet
container_tag = Forward()
open_tag, close_tag = tag()
content_tag = tag(closed=False)
content = Group(open_tag + CharsNotIn("<"))
container_tag << Group(open_tag + OneOrMore(container_tag | content) +
close_tag)
# ^name
^recursive reference
body = Group(container_tag).setResultsName("body")

So, I think that clears things up on both issues.

Again, Thank you for your assistance.

Jan 5 '06 #3

P: n/a
Sorry, got a bit of formatting wrong. Messed up what I was really
trying to point out.

container_tag << Group(open_tag + OneOrMore(container_tag | content) + close_tag)
# ^name                                     ^recursive reference

Jan 5 '06 #4

