trouble pyparsing

Hey, I'm trying my hand at pyparsing a log file (named l.log):
FIRSTLINE

PROPERTY1 DATA1
PROPERTY2 DATA2

PROPERTYS LIST
ID1 data1
ID2 data2

ID1 data11
ID2 data12

SECTION

So I wrote up a small bit of code (named p.py):
from pyparsing import *
import sys

toplevel = Forward()

firstLine = Word('FIRSTLINE')
property = (Word('PROPERTY1') + Word(alphanums)) ^ (Word('PROPERTY2')
+ Word(alphanums))

id = (Word('ID1') + Word(alphanums)) ^ (Word('ID2') +
Word(alphanums))
plist = Word('PROPERTYS LIST') + ZeroOrMore( id )

toplevel << firstLine
toplevel << OneOrMore( property )
toplevel << plist

par = toplevel

print toplevel.parseFile(sys.argv[1])

The problem is that I get the following error:
Traceback (most recent call last):
  File "./p.py", line 23, in ?
    print toplevel.parseFile(sys.argv[1])
  File "/home/erich/tap/lib/python/pyparsing.py", line 833, in parseFile
    return self.parseString(file_contents)
  File "/home/erich/tap/lib/python/pyparsing.py", line 622, in parseString
    loc, tokens = self.parse( instring.expandtabs(), 0 )
  File "/home/erich/tap/lib/python/pyparsing.py", line 564, in parse
    loc,tokens = self.parseImpl( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 1743, in parseImpl
    return self.expr.parse( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 564, in parse
    loc,tokens = self.parseImpl( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 1511, in parseImpl
    loc, resultlist = self.exprs[0].parse( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 568, in parse
    loc,tokens = self.parseImpl( instring, loc, doActions )
  File "/home/erich/tap/lib/python/pyparsing.py", line 1068, in parseImpl
    raise exc
pyparsing.ParseException: Expected W:(PROP...) (at char 0), (line:1, col:1)

I fiddled around with this for quite a while, and it looks like, because
"PROPERTYS LIST" follows one of ['PROPERTY1', 'PROPERTY2'], pyparsing
grabs the overlapping text 'PROPERTY', so that only 'S', 'LIST' is left
when it goes looking for the next thing to parse.

Is this a fundamental error, or is it just me? (I haven't yet tried
simpleparse)

Jan 5 '06 #1
"the.theorist" <th**********@gmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...
The problem is that I get the following error: <snip> Is this a fundamental error, or is it just me? (I haven't yet tried
simpleparse)


It's you.

Well, let's focus on the behavior and not the individual. There are two
major misconceptions that you have here:
1. Confusing "Word" for "Literal"
2. Confusing "<<" Forward assignment for some sort of C++ streaming
operator.

What puzzles me is that in some places, you correctly use the Word class, as
in Word(alphanums), to indicate a "word" as a contiguous set of characters
found in the string alphanums. You also correctly use '+' to build up id
and plist expressions, but then you use "<<" successively in what looks like
streaming into the toplevel variable.

When your grammar includes Word("FIRSTLINE"), you are actually saying you
want to match a "word" composed of one or more letters found in the string
"FIRSTLINE" - this would match not only FIRSTLINE, but also FIRST, LINE,
LIRST, FINE, LIST, FIST, FLINTSTRINE, well, you get the idea. Just the way
Word(alphanums) matches DATA1, DATA2, data1, data2, data11, and data12.

What you really want here is the class Literal, as in Literal("FIRSTLINE").
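
To make the Word/Literal difference concrete, here is a minimal sketch. It is not part of the original reply, and the helper name matches is made up for illustration. It only checks which strings each expression accepts in full.

```python
from pyparsing import Literal, ParseException, Word

# Word(chars) matches a run of one or more characters drawn from chars.
word_expr = Word("FIRSTLINE")
# Literal(text) matches exactly that text and nothing else.
literal_expr = Literal("FIRSTLINE")


def matches(expr, text):
    """Return True if expr matches the whole of text (hypothetical helper)."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


for sample in ["FIRSTLINE", "FIRST", "LIST", "FLINTSTRINE"]:
    # Print the sample, then whether Word and Literal accept it.
    print(sample, matches(word_expr, sample), matches(literal_expr, sample))
```

Word accepts every sample, since each one uses only letters that occur in "FIRSTLINE". Literal accepts only the exact string.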

As for toplevel, there is no reason here to use Forward() - reserve use of
this class for recursive structures, such as lists composed of lists, etc.
toplevel is simply the sequence of a firstline, OneOrMore properties, and a
plist, which is just the plain old:

toplevel = firstLine + OneOrMore(property) + plist
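
As a rough illustration of why successive "<<" statements do not build a sequence, the sketch below assigns to a Forward twice. It is not from the original reply; the names expr and parses are illustrative only. Each "<<" replaces the expression the Forward holds, so only the last one is in effect. That would be consistent with the reported error, where parsing failed at char 0 while expecting only the plist expression.

```python
from pyparsing import Forward, Literal, ParseException

expr = Forward()
expr << Literal("A")
expr << Literal("B")  # replaces the earlier definition; it does not append


def parses(text):
    """Return True if expr matches at the start of text (hypothetical helper)."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


print(parses("B"))   # the last assignment is the one in effect
print(parses("A"))   # the first assignment was overwritten
print(parses("AB"))  # no sequence "A then B" was ever built
```

To get a sequence, combine the pieces with '+', as in the one-line toplevel definition above.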

Lastly, if you'll peruse the documentation that comes with pyparsing, you'll
also find the Group class. This class is very helpful in imparting some
structure to the returned set of tokens.
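
A small sketch of what Group changes, assuming a cut-down grammar with just the ID lines (the names pair, flat and grouped are illustrative and not from the original program):

```python
from pyparsing import Group, Literal, OneOrMore, Word, alphanums

# One "ID value" pair, e.g. "ID1 data1".
pair = (Literal("ID1") | Literal("ID2")) + Word(alphanums)

text = "ID1 data1 ID2 data2"

# Without Group, all tokens come back in one flat list.
flat = OneOrMore(pair).parseString(text)
# With Group, each pair becomes its own sublist.
grouped = OneOrMore(Group(pair)).parseString(text)

print(flat.asList())     # ['ID1', 'data1', 'ID2', 'data2']
print(grouped.asList())  # [['ID1', 'data1'], ['ID2', 'data2']]
```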

Here is a before/after version of your program, that has some more
successful results.

-- Paul
data = """FIRSTLINE

PROPERTY1 DATA1
PROPERTY2 DATA2

PROPERTYS LIST
ID1 data1
ID2 data2

ID1 data11
ID2 data12

SECTION
"""

from pyparsing import *
import sys

#~ toplevel = Forward()

#~ firstLine = Word('FIRSTLINE')
firstLine = Literal('FIRSTLINE')

#~ property = (Word('PROPERTY1') + Word(alphanums)) ^ (Word('PROPERTY2') +
#~     Word(alphanums))
property = (Literal('PROPERTY1') + Word(alphanums)) ^ (Literal('PROPERTY2')
+ Word(alphanums))

#~ id = (Word('ID1') + Word(alphanums)) ^ (Word('ID2') +
#~     Word(alphanums))
id = (Literal('ID1') + Word(alphanums)) ^ (Literal('ID2') +
Word(alphanums))

#~ plist = Word('PROPERTYS LIST') + ZeroOrMore( id )
plist = Literal('PROPERTYS LIST') + ZeroOrMore( id )

#~ toplevel << firstLine
#~ toplevel << OneOrMore( property )
#~ toplevel << plist
toplevel = firstLine + OneOrMore( property ) + plist

par = toplevel

print(par.parseString(data))

# add Groups, to give structure to results, rather than just returning a
# flat list of strings
plist = Literal('PROPERTYS LIST') + ZeroOrMore( Group(id) )
toplevel = firstLine + Group(OneOrMore(Group(property))) + Group(plist)

par = toplevel

print(par.parseString(data))
Jan 5 '06 #2
Looks like the fundamental error was in my understanding.
Boy, do I feel sheepish. Yes, what I wanted were Literals.
That clarifies things greatly. Thank you.

Also, I went browsing around further, and found on O'Reilly's CodeZoo:
Most of the methods in the pyparsing module are very easy to figure
out. Forward() might not be as obvious. To get going with it, think of
the Forward() method as declaring a recursive match. Start out by
defining the match variable with the Forward() method (with no
arguments). Then use the '<<' operator to define the actual match --
and use the match variable itself in the definition to show where
recursion would occur. See the code sample below for an example.
# Parser definition for simple
# SGML snippet
container_tag = Forward()
open_tag, close_tag = tag()
content_tag = tag(closed=False)
content = Group(open_tag + CharsNotIn("<"))
container_tag << Group(open_tag + OneOrMore(container_tag | content) +
close_tag)
# ^name
^recursive reference
body = Group(container_tag).setResultsName("body")
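
The quoted snippet relies on a tag() helper that is not shown in this thread, so it cannot be run as posted. As a self-contained stand-in, here is a minimal sketch of the same Forward pattern applied to nested parenthesized lists. The grammar and the names nested and item are assumptions for illustration, not the CodeZoo example.

```python
from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, alphanums

# Parentheses delimit a list but are dropped from the results.
LPAR, RPAR = Suppress("("), Suppress(")")

# Declare the recursive expression first, with no definition yet.
nested = Forward()
# An item is either a plain word or another nested list (the recursion).
item = Word(alphanums) | nested
# Now define it with '<<', referring back to itself through item.
nested << Group(LPAR + ZeroOrMore(item) + RPAR)

result = nested.parseString("(a (b c) (d (e)))")
print(result.asList())  # [['a', ['b', 'c'], ['d', ['e']]]]
```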

So, I think that clears things up on both issues.

Again, Thank you for your assistance.

Jan 5 '06 #3
Sorry, got a bit of formatting wrong. Messed up what I was really
trying to point out.

container_tag << Group(open_tag + OneOrMore(container_tag | content)
# ^name                                     ^recursive reference

Jan 5 '06 #4
