"rh0dium" <sk****@pointcircle.com> wrote in message
news:11******** *************@g47g2000cwa.googlegroups.com...
> Hi all,
> I have a file which I need to parse, and I need to be able to break it
> down by sections. I know it's possible, but I can't seem to figure this
> out.
> The sections are delimited by <> with one or more keywords in the <>.
> But how do I say that <SECTIONn> stops at the start of the next
> <SECTIONm>?
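See the attached working example; the comments and the definition of dataLine
show how this is done.
This is something of a trick in pyparsing: a plain ZeroOrMore(restOfLine)
would happily consume the next <SECTION> label as just another data line, so
dataLine has to state explicitly, using the negative lookaheads ~secLabel and
~StringEnd(), which lines are *not* data. That follows from a basic
characteristic of the pyparsing recursive descent parser: a repetition matches
as much as it can and never backtracks to let the following expression match.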
-- Paul
data="""<SYSLIB >
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>
"""
from pyparsing import *
# basic pyparsing version
secLabel = (Suppress("<") + OneOrMore(Word(alphas)) + Suppress(">") +
            LineEnd().suppress())
# need to indicate which entries are *not* valid datalines - next secLabel,
# or end of string
dataLine = ~secLabel + ~StringEnd() + restOfLine + LineEnd().suppress()
# a data section is a section label, followed by zero or more data lines
section = Group(secLabel + ZeroOrMore(dataLine))
# a config data contains one or more sections
configData = OneOrMore(section)
# parse the input data and print the results
res = configData.parseString(data)
print(res)
# prints:
# [['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS',
# 'Data', 'Data', 'Data', 'Data'], ['SOME', 'SECTION', 'Data', 'Data', 'Data',
# 'Data'], ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]
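To see why dataLine needs the ~secLabel lookahead, here is a reduced sketch that parses a small two-section input with and without it. It assumes pyparsing is installed; the sample text and the names guardedLine, greedyLine and parse_with are illustrative, not from the post. The camelCase names (parseString, asList, restOfLine) match the post and are kept in pyparsing 3 as compatibility aliases for the PEP 8 names.

```python
# Assumes pyparsing is installed; the camelCase names match the post.
from pyparsing import (Group, LineEnd, OneOrMore, StringEnd, Suppress,
                       Word, ZeroOrMore, alphas, restOfLine)

# Hypothetical two-section sample input
sample = "<FIRST>\nalpha\nbeta\n<SECOND>\ngamma\n"

secLabel = (Suppress("<") + OneOrMore(Word(alphas)) + Suppress(">") +
            LineEnd().suppress())

# With the lookahead: a data line may not begin a new section
guardedLine = ~secLabel + ~StringEnd() + restOfLine + LineEnd().suppress()
# Without it: every remaining line, "<SECOND>" included, counts as data
greedyLine = ~StringEnd() + restOfLine + LineEnd().suppress()

def parse_with(lineExpr):
    """Parse `sample` into sections, using lineExpr for the data lines."""
    section = Group(secLabel + ZeroOrMore(lineExpr))
    return OneOrMore(section).parseString(sample).asList()

guarded = parse_with(guardedLine)
greedy = parse_with(greedyLine)
print(guarded)  # [['FIRST', 'alpha', 'beta'], ['SECOND', 'gamma']]
print(greedy)   # [['FIRST', 'alpha', 'beta', '<SECOND>', 'gamma']]
```

Without the lookahead the first section swallows the <SECOND> label as ordinary data, because ZeroOrMore never gives back a line it has already matched; the only way to end a section is to make the next label fail to match as a data line.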
# enhanced version, constructing a ParseResults with dict-like access
# (reuses previous expression definitions)
# combine multiword keys into a single string
# - want <SOME SECTION> to return 'SOME SECTION', not
# 'SOME', 'SECTION'
def joinKeyWords(s, l, t):
    return " ".join(t)
secLabel.setParseAction(joinKeyWords)
section = Group(secLabel + ZeroOrMore(dataLine))
configData = Dict(OneOrMore(section))
# parse the input data, and access the results by section name
res = configData.parseString(data)
print(res)
print(res["SYSLIB"])
print(res["SOME SECTION"])
print(list(res.keys()))
# prints:
#[['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS', 'Data',
# 'Data', 'Data', 'Data'], ['SOME SECTION', 'Data', 'Data', 'Data', 'Data'],
# ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]
#['Sys Data', 'Sys-Data', 'asdkData', 'Data']
#['Data', 'Data', 'Data', 'Data']
#['LOGLVS', 'NET', 'NETLIST', 'SYSLIB', 'SOME SECTION']
# (key order follows the underlying dict, so it may differ between Python versions)
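The keys() listing above does not come back in the order the sections appear in the file. One way to recover file order while still keeping dict-style access is to iterate over the grouped results directly, since each group is [label, data line, ...]. This is a minimal sketch, assuming pyparsing is installed; the short sample input and the names join_key_words and ordered are hypothetical.

```python
# Assumes pyparsing is installed; names and sample input are hypothetical.
from pyparsing import (Dict, Group, LineEnd, OneOrMore, StringEnd, Suppress,
                       Word, ZeroOrMore, alphas, restOfLine)

sample = "<SOME SECTION>\nx\ny\n<NET>\n"

def join_key_words(s, loc, toks):
    # Turn the label words ['SOME', 'SECTION'] into 'SOME SECTION'
    return " ".join(toks)

secLabel = (Suppress("<") + OneOrMore(Word(alphas)) + Suppress(">") +
            LineEnd().suppress())
secLabel.setParseAction(join_key_words)
dataLine = ~secLabel + ~StringEnd() + restOfLine + LineEnd().suppress()
configData = Dict(OneOrMore(Group(secLabel + ZeroOrMore(dataLine))))

res = configData.parseString(sample)

# Iterating the results walks the groups in file order
ordered = [(group[0], list(group[1:])) for group in res]
print(ordered)  # [('SOME SECTION', ['x', 'y']), ('NET', [])]
# Dict-style access by section name still works
print(res["SOME SECTION"].asList())  # ['x', 'y']
```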