
Parsing files -- pyparsing to the rescue?

Hi all,

I have a file which I need to parse and I need to be able to break it
down by sections. I know it's possible but I can't seem to figure this
out.

The sections are broken by <> with one or more keywords in the <>.
What I want to do is to be able to parse a particular section of the
file. So for example I need to be able to look at the SYSLIB section.
Presumably the sections are
<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>

So if I wanted to break them down..

Sections are broken down by this..

secH = (pyparsing.LineStart()
        + pyparsing.Suppress(pyparsing.Literal("<"))
        + pyparsing.OneOrMore(pyparsing.Word(pyparsing.alphanums))
        + pyparsing.Suppress(pyparsing.Literal(">")))

But how do I say that <SECTIONn> stops at the start of the next
<SECTIONm>?

Jan 16 '06 #1
3 Replies


rh0dium wrote:
I have a file which I need to parse and I need to be able to break it
down by sections. I know it's possible but I can't seem to figure this
out.

The sections are broken by <> with one or more keywords in the <>.
What I want to do is to be able to parse a particular section of the
file. So for example I need to be able to look at the SYSLIB section.
Presumably the sections are
<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>


Given your description, pyparsing doesn't feel like the correct tool:

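import re

secs = {}
name = None
for L in open("foo.txt"):
    L = L.rstrip("\n")
    if re.match(r"<.*>", L):
        # header line: strip the angle brackets to get the section name
        name = L[1:-1]
        secs[name] = []
    elif name is not None:
        # data line: file it under the most recent header
        secs[name].append(L)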
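
For anyone who wants to try this without a file on disk, here is a minimal sketch of the same line-by-line idea as a function, run against the sample data from the question. The name split_sections and the anchored header pattern are my own assumptions, not part of the original post; lines before the first header are simply skipped.

```python
import re

# Hypothetical helper: group lines under the most recent <HEADER> line.
HEADER = re.compile(r"<(.+)>$")

def split_sections(lines):
    sections = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        m = HEADER.match(line)
        if m:
            # start a new section named after the text inside <>
            current = m.group(1)
            sections[current] = []
        elif current is not None:
            # data line: belongs to the section opened last
            sections[current].append(line)
    return sections

sample = """<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
<SOME SECTION>
Data
<NET>
"""

sections = split_sections(sample.splitlines())
print(sections["SYSLIB"])
```

A section simply ends when the next header line shows up, so no explicit "stop" rule is needed.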

--
Giovanni Bajo
Jan 16 '06 #2

"rh0dium" <sk****@pointcircle.com> wrote in message
news:11*********************@g47g2000cwa.googlegroups.com...
Hi all,

I have a file which I need to parse and I need to be able to break it
down by sections. I know it's possible but I can't seem to figure this
out.

The sections are broken by <> with one or more keywords in the <>.
But how do I say that <SECTIONn> stops at the start of the next
<SECTIONm>?


See the attached working example - the comments and definition of dataLine
show how this is done.

This is something of a trick in pyparsing, but it is a basic characteristic
of the pyparsing recursive descent parser.
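
The trick is negative lookahead: a data line is only accepted if it is not the start of the next section label. As a rough analogy only (this uses the standard re module, not pyparsing, and iter_sections is a made-up name), the same idea can be sketched like this:

```python
import re

# Analogy sketch, not pyparsing: a section is a label line followed by
# zero or more lines that do NOT start with "<" (negative lookahead).
SECTION = re.compile(
    r"^<(?P<name>[^>\n]+)>\n"        # section label alone on its line
    r"(?P<body>(?:(?!<).*\n)*)",     # lines rejected if they begin a label
    re.MULTILINE,
)

def iter_sections(text):
    """Yield (name, list_of_data_lines) for each section in text."""
    for m in SECTION.finditer(text):
        yield m.group("name"), m.group("body").splitlines()

sample = """<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<SOME SECTION>
Data
Data
Data
Data
<NET>
"""

parsed = dict(iter_sections(sample))
print(parsed["SYSLIB"])
```

This sketch assumes every line ends with a newline; a final data line without one would be dropped.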

-- Paul

data="""<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>
"""

from pyparsing import *

# basic pyparsing version
secLabel = (Suppress("<") + OneOrMore(Word(alphas)) + Suppress(">") +
            LineEnd().suppress())
# need to indicate which entries are *not* valid datalines - next secLabel,
# or end of string
dataLine = ~secLabel + ~StringEnd() + restOfLine + LineEnd().suppress()

# a data section is a section label, followed by zero or more data lines
section = Group(secLabel + ZeroOrMore(dataLine))

# a config data contains one or more sections
configData = OneOrMore(section)

# parse the input data and print the results
res = configData.parseString(data)
print(res)

# prints:
# [['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS',
# 'Data', 'Data', 'Data', 'Data'], ['SOME', 'SECTION', 'Data', 'Data', 'Data',
# 'Data'], ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]

# enhanced version, constructing a ParseResults with dict-like access
# (reuses previous expression definitions)

# combine multiword keys into a single string
# - want <SOME SECTION> to return 'SOME SECTION', not
# 'SOME', 'SECTION'
def joinKeyWords(s,l,t):
    return " ".join(t)
secLabel.setParseAction(joinKeyWords)
section = Group(secLabel + ZeroOrMore(dataLine))
configData = Dict(OneOrMore(section))

# parse the input data, and access the results by section name
res = configData.parseString(data)
print(res)
print(res["SYSLIB"])
print(res["SOME SECTION"])
print(list(res.keys()))
# prints (key order in the last line may vary by Python version):
# [['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS', 'Data',
# 'Data', 'Data', 'Data'], ['SOME SECTION', 'Data', 'Data', 'Data', 'Data'],
# ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]
# ['Sys Data', 'Sys-Data', 'asdkData', 'Data']
# ['Data', 'Data', 'Data', 'Data']
# ['LOGLVS', 'NET', 'NETLIST', 'SYSLIB', 'SOME SECTION']

Jan 17 '06 #3

Try this

code
=====
import re

# capture everything between <SYSLIB> and the next "<"
p = re.compile(r'<SYSLIB>([^<]*)<')
s = open("file").read()
m = p.search(s)
if m:
    # trim the newlines that surround the section body
    res = m.group(1).strip("\n")
    print(res)

result:
=======
%python parser.py
Sys Data
Sys-Data
asdkData
Data
%
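
Note that the pattern above needs a "<" after the section, so the last section in a file would not match, and the section name is hard-coded. A hedged sketch of a more general lookup follows; get_section is a hypothetical helper, and it assumes each label sits alone on its own line.

```python
import re

def get_section(text, name):
    """Return the data lines of section `name`, or None if it is absent."""
    pattern = re.compile(
        # label line, then lazily everything up to the next line that
        # starts with "<", or up to the end of the text
        r"^<" + re.escape(name) + r">\n(.*?)(?=^<|\Z)",
        re.MULTILINE | re.DOTALL,
    )
    m = pattern.search(text)
    if m is None:
        return None
    body = m.group(1).strip("\n")
    return body.split("\n") if body else []

sample = """<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
<NET>
"""

print(get_section(sample, "SYSLIB"))
```

An empty section such as the trailing one in the sample comes back as an empty list, while a name that is not in the text gives None.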

Thanks
Allan

Jan 17 '06 #4
