  #1  
January 16th, 2006, 09:15 PM
rh0dium
Parsing files -- pyparsing to the rescue?

Hi all,

I have a file which I need to parse and I need to be able to break it
down by sections. I know it's possible but I can't seem to figure this
out.

The sections are delimited by <> with one or more keywords inside the <>.
What I want to do is to be able to parse a particular section of the
file. So for example I need to be able to look at the SYSLIB section.
Presumably the sections are:


<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>

So if I wanted to break them down, the section headers are matched by this:

secH = (pyparsing.LineStart() + pyparsing.Suppress(pyparsing.Literal("<")) +
        pyparsing.OneOrMore(pyparsing.Word(pyparsing.alphanums)) +
        pyparsing.Suppress(pyparsing.Literal(">")))

But how do I say that <SECTIONn> stops at the start of the next
<SECTIONm>?

  #2  
January 16th, 2006, 09:15 PM
Giovanni Bajo
Re: Parsing files -- pyparsing to the rescue?

rh0dium wrote:
> I have a file which I need to parse and I need to be able to break it
> down by sections. I know it's possible but I can't seem to figure this
> out.
>
> The sections are broken by <> with one or more keywords in the <>.
> What I want to do is to be able to parse a particular section of the
> file. So for example I need to be able to look at the SYSLIB section.
> Presumably the sections are
>
>
> <SYSLIB>
> Sys Data
> Sys-Data
> asdkData
> Data
> <LOGLVS>
> Data
> Data
> Data
> Data
> <SOME SECTION>
> Data
> Data
> Data
> Data
> <NETLIST>
> Data
> Data
> Data
> Data
> <NET>

Given your description, pyparsing doesn't feel like the correct tool:

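import re

secs = {}
for L in open("foo.txt"):
    L = L.rstrip("\n")
    if re.match(r"<.*>", L):
        # a header line such as <SYSLIB> starts a new section
        name = L[1:-1]
        secs[name] = []
    else:
        # any other line belongs to the most recent section
        secs[name].append(L)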

--
Giovanni Bajo
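
A minimal sketch of the same line-by-line idea, packaged as a function so it can be tried without a file on disk. The name split_sections and the decision to skip any lines that appear before the first header are assumptions made for illustration; they are not part of the reply above.

```python
import re

# Matches a header line such as <SYSLIB> or <SOME SECTION>.
HEADER = re.compile(r"^<([^<>]+)>\s*$")


def split_sections(lines):
    """Group lines under the most recent <HEADER> line (hypothetical helper)."""
    secs = {}
    name = None
    for line in lines:
        line = line.rstrip("\n")
        m = HEADER.match(line)
        if m:
            # a header starts a new, initially empty section
            name = m.group(1)
            secs[name] = []
        elif name is not None:
            secs[name].append(line)
        # lines before the first header are skipped (an assumption)
    return secs


sample = """<SYSLIB>
Sys Data
Sys-Data
<SOME SECTION>
Data
<NET>
"""

secs = split_sections(sample.splitlines())
print(secs["SYSLIB"])        # ['Sys Data', 'Sys-Data']
print(secs["SOME SECTION"])  # ['Data']
print(secs["NET"])           # []
```

Because the function accepts any iterable of lines, an open file object can be passed directly in place of sample.splitlines().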


  #3  
January 17th, 2006, 12:15 AM
Paul McGuire
Re: Parsing files -- pyparsing to the rescue?

"rh0dium" <sklass@pointcircle.com> wrote in message
news:1137445296.573399.60970@g47g2000cwa.googlegroups.com...
> Hi all,
>
> I have a file which I need to parse and I need to be able to break it
> down by sections. I know it's possible but I can't seem to figure this
> out.
>
> The sections are broken by <> with one or more keywords in the <>.
> But how do I say that <SECTIONn> stops at the start of the next
> <SECTIONm>?
>

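See the working example below - the comments and the definition of dataLine
show how this is done.

This is something of a trick in pyparsing, but it follows from a basic
characteristic of pyparsing's recursive descent parser: an expression
consumes whatever it can match and does no implicit lookahead, so dataLine
has to rule out the next section label (and the end of the string)
explicitly, using the ~ negative lookahead operator.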

-- Paul

data="""<SYSLIB>
Sys Data
Sys-Data
asdkData
Data
<LOGLVS>
Data
Data
Data
Data
<SOME SECTION>
Data
Data
Data
Data
<NETLIST>
Data
Data
Data
Data
<NET>
"""

from pyparsing import *

# basic pyparsing version
secLabel = (Suppress("<") + OneOrMore(Word(alphas)) + Suppress(">") +
            LineEnd().suppress())
# need to indicate which entries are *not* valid datalines - next secLabel,
# or end of string
dataLine = ~secLabel + ~StringEnd() + restOfLine + LineEnd().suppress()

# a data section is a section label, followed by zero or more data lines
section = Group(secLabel + ZeroOrMore(dataLine))

# a config data contains one or more sections
configData = OneOrMore(section)

# parse the input data and print the results
res = configData.parseString(data)
print(res)

# prints:
# [['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS',
# 'Data', 'Data', 'Data', 'Data'], ['SOME', 'SECTION', 'Data', 'Data', 'Data',
# 'Data'], ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]


# enhanced version, constructing a ParseResults with dict-like access
# (reuses previous expression definitions)

# combine multiword keys into a single string
# - want <SOME SECTION> to return 'SOME SECTION', not
# 'SOME', 'SECTION'
def joinKeyWords(s,l,t):
    return " ".join(t)
secLabel.setParseAction(joinKeyWords)
section = Group(secLabel + ZeroOrMore(dataLine))
configData = Dict(OneOrMore(section))

# parse the input data, and access the results by section name
res = configData.parseString(data)
print(res)
print(res["SYSLIB"])
print(res["SOME SECTION"])
print(list(res.keys()))


# prints (key order may vary with the Python version):
# [['SYSLIB', 'Sys Data', 'Sys-Data', 'asdkData', 'Data'], ['LOGLVS', 'Data',
# 'Data', 'Data', 'Data'], ['SOME SECTION', 'Data', 'Data', 'Data', 'Data'],
# ['NETLIST', 'Data', 'Data', 'Data', 'Data'], ['NET']]
# ['Sys Data', 'Sys-Data', 'asdkData', 'Data']
# ['Data', 'Data', 'Data', 'Data']
# ['LOGLVS', 'NET', 'NETLIST', 'SYSLIB', 'SOME SECTION']
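
The lookahead trick is not specific to pyparsing. As a rough illustration, here is a hand-written sketch in plain Python that makes the same decision explicitly: before consuming a data line, peek at it and stop if it is a section header or if the input is exhausted. The names parse_config, parse_section, and is_header are hypothetical, and the code only mirrors the structure of the grammar above; it does not show how pyparsing itself is implemented.

```python
import re

# A section label: one or more alphabetic words inside <>.
HEADER = re.compile(r"^<([A-Za-z ]+)>$")


def is_header(line):
    return HEADER.match(line) is not None


def parse_section(lines, pos):
    """Parse one section starting at lines[pos]; return (section, new_pos)."""
    m = HEADER.match(lines[pos])
    if m is None:
        raise ValueError("expected a section header at line %d" % (pos + 1))
    # Join multiword labels into one string, as joinKeyWords does.
    section = [" ".join(m.group(1).split())]
    pos += 1
    # Counterpart of ~secLabel + ~StringEnd(): look before consuming.
    while pos < len(lines) and not is_header(lines[pos]):
        section.append(lines[pos])
        pos += 1
    return section, pos


def parse_config(text):
    """Parse zero or more sections from text into a list of lists."""
    lines = text.splitlines()
    pos = 0
    sections = []
    while pos < len(lines):
        section, pos = parse_section(lines, pos)
        sections.append(section)
    return sections


text = "<SYSLIB>\nSys Data\nSys-Data\n<SOME SECTION>\nData\n<NET>\n"
result = parse_config(text)
print(result)
# [['SYSLIB', 'Sys Data', 'Sys-Data'], ['SOME SECTION', 'Data'], ['NET']]
```

The while condition is the whole point: without the is_header check, the loop would swallow every following header as ordinary data, which is the behaviour the original question ran into.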



  #4  
January 17th, 2006, 12:15 AM
Allan Zhang
Re: Parsing files -- pyparsing to the rescue?

Try this

code
=====
import re
p = re.compile(r'<SYSLIB>([^<]*)<')
s = open("file").read()
m = p.search(s)
if m:
    # group 1 is everything between <SYSLIB> and the next "<"
    res = m.group(1).strip("\n")
    print(res)


result:
=======
%python parser.py
Sys Data
Sys-Data
asdkData
Data
%

Thanks
Allan
"rh0dium" <sklass@pointcircle.com> wrote in message
news:1137445296.573399.60970@g47g2000cwa.googlegroups.com...
> Hi all,
>
> I have a file which I need to parse and I need to be able to break it
> down by sections. I know it's possible but I can't seem to figure this
> out.
>
> The sections are broken by <> with one or more keywords in the <>.
> What I want to do is to be able to parse a particular section of the
> file. So for example I need to be able to look at the SYSLIB section.
> Presumably the sections are
>
>
> <SYSLIB>
> Sys Data
> Sys-Data
> asdkData
> Data
> <LOGLVS>
> Data
> Data
> Data
> Data
> <SOME SECTION>
> Data
> Data
> Data
> Data
> <NETLIST>
> Data
> Data
> Data
> Data
> <NET>
>
> So if I wanted to break them down..
>
> Sections are broken down by this..
>
> secH=pyparsing.LineStart() + pyparsing.Suppress(
> pyparsing.Literal("<")) +
> pyparsing.OneOrMore(pyparsing.Word(pyparsing.alphanums)) +
> pyparsing.Suppress( pyparsing.Literal(">"))
>
> But how do I say that <SECTIONn> stops at the start of the next
> <SECTIONm>?
>
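
The regular-expression reply in this post hard-codes SYSLIB and requires a following "<", so it would miss the last section of a file. One possible generalization is sketched below. The helper name get_section is hypothetical, and the sketch assumes that headers always start at the beginning of a line and that no data line begins with "<".

```python
import re


def get_section(text, name):
    """Return the lines of section <name>, or None if it is absent."""
    pattern = re.compile(
        r"^<" + re.escape(name) + r">[ \t]*\n"  # the header line itself
        r"(.*?)"                                # body, as short as possible
        r"(?=^<|\Z)",                           # stop before next header or at end
        re.MULTILINE | re.DOTALL,
    )
    m = pattern.search(text)
    if m is None:
        return None
    return m.group(1).splitlines()


text = "<SYSLIB>\nSys Data\nSys-Data\n<LOGLVS>\nData\n<NET>\nlast\n"
print(get_section(text, "SYSLIB"))   # ['Sys Data', 'Sys-Data']
print(get_section(text, "NET"))      # ['last']
print(get_section(text, "MISSING"))  # None
```

The lookahead (?=^<|\Z) plays the same role as the negative lookahead in the pyparsing answer: it ends the body at the next header without consuming it, and also accepts the end of the input as a terminator.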


 
