pyparsing question

I am trying to parse data that looks like this:

IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
2BFA76F6 1208230607 T S SYSPROC SYSTEM
SHUTDOWN BY USER
A6D1BD62 1215230807 I
H Firmware Event

My problem is that sometimes there is a RESOURCE_NAME and sometimes
not, so I wind up with "Firmware" as my RESOURCE_NAME and "Event" as
my DESCRIPTION. The formatting seems to use a set number of spaces.

I have tried making RESOURCE_NAME an Optional(Word(alphanums)) and
DESCRIPTION OneOrMore(Word(alphas)) + LineEnd(). So the question is,
how can I avoid having the first word of Description sucked into
RESOURCE_NAME when that field should be blank?
The data I have has a fixed number of characters per field, so I could
split it up that way, but wouldn't that defeat the purpose of using a
parser? I am determined to become proficient with pyparsing so I am
using it even when it could be considered overkill; thus, it has gone
past mere utility now, this is a matter of principle!

thanks
Jan 1 '08 #1
On Jan 2, 10:32 am, hubritic <colinland...@gmail.com> wrote:
> The data I have has a fixed number of characters per field, so I could
> split it up that way, but wouldn't that defeat the purpose of using a
> parser?

The purpose of a parser is to parse. Data in fixed columns does not
need parsing.

> I am determined to become proficient with pyparsing so I am
> using it even when it could be considered overkill; thus, it has gone
> past mere utility now, this is a matter of principle!

An extremely misguided "principle". Would you use an AK47 on the
flies around your barbecue? A better principle is to choose the best
tool for the job.

Jan 2 '08 #2
On Jan 1, 4:18 pm, John Machin <sjmac...@lexicon.net> wrote:
> On Jan 2, 10:32 am, hubritic <colinland...@gmail.com> wrote:
> > The data I have has a fixed number of characters per field, so I could
> > split it up that way, but wouldn't that defeat the purpose of using a
> > parser?
>
> The purpose of a parser is to parse. Data in fixed columns does not
> need parsing.
>
> > I am determined to become proficient with pyparsing so I am
> > using it even when it could be considered overkill; thus, it has gone
> > past mere utility now, this is a matter of principle!
>
> An extremely misguided "principle". Would you use an AK47 on the
> flies around your barbecue? A better principle is to choose the best
> tool for the job.

Your principle is no doubt the saner one for the real world, but your
example of AK47 is a bit off.

We generally know enough about an AK47 to know that it is not
something to kill flies with. Consider, though, if someone unfamiliar
with the concept of guns and mayhem got an AK47 for xmas and was only
told that it was really good for killing things. He would try it out
and would discover that indeed it kills all sorts of things. So he
might try killing flies. Then he would discover the limitations; those
already familiar with guns would wonder why he would waste his time.
Jan 2 '08 #3
On Jan 1, 5:32 pm, hubritic <colinland...@gmail.com> wrote:
> I am trying to parse data that looks like this:
>
> IDENTIFIER    TIMESTAMP   T  C   RESOURCE_NAME   DESCRIPTION
> 2BFA76F6     1208230607   T   S   SYSPROC                    SYSTEM
> SHUTDOWN BY USER
> A6D1BD62   1215230807     I
> H                                              Firmware Event
> <snip>
> The data I have has a fixed number of characters per field, so I could
> split it up that way, but wouldn't that defeat the purpose of using a
> parser?

I think you have this backwards. I use pyparsing for a lot of text
processing, but if it is not a good fit, or if str.split is all that
is required, there is no real rationale for using anything more
complicated.

> I am determined to become proficient with pyparsing so I am
> using it even when it could be considered overkill; thus, it has gone
> past mere utility now, this is a matter of principle!

Well, I'm glad you are driven to learn pyparsing if it kills you, but
John Machin has a good point. This data is really so amenable to
something as simple as:

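for line in logfile:
    # the first four fields never contain whitespace; the remainder holds
    # the (optional) resource name followed by the description
    id,timestamp,t,c,resource_and_description = line.split(None,4)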

that it is difficult to recommend pyparsing for this case. The sample
you posted was space-delimited, but if it is tab-delimited, and there
is a pair of tabs between the "H" and "Firmware Event" on the second
line, then just use split("\t") for your data and be done.
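
A minimal sketch of the tab-delimited case described above, assuming exactly one tab between adjacent fields (so a missing RESOURCE_NAME shows up as two tabs in a row). The helper name split_tab_record, the FIELD_NAMES tuple, and the tab-separated sample lines are made up for illustration; the real file may be laid out differently.

```python
FIELD_NAMES = ("identifier", "time", "T", "C", "resource", "description")

def split_tab_record(line):
    """Return a dict of the six fields from one tab-delimited log line."""
    # split("\t") keeps empty strings, so two tabs in a row yield an
    # empty RESOURCE_NAME instead of shifting DESCRIPTION to the left.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, got %d"
                         % (len(FIELD_NAMES), len(fields)))
    return dict(zip(FIELD_NAMES, fields))

# Hypothetical tab-delimited versions of the two sample records.
sample = [
    "2BFA76F6\t1208230607\tT\tS\tSYSPROC\tSYSTEM SHUTDOWN BY USER\n",
    "A6D1BD62\t1215230807\tI\tH\t\tFirmware Event\n",
]
records = [split_tab_record(line) for line in sample]

for record in records:
    print(repr(record["resource"]), repr(record["description"]))
```

Note the difference from split(None, 4): splitting on whitespace in general collapses the empty column, which is exactly how "Firmware" ends up being mistaken for a resource name.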

Still, pyparsing may be helpful in disambiguating that RESOURCE_NAME
and DESCRIPTION text. One approach would be to enumerate (if
possible) the different values of RESOURCE_NAME. Something like this:

from pyparsing import Word, alphanums, nums, oneOf, Optional, restOfLine

ident = Word(alphanums)
timestamp = Word(nums,exact=10)

# I don't know what these are, I'm just getting the values
# from the sample text you posted
t_field = oneOf("T I")
c_field = oneOf("S H")

# I'm just guessing here, you'll need to provide the actual
# values from your log file
resource_name = oneOf("SYSPROC USERPROC IOSUBSYS whatever")

logline = ident("identifier") + timestamp("time") + \
    t_field("T") + c_field("C") + \
    Optional(resource_name, default="")("resource") + \
    Optional(restOfLine, default="")("description")

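
To see how that grammar behaves, here is a small self-contained run against the two sample records, assuming each record sits on a single line. The list of resource names is still only a guess, and the spacing in the sample strings is invented for the demo.

```python
from pyparsing import Word, alphanums, nums, oneOf, Optional, restOfLine

ident = Word(alphanums)
timestamp = Word(nums, exact=10)
t_field = oneOf("T I")
c_field = oneOf("S H")
# Guessed resource names; replace with the real values from the log.
resource_name = oneOf("SYSPROC USERPROC IOSUBSYS")

logline = (ident("identifier") + timestamp("time")
           + t_field("T") + c_field("C")
           + Optional(resource_name, default="")("resource")
           + Optional(restOfLine, default="")("description"))

# Sample records joined onto one line each (spacing is illustrative).
sample = [
    "2BFA76F6   1208230607   T   S   SYSPROC         SYSTEM SHUTDOWN BY USER",
    "A6D1BD62   1215230807   I   H                   Firmware Event",
]

parsed = []
for line in sample:
    result = logline.parseString(line)
    # restOfLine can include leading whitespace, so strip it before use.
    parsed.append((result["resource"], result["description"].strip()))

for resource, description in parsed:
    print(repr(resource), repr(description))
```

Because "Firmware" is not one of the enumerated names, the Optional falls back to its default and the whole text lands in the description. The obvious limitation is that the list of names has to be complete.
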
Another tack to take might be to use a parse action on the resource
name, to verify the column position of the found token by using the
pyparsing method col:

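from pyparsing import Word, alphas, col, ParseException

def matchOnlyAtCol(n):
    # returns a parse action that rejects a token not starting at column n
    def verifyCol(strg,locn,toks):
        if col(locn,strg) != n:
            raise ParseException(strg,locn,"matched token not at column %d" % n)
    return verifyCol

resource_name = Word(alphas).setParseAction(matchOnlyAtCol(35))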

This will only work if your data really is columnar - the example text
that you posted isn't. (Hmm, I like that matchOnlyAtCol method, I
think I'll add that to the next release of pyparsing...)
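
A rough sketch of the column check in action, assuming truly columnar data. The field widths in make_line and the value of RESOURCE_COL are hypothetical, chosen only so that the demo lines are aligned; the real log's offsets would need to be measured.

```python
from pyparsing import (Word, alphas, alphanums, nums, oneOf, Optional,
                       restOfLine, col, ParseException)

def matchOnlyAtCol(n):
    """Build a parse action that rejects tokens not starting at column n."""
    def verifyCol(strg, locn, toks):
        if col(locn, strg) != n:
            raise ParseException(strg, locn,
                                 "matched token not at column %d" % n)
    return verifyCol

def make_line(ident, stamp, t, c, resource, description):
    # Hypothetical fixed-width layout: 12 + 14 + 3 + 3 = 32 characters
    # before RESOURCE_NAME, which is then padded to 16 characters.
    return "%-12s%-14s%-3s%-3s%-16s%s" % (ident, stamp, t, c,
                                          resource, description)

RESOURCE_COL = 33  # 1-based column implied by the widths assumed above

resource_name = Word(alphas).setParseAction(matchOnlyAtCol(RESOURCE_COL))

logline = (Word(alphanums)("identifier") + Word(nums, exact=10)("time")
           + oneOf("T I")("T") + oneOf("S H")("C")
           + Optional(resource_name, default="")("resource")
           + Optional(restOfLine, default="")("description"))

sample = [
    make_line("2BFA76F6", "1208230607", "T", "S", "SYSPROC",
              "SYSTEM SHUTDOWN BY USER"),
    make_line("A6D1BD62", "1215230807", "I", "H", "", "Firmware Event"),
]

parsed = []
for line in sample:
    result = logline.parseString(line)
    parsed.append((result["resource"], result["description"].strip()))

for resource, description in parsed:
    print(repr(resource), repr(description))
```

In the second record "Firmware" starts past the resource column, so the parse action raises, the Optional supplies its default, and the text is left for the description. No list of known resource names is needed with this approach.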

Here are some similar parsers that might give you some other ideas:
http://pyparsing.wikispaces.com/spac...erLogParser.py
http://mail.python.org/pipermail/pyt...ad.html#301450

In the second link, I made a similar remark, that pyparsing may not be
the first tool to try, but the variability of the input file made the
non-pyparsing options pretty hairy-looking with special case code, so
in the end, pyparsing was no more complex to use.

Good luck!
-- Paul
Jan 2 '08 #4
