log parser design question

I need to parse a log file using python and I need some advice/wisdom
on the best way to go about it:

The log file entries will consist of something like this:

ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete

ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688

and so on...

I need to be able to group these entries together, index them by ID
and IID, and search the contents of each entry; if a certain status
is found (such as wait), I need to be able to return the ID or IID
(depending...) of that entry.

So I was considering parsing them to this effect:

in a dictionary, where the key is a tuple, and the value is a list:

{('ID=8688', 'IID=98889998'): [
    'ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled',
    'locked working.lock',
    'status running',
    'status complete']}

I am keeping the full text of each entry in the list so that I can
recreate them for display if need be.
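
For reference, the grouping described above could be built along these lines in plain Python. This is only a sketch: the HEADER pattern and the group_entries name are assumptions based on the two sample entries, not something taken from a real log specification.

```python
import re

# Assumption: every entry starts with a header line of the form
# "ID=<digits> IID=<digits> ..." as in the two samples above.
HEADER = re.compile(r"^(ID=\d+)\s+(IID=\d+)\b")


def group_entries(lines):
    """Group raw log lines into {(id, iid): [line, ...]}."""
    entries = {}
    current = None  # key of the entry currently being collected
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines only separate entries
        match = HEADER.match(line)
        if match:
            current = (match.group(1), match.group(2))
        if current is not None:
            # setdefault merges lines if the same ID/IID pair shows up twice
            entries.setdefault(current, []).append(line)
    return entries


sample = """\
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete

ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688
"""

entries = group_entries(sample.splitlines())
for key, lines in entries.items():
    print(key, len(lines))
```

Keeping the full header line as the first list element preserves the original text for display, at the cost of storing the ID and IID twice.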

I am fairly new to python, so could anyone offer any advice here
before I get too far and discover a fatal flaw that you might see
coming a mile away?

Would I, with this design, be able to, for example, search each list
for "waiting on ID=8688", and when found, associate that value with
one of the elements of its key ("ID=9009")? Or is this approach
flawed? I'm assuming there is a better way, but I need some advice...
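
A search of that kind seems workable with the tuple-keyed dictionary. The sketch below assumes the dictionary layout proposed earlier in this post; find_entries is a hypothetical helper name, and it uses a simple substring test, so a shorter ID such as "ID=868" would also match "ID=8688".

```python
def find_entries(entries, text):
    """Return the (ID, IID) keys of all entries that have a line containing text."""
    return [key for key, lines in entries.items()
            if any(text in line for line in lines)]


# Dictionary in the shape proposed in the post (tuple key, list of lines).
entries = {
    ('ID=8688', 'IID=98889998'): [
        'ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled',
        'locked working.lock',
        'status running',
        'status complete',
    ],
    ('ID=9009', 'IID=87234785'): [
        'ID=9009 IID=87234785 execute wait - 01.21.2007 status wait',
        'waiting to lock',
        'status wait',
        'waiting on ID=8688',
    ],
}

waiting = find_entries(entries, "waiting on ID=8688")
# Unpack each key tuple to pick out just the ID element.
ids = [entry_id for entry_id, _iid in waiting]
print(ids)
```

Every search walks all stored lines, which is fine for small logs; for large ones a second dictionary mapping each status to its keys might be worth building while parsing.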

I appreciate any thoughts.

Thanks.
Jan 28 '07 #1
For the parsing of this data, here is a pyparsing approach. Once
parsed, the pyparsing ParseResults data structures can be massaged
into a queryable list. See the examples at the end for accessing the
individual parsed fields.

-- Paul

from pyparsing import *

data = """
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete
ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688

"""

# basic building blocks
integer = Word(nums)
idref = "ID=" + integer.setResultsName("id")
iidref = "IID=" + integer.setResultsName("iid")
date = Regex(r"\d\d\.\d\d\.\d{4}")

# header fields, e.g. "execute begin" and "status enabled"
logLabel = Group("execute" + oneOf("begin wait"))
logStatus = Group("status" + oneOf("enabled wait"))

# qualifier lines that may follow the header
lockQual = Group("locked" + Word(alphanums + "."))
waitingOnQual = Group("waiting on" + idref)
statusQual = Group("status" + oneOf("running complete wait"))
waitingToLockQual = Group(Literal("waiting to lock"))
statusQualifier = (statusQual | waitingOnQual | waitingToLockQual
                   | lockQual)

# a complete entry: header followed by any number of qualifiers
logEntry = (idref + iidref + logLabel.setResultsName("logtype") + "-"
            + date + logStatus.setResultsName("status")
            + ZeroOrMore(statusQualifier).setResultsName("quals"))

# print() calls so the loop also runs on Python 3
for tokens in logEntry.searchString(data):
    print(tokens)
    print(tokens.dump())
    print(tokens.id)
    print(tokens.iid)
    print(tokens.status)
    print(tokens.quals)
    print("")

prints:

['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
['status', 'running'], ['status', 'complete']]
['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
['status', 'running'], ['status', 'complete']]
- id: 8688
- iid: 98889998
- logtype: ['execute', 'begin']
- quals: [['locked', 'working.lock'], ['status', 'running'],
['status', 'complete']]
- status: ['status', 'enabled']
8688
98889998
['status', 'enabled']
[['locked', 'working.lock'], ['status', 'running'], ['status',
'complete']]

['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status',
'wait'], ['waiting on', 'ID=', '8688']]
['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status',
'wait'], ['waiting on', 'ID=', '8688']]
- id: 9009
- iid: 87234785
- logtype: ['execute', 'wait']
- quals: [['waiting to lock'], ['status', 'wait'], ['waiting on',
'ID=', '8688']]
- status: ['status', 'wait']
9009
87234785
['status', 'wait']
[['waiting to lock'], ['status', 'wait'], ['waiting on', 'ID=',
'8688']]
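
One way the parsed results might be "massaged into a queryable list" is sketched below. To stay self-contained it works on plain nested lists shaped like the printed output above (for example, what ParseResults.asList() returns) rather than on live pyparsing objects. The to_record helper and its field layout are assumptions read off that output, not part of pyparsing.

```python
def to_record(tokens):
    """Convert one parsed entry (a nested list) into a plain dict."""
    # Layout assumed from the printed output:
    # ['ID=', id, 'IID=', iid, [logtype], '-', date, [status], *quals]
    return {
        "id": tokens[1],
        "iid": tokens[3],
        "logtype": tokens[4][1],
        "date": tokens[6],
        "status": tokens[7][1],
        "quals": [list(q) for q in tokens[8:]],
    }


# The two entries exactly as they appear in the printed output.
parsed = [
    ['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
     '01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
     ['status', 'running'], ['status', 'complete']],
    ['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
     '01.21.2007', ['status', 'wait'], ['waiting to lock'],
     ['status', 'wait'], ['waiting on', 'ID=', '8688']],
]

records = [to_record(t) for t in parsed]
by_id = {r["id"]: r for r in records}

# IDs of entries whose header status is "wait"
waiting_ids = [r["id"] for r in records if r["status"] == "wait"]

# Map each waiting entry to the ID it is waiting on
blocked_by = {
    r["id"]: q[2]
    for r in records
    for q in r["quals"]
    if q[0] == "waiting on"
}

print(waiting_ids)
print(blocked_by)
```

Indexing by position is brittle if the grammar changes; reading the named results (tokens.id, tokens.status, and so on) directly would avoid that when working with the real ParseResults objects.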

Jan 29 '07 #2
Paul,

Thanks! That's a great module. I've been going through the docs and
it seems to do exactly what I need...

I appreciate your help!

Jan 30 '07 #3
http://www.camelrichard.org/roller/p...log_files_with

Thanks, Paul!

Feb 4 '07 #4
