log parser design question

I need to parse a log file using python and I need some advice/wisdom
on the best way to go about it:

The log file entries will consist of something like this:

ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete

ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688

and so on...

I need to be able to group these entries together, index them by ID
and IID, and search the contents of each entry; if a certain status
is found (such as wait), I need to be able to return the ID or IID
(depending...) of that entry.

So I was considering parsing them to this effect:

in a dictionary, where the key is a tuple, and the value is a list:

{('ID=8688', 'IID=98889998'): ['ID=8688 IID=98889998 execute begin -
01.21.2007 status enabled', 'locked working.lock', 'status running',
'status complete']}

I am keeping the full text of each entry in the list so that I can
recreate them for display if need be.
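
A minimal sketch of the dictionary layout described above, using only the standard library. It assumes every entry starts with a line of the form "ID=<digits> IID=<digits>" and that all following lines belong to that entry until the next such line; the names HEADER and group_entries are illustrative, not from the original post.

```python
import re

# Assumption: a new entry always starts with "ID=<digits> IID=<digits>".
HEADER = re.compile(r"^(ID=\d+)\s+(IID=\d+)\b")


def group_entries(lines):
    """Group log lines into {(ID, IID): [full text lines of that entry]}."""
    entries = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines only separate entries
        match = HEADER.match(line)
        if match:
            current = (match.group(1), match.group(2))
        if current is not None:
            # setdefault keeps earlier lines if the same key shows up again
            entries.setdefault(current, []).append(line)
    return entries


SAMPLE = """\
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete

ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688
"""

entries = group_entries(SAMPLE.splitlines())
for key, lines in entries.items():
    print(key, len(lines))
```

Because the full text of each line is kept, an entry can be redisplayed with "\n".join(entries[key]).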

I am fairly new to python, so could anyone offer any advice here
before I get too far and discover a fatal flaw that you might see
coming a mile away?

Would I, with this design, be able to, for example, search each list
for "waiting on ID=8688" and, when found, associate that value with
one of the elements of its key, "ID=9009"? Or is this approach
flawed? I'm assuming there is a better way, but I need some
advice...
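
One way the lookup asked about here could work with the tuple-keyed dictionary, shown as a rough sketch rather than a recommended design. The helper names find_waiters and find_by_status are hypothetical, and the sample dictionary is typed in by hand from the log excerpt above.

```python
import re

# Hand-built from the sample log; keys are (ID, IID) tuples.
entries = {
    ("ID=8688", "IID=98889998"): [
        "ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled",
        "locked working.lock",
        "status running",
        "status complete",
    ],
    ("ID=9009", "IID=87234785"): [
        "ID=9009 IID=87234785 execute wait - 01.21.2007 status wait",
        "waiting to lock",
        "status wait",
        "waiting on ID=8688",
    ],
}

WAITING_ON = re.compile(r"waiting on (ID=\d+)")


def find_waiters(entries, target_id):
    """Return the keys of entries that contain 'waiting on <target_id>'."""
    waiters = []
    for key, lines in entries.items():
        for line in lines:
            match = WAITING_ON.search(line)
            if match and match.group(1) == target_id:
                waiters.append(key)
                break  # one hit per entry is enough
    return waiters


def find_by_status(entries, status):
    """Return the keys of entries with a 'status <status>' anywhere in them."""
    pattern = re.compile(r"\bstatus " + re.escape(status) + r"\b")
    return [key for key, lines in entries.items()
            if any(pattern.search(line) for line in lines)]


print(find_waiters(entries, "ID=8688"))   # [('ID=9009', 'IID=87234785')]
print(find_by_status(entries, "wait"))    # [('ID=9009', 'IID=87234785')]
```

Each key is a tuple, so the ID is key[0] and the IID is key[1]. The search is a linear scan over every line of every entry, which is fine for small logs but may call for a separate index on larger ones.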

I appreciate any thoughts.

Thanks.
Jan 28 '07 #1


On Jan 27, 10:43 pm, avidfan <n...@nowhere.com> wrote:
> I need to parse a log file using python and I need some advice/wisdom
> on the best way to go about it:
>
> The log file entries will consist of something like this:
>
> ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
> locked working.lock
> status running
> status complete
>
> ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
> waiting to lock
> status wait
> waiting on ID=8688
>
> and so on...

For the parsing of this data, here is a pyparsing approach. Once
parsed, the pyparsing ParseResults data structures can be massaged into
a queryable list. See the examples at the end for accessing the
individual parsed fields.

-- Paul

data = """
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete
ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688

"""
from pyparsing import (Word, nums, alphanums, Regex, Group, Literal,
                       ZeroOrMore, oneOf)

# Basic building blocks: numeric IDs and the MM.DD.YYYY date.
integer = Word(nums)
idref = "ID=" + integer.setResultsName("id")
iidref = "IID=" + integer.setResultsName("iid")
date = Regex(r"\d\d\.\d\d\.\d{4}")

# Pieces of the header line and the qualifier lines that may follow it.
logLabel = Group("execute" + oneOf("begin wait"))
logStatus = Group("status" + oneOf("enabled wait"))
lockQual = Group("locked" + Word(alphanums + "."))
waitingOnQual = Group("waiting on" + idref)
statusQual = Group("status" + oneOf("running complete wait"))
waitingToLockQual = Group(Literal("waiting to lock"))
statusQualifier = statusQual | waitingOnQual | waitingToLockQual | lockQual
logEntry = (idref + iidref + logLabel.setResultsName("logtype") + "-"
            + date + logStatus.setResultsName("status")
            + ZeroOrMore(statusQualifier).setResultsName("quals"))

# Scan the text and show the named fields of each matching entry.
for tokens in logEntry.searchString(data):
    print(tokens)
    print(tokens.dump())
    print(tokens.id)
    print(tokens.iid)
    print(tokens.status)
    print(tokens.quals)
    print()

prints:

['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
['status', 'running'], ['status', 'complete']]
['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
['status', 'running'], ['status', 'complete']]
- id: 8688
- iid: 98889998
- logtype: ['execute', 'begin']
- quals: [['locked', 'working.lock'], ['status', 'running'],
['status', 'complete']]
- status: ['status', 'enabled']
8688
98889998
['status', 'enabled']
[['locked', 'working.lock'], ['status', 'running'], ['status',
'complete']]

['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status',
'wait'], ['waiting on', 'ID=', '8688']]
['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status',
'wait'], ['waiting on', 'ID=', '8688']]
- id: 9009
- iid: 87234785
- logtype: ['execute', 'wait']
- quals: [['waiting to lock'], ['status', 'wait'], ['waiting on',
'ID=', '8688']]
- status: ['status', 'wait']
9009
87234785
['status', 'wait']
[['waiting to lock'], ['status', 'wait'], ['waiting on', 'ID=',
'8688']]
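
As a rough illustration of massaging results like these into something queryable, the sketch below starts from plain nested lists copied from the output above. It assumes each match has already been converted to an ordinary list (pyparsing's ParseResults.asList() produces that shape) and that the token positions are fixed as printed; to_record, waiting_on, and the field names are illustrative choices, not part of pyparsing.

```python
# Nested lists copied from the printed output above. With pyparsing
# installed they could come from tokens.asList() for each match.
parsed = [
    ['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
     '01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
     ['status', 'running'], ['status', 'complete']],
    ['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
     '01.21.2007', ['status', 'wait'], ['waiting to lock'],
     ['status', 'wait'], ['waiting on', 'ID=', '8688']],
]


def to_record(tokens):
    """Turn one token list into a dict of named fields (positions assumed)."""
    return {
        "id": tokens[1],
        "iid": tokens[3],
        "logtype": tokens[4][1],
        "date": tokens[6],
        "status": tokens[7][1],
        "quals": tokens[8:],   # everything after the header status
    }


def waiting_on(record):
    """Return the IDs named in this record's 'waiting on' qualifiers."""
    return [q[2] for q in record["quals"] if q[0] == "waiting on"]


records = [to_record(tokens) for tokens in parsed]
by_id = {record["id"]: record for record in records}

# Entries whose header status is 'wait', as (id, iid) pairs.
waiting = [(r["id"], r["iid"]) for r in records if r["status"] == "wait"]
# Map each waiting entry's id to the ids it is blocked on.
blockers = {r["id"]: waiting_on(r) for r in records if waiting_on(r)}

print(waiting)    # [('9009', '87234785')]
print(blockers)   # {'9009': ['8688']}
```

With real ParseResults the named fields (tokens.id, tokens.iid, tokens.status, tokens.quals) could be read directly instead of indexing by position.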

Jan 29 '07 #2

On 28 Jan 2007 21:20:47 -0800, "Paul McGuire" <pt***@austin.rr.com>
wrote:
Paul,

Thanks! That's a great module. I've been going through the docs and
it seems to do exactly what I need...

I appreciate your help!

Jan 30 '07 #3

On Mon, 29 Jan 2007 23:11:32 -0600, avidfan <no***@nowhere.com> wrote:
http://www.camelrichard.org/roller/p...log_files_with

Thanks, Paul!

Feb 4 '07 #4
