I need to parse a log file using python and I need some advice/wisdom
on the best way to go about it:
The log file entries will consist of something like this:
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete
ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688
and so on...
I need to be able to group these entries together, index them by ID
and IID, and search the context of each entry, and if a certain status
is found (such as wait), then be able to return the ID or IID
(depending...) of that entry.
So I was considering parsing them to this effect:
in a dictionary, where the key is a tuple, and the value is a list:
{('ID=8688', 'IID=98889998'): ['ID=8688 IID=98889998 execute begin -
01.21.2007 status enabled', 'locked working.lock', 'status running',
'status complete']}
I am keeping the full text of each entry in the list so that I can
recreate them for display if need be.
I am fairly new to python, so could anyone offer any advice here
before I get too far and discover a fatal flaw that you might see
coming a mile away?
Would I, with this design, be able to, for example, search each list
for "waiting on ID=8688", and when found, be able to associate that
value with one of the elements of its key, "ID=9009"? Or is this
approach flawed? I'm assuming there is a better way, but I need
some advice...
I appreciate any thoughts.
Thanks.
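The design described in the question can be sketched with the standard library alone. This is only a minimal illustration, not the approach from the replies: the helper names group_entries and waiting_on are hypothetical, and it assumes every entry starts with a line of the form "ID=<digits> IID=<digits> ..." and that the dictionary keys hold the bare numbers rather than the 'ID=8688' strings.

```python
import re

# Assumption: a header line starts with "ID=<digits> IID=<digits>";
# every other non-blank line belongs to the most recent header.
HEADER = re.compile(r"^ID=(\d+)\s+IID=(\d+)\b")
WAITING_ON = re.compile(r"waiting on ID=(\d+)")


def group_entries(lines):
    """Group raw log lines into {(id, iid): [line, ...]}, keeping the full text."""
    entries = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = (match.group(1), match.group(2))
            entries.setdefault(current, [])
        if current is not None:
            entries[current].append(line)
    return entries


def waiting_on(entries, target_id):
    """Return the (id, iid) keys of entries that contain 'waiting on ID=<target_id>'."""
    hits = []
    for key, lines in entries.items():
        for line in lines:
            found = WAITING_ON.search(line)
            if found and found.group(1) == target_id:
                hits.append(key)
                break
    return hits


log = """\
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete
ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688
"""

entries = group_entries(log.splitlines())
blocked = waiting_on(entries, "8688")
print(blocked)  # [('9009', '87234785')]
```

So the association asked about does work: the key of the matching list gives both the ID and the IID. One caveat of this sketch is that two entries with the same (ID, IID) pair would be merged into a single list.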
On Jan 27, 10:43 pm, avidfan <n...@nowhere.com> wrote:
I need to parse a log file using python and I need some advice/wisdom
on the best way to go about it:
The log file entries will consist of something like this:
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete
ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688
and so on...
For the parsing of this data, here is a pyparsing approach. Once
parsed, the pyparsing ParseResults data structures can be massaged into
a queryable list. See the examples at the end for accessing the
individual parsed fields.
-- Paul
data = """
ID=8688 IID=98889998 execute begin - 01.21.2007 status enabled
locked working.lock
status running
status complete
ID=9009 IID=87234785 execute wait - 01.21.2007 status wait
waiting to lock
status wait
waiting on ID=8688
"""

from pyparsing import *

# basic building blocks
integer = Word(nums)
idref = "ID=" + integer.setResultsName("id")
iidref = "IID=" + integer.setResultsName("iid")
date = Regex(r"\d\d\.\d\d\.\d{4}")

# fields on the first line of each entry
logLabel = Group("execute" + oneOf("begin wait"))
logStatus = Group("status" + oneOf("enabled wait"))

# qualifier lines that may follow the first line
lockQual = Group("locked" + Word(alphanums + "."))
waitingOnQual = Group("waiting on" + idref)
statusQual = Group("status" + oneOf("running complete wait"))
waitingToLockQual = Group(Literal("waiting to lock"))
statusQualifier = (statusQual | waitingOnQual | waitingToLockQual
                   | lockQual)

# a complete entry: first line plus any number of qualifiers
logEntry = idref + iidref + logLabel.setResultsName("logtype") + "-" \
    + date + logStatus.setResultsName("status") \
    + ZeroOrMore(statusQualifier).setResultsName("quals")

for tokens in logEntry.searchString(data):
    print(tokens)
    print(tokens.dump())
    print(tokens.id)
    print(tokens.iid)
    print(tokens.status)
    print(tokens.quals)
    print("")
prints:
['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
['status', 'running'], ['status', 'complete']]
['ID=', '8688', 'IID=', '98889998', ['execute', 'begin'], '-',
'01.21.2007', ['status', 'enabled'], ['locked', 'working.lock'],
['status', 'running'], ['status', 'complete']]
- id: 8688
- iid: 98889998
- logtype: ['execute', 'begin']
- quals: [['locked', 'working.lock'], ['status', 'running'],
['status', 'complete']]
- status: ['status', 'enabled']
8688
98889998
['status', 'enabled']
[['locked', 'working.lock'], ['status', 'running'], ['status',
'complete']]
['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status',
'wait'], ['waiting on', 'ID=', '8688']]
['ID=', '9009', 'IID=', '87234785', ['execute', 'wait'], '-',
'01.21.2007', ['status', 'wait'], ['waiting to lock'], ['status',
'wait'], ['waiting on', 'ID=', '8688']]
- id: 9009
- iid: 87234785
- logtype: ['execute', 'wait']
- quals: [['waiting to lock'], ['status', 'wait'], ['waiting on',
'ID=', '8688']]
- status: ['status', 'wait']
9009
87234785
['status', 'wait']
[['waiting to lock'], ['status', 'wait'], ['waiting on', 'ID=',
'8688']]
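As a rough sketch of how the parsed fields might be massaged into a queryable list: the example below assumes each result has already been copied into a plain dict. The records are hand-copied from the printed output above rather than produced by pyparsing here, and the helper names final_status and blockers are hypothetical.

```python
# Assumption: one dict per log entry, with the same field names as the
# results names used in the grammar (id, iid, status, quals).
records = [
    {"id": "8688", "iid": "98889998", "status": "enabled",
     "quals": [["locked", "working.lock"], ["status", "running"],
               ["status", "complete"]]},
    {"id": "9009", "iid": "87234785", "status": "wait",
     "quals": [["waiting to lock"], ["status", "wait"],
               ["waiting on", "ID=", "8688"]]},
]

# Index by ID so an entry can be looked up directly.
by_id = {rec["id"]: rec for rec in records}


def final_status(rec):
    """Last 'status' qualifier if there is one, else the status on the first line."""
    statuses = [q[1] for q in rec["quals"] if q[0] == "status"]
    return statuses[-1] if statuses else rec["status"]


def blockers(rec):
    """IDs this entry is waiting on, taken from its 'waiting on' qualifiers."""
    return [q[-1] for q in rec["quals"] if q[0] == "waiting on"]


# Which entries ended in a wait state, and what each one is waiting on.
waiting_ids = [rec["id"] for rec in records if final_status(rec) == "wait"]
waits_for = {rec["id"]: blockers(rec) for rec in records if blockers(rec)}

print(waiting_ids)  # ['9009']
print(waits_for)    # {'9009': ['8688']}

# Follow the reference back to the entry that holds the lock.
holder = by_id[waits_for["9009"][0]]
print(holder["iid"], final_status(holder))  # 98889998 complete
```

Keeping a separate index keyed by ID (and, if needed, another keyed by IID) avoids having to know both values in order to look up an entry.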
On 28 Jan 2007 21:20:47 -0800, "Paul McGuire" <pt***@austin.rr.com>
wrote:
>[pyparsing example and output snipped]
Paul,
Thanks! That's a great module. I've been going through the docs and
it seems to do exactly what I need...
I appreciate your help!
On Mon, 29 Jan 2007 23:11:32 -0600, avidfan <no***@nowhere.com> wrote:
Paul,
Thanks! That's a great module. I've been going through the docs and it seems to do exactly what I need...
I appreciate your help!
http://www.camelrichard.org/roller/p...log_files_with
Thanks, Paul!