Parse ASCII log; sort and keep most recent entries

Hi folks,

I am a newbie to Python and am hoping that someone can get me started
on a log parser that I am trying to write.

The log is an ASCII file that contains a process identifier (PID),
username, date, and time field like this:

1234 williamstim 01AUG03 7:44:31
2348 williamstim 02AUG03 14:11:20
23 jonesjimbo 07AUG03 15:25:00
2348 williamstim 17AUG03 9:13:55
748 jonesjimbo 13OCT03 14:10:05
23 jonesjimbo 14OCT03 23:01:23
748 jonesjimbo 14OCT03 23:59:59

I want to read in and sort the file so the new list contains only
the most recent entry for each PID (PIDs get reused often). In my example,
the new list would be:

1234 williamstim 01AUG03 7:44:31
2348 williamstim 17AUG03 9:13:55
23 jonesjimbo 14OCT03 23:01:23
748 jonesjimbo 14OCT03 23:59:59

So I need to sort by PID and date + time, then keep the most recent.

Any help would be appreciated!

Taylor

No*********@hotmail.com
Jul 18 '05 #1
Nova's Taylor wrote:
I am a newbie to Python and am hoping that someone can get me started
on a log parser that I am trying to write.

I want to read in and sort the file so the new list contains only
the most recent entry for each PID (PIDs get reused often). In my example,
the new list would be:

1234 williamstim 01AUG03 7:44:31
2348 williamstim 17AUG03 9:13:55
23 jonesjimbo 14OCT03 23:01:23
748 jonesjimbo 14OCT03 23:59:59

So I need to sort by PID and date + time, then keep the most recent.


I think you are specifying the implementation of the solution
a bit, rather than just the requirements. Do you really need
the resulting list to be sorted by PID and date/time, or was
that just part of how you thought you'd write it?

If you don't care about the sorting part, but just want the
output to be a list of unique PIDs, you could just do the
following instead, taking advantage of how Python dictionaries
have unique keys. Note that this assumes that the contents
of the file were originally in order by date (i.e. more recent
items come later).

1. Create empty dict: "d = {}"
2. Read data line by line: "for line in infile.readlines()"
3. Split so the PID is separate: "pid = line.split()[0]"
4. Store entire line in dictionary using PID as key: "d[pid] = line"

When you're done, the dict will contain only the most recent
line with a given PID, though in "arbitrary" (effectively
random) order. If you don't care about the order of the final
result, just open a file and with one line the reduced data
is written out:

newfile.write(''.join(d.values()))
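
The four steps above can be put together as a small runnable sketch. This is written for Python 3 rather than the Python 2 of the thread, and the names SAMPLE and latest_by_pid are made up for illustration. It relies on the same assumption as the steps: the log is already in chronological order.

```python
# Sample data copied from the original question.
SAMPLE = """\
1234 williamstim 01AUG03 7:44:31
2348 williamstim 02AUG03 14:11:20
23 jonesjimbo 07AUG03 15:25:00
2348 williamstim 17AUG03 9:13:55
748 jonesjimbo 13OCT03 14:10:05
23 jonesjimbo 14OCT03 23:01:23
748 jonesjimbo 14OCT03 23:59:59
"""


def latest_by_pid(lines):
    """Return {pid: line}, keeping the last line seen for each PID."""
    d = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        pid = line.split()[0]
        d[pid] = line  # a later line replaces an earlier one with the same PID
    return d


result = latest_by_pid(SAMPLE.splitlines(keepends=True))
print("".join(result.values()), end="")
```

On Python 3.7 and later a dict keeps its keys in the order they were first inserted, so the output is ordered by each PID's first appearance in the log rather than arbitrarily.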

-Peter
Jul 18 '05 #2
Here's a quick solution.

Larry Bates
Syscon, Inc.
def cmpfunc(x,y):
    xdate=x[0]
    xtime=x[1]
    ydate=y[0]
    ytime=y[1]
    if xdate == ydate:
        #
        # If the two dates are equal, I must check the times
        #
        if xtime > ytime: return 1
        elif xtime == ytime: return 0
        else: return -1
    elif xdate > ydate: return 1
    return -1

fp=file(yourlogfilepath, 'r')
lines=fp.readlines()
fp.close()
list=[]
months={'JAN': '01', 'FEB': '02', 'MAR': '03', 'APR': '04',
        'MAY': '05', 'JUN': '06', 'JUL': '07', 'AUG': '08',
        'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}

logdict={}

for line in lines:
    if not line.strip(): break
    print line
    pid, name, date, time=[x.strip() for x in line.rstrip().split(' ')]
    #
    # Must zero pad time for proper comparison
    #
    stime=time.zfill(8)
    #
    # Must reformat the data as YYMMDD
    #
    sdate=date[-2:]+months[date[2:5]]+date[:2]
    list.append((sdate, stime, pid, name, date, time))

list.sort(cmpfunc)
list.reverse()

for sdate, stime, pid, name, date, time in list:
    if logdict.has_key(pid): continue
    logdict[pid]=(pid, name, date, time)

for key in logdict.keys():
    pid, name, date, time=logdict[key]
    print pid, name, date, time
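
For comparison, here is a rough Python 3 sketch of the same idea (compare real timestamps instead of trusting the file order). It swaps the hand-built month table and comparison function for datetime.strptime, which is a different technique from the code above. The names parse_stamp, most_recent and shuffled are invented for the example, and the "%b" month code is locale-dependent, so this assumes English month abbreviations.

```python
from datetime import datetime


def parse_stamp(date, clock):
    """Turn '01AUG03' and '7:44:31' into a datetime (assumes an English locale)."""
    return datetime.strptime(date + " " + clock, "%d%b%y %H:%M:%S")


def most_recent(lines):
    """Keep the newest line per PID, returned oldest-first by timestamp."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        pid, name, date, clock = fields
        stamp = parse_stamp(date, clock)
        if pid not in best or stamp > best[pid][0]:
            best[pid] = (stamp, line)
    return [line for stamp, line in sorted(best.values())]


# The sample entries from the question, deliberately out of order.
shuffled = [
    "748 jonesjimbo 14OCT03 23:59:59",
    "2348 williamstim 17AUG03 9:13:55",
    "1234 williamstim 01AUG03 7:44:31",
    "23 jonesjimbo 07AUG03 15:25:00",
    "2348 williamstim 02AUG03 14:11:20",
    "23 jonesjimbo 14OCT03 23:01:23",
    "748 jonesjimbo 13OCT03 14:10:05",
]

for line in most_recent(shuffled):
    print(line)
```

Because the comparison uses parsed timestamps, this version does not depend on the log being in chronological order.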
Jul 18 '05 #3
no*********@hotmail.com (Nova's Taylor) writes:
[snip]

#!/usr/bin/env python
#
# I'm expecting the log file to be in chronological order
# so later entries are later in time
# using the dict, later entries overwrite older ones with the same PID.
# make a script and use this like
# logparse.py mylogfile.log > newlogfile.log
#
import fileinput
piddict = {}
for line in fileinput.input():
    pid,username,date,time = line.split()
    piddict[pid] = (username,date,time)
#
pidlist = piddict.keys()
pidlist.sort()
for pid in pidlist:
    username,date,time = piddict[pid]
    print pid,username,date,time
#tada!
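
One thing to watch with the sort step: the PIDs here are strings, so a plain sort orders them character by character rather than numerically. A small Python 3 sketch of the difference follows; lines_sorted_by_pid is a made-up helper name, and the dict holds whole lines instead of tuples just to keep the example short.

```python
def lines_sorted_by_pid(piddict):
    """Return the stored lines ordered by numeric PID."""
    return [piddict[pid] for pid in sorted(piddict, key=int)]


# The deduplicated entries from the question, keyed by PID string.
piddict = {
    "1234": "1234 williamstim 01AUG03 7:44:31",
    "2348": "2348 williamstim 17AUG03 9:13:55",
    "23": "23 jonesjimbo 14OCT03 23:01:23",
    "748": "748 jonesjimbo 14OCT03 23:59:59",
}

print(sorted(piddict))           # string order: '1234' sorts before '23'
print(sorted(piddict, key=int))  # numeric order: 23 comes first

for line in lines_sorted_by_pid(piddict):
    print(line)
```

If the order of PIDs in the output does not matter, the sort can be dropped altogether.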
Jul 18 '05 #4
On Wed, 16 Jun 2004 19:41:58 -0400, rumours say that Peter Hansen
<pe***@engcorp.com> might have written:

[snip]
If you don't care about the order of the final
result, just open a file and with one line the reduced data
is written out:

newfile.write(''.join(d.values()))


or

newfile.writelines(d.values()) # 1.5.2 and later

or

newfile.writelines(d.itervalues()) # 2.2 and later
--
TZOTZIOY, I speak England very best,
"I have a cunning plan, m'lord" --Sean Bean as Odysseus/Ulysses
Jul 18 '05 #5

"Nova's Taylor" <no*********@hotmail.com> wrote in message
news:fd*************************@posting.google.com...
The log is an ASCII file that contains a process identifier (PID),
username, date, and time field like this:

1234 williamstim 01AUG03 7:44:31
2348 williamstim 02AUG03 14:11:20
23 jonesjimbo 07AUG03 15:25:00
2348 williamstim 17AUG03 9:13:55
748 jonesjimbo 13OCT03 14:10:05
23 jonesjimbo 14OCT03 23:01:23
748 jonesjimbo 14OCT03 23:59:59

If you can get the log writer to write fixed-length records with everything
lined up nicely, it would be easier to read the log by eye (with a fixed-pitch
font, which my newsreader doesn't use). It is also then trivial to
slice a field out of the middle of the line.

If one wants/needs to sort records by date, life is also easier if you can
get the record writer to print dates in a sortable format: YYYYMMDD. (I
learned this 25 years ago.)

I want to read in and sort the file so the new list contains only
the most recent entry for each PID (PIDs get reused often).

If these are *nix process ids, this does not make obvious sense. Since
pids are arbitrary, why delete a recent record because its PID got reused
while keeping an old record because its PID happened not to? I could
better imagine keeping all records since a certain date or the last n
records (the latter is trivial with fixed-length records).

In my example, the new list would be:

1234 williamstim 01AUG03 7:44:31
2348 williamstim 17AUG03 9:13:55
23 jonesjimbo 14OCT03 23:01:23
748 jonesjimbo 14OCT03 23:59:59

So I need to sort by PID and date + time, then keep the most recent.

That is one possibility: you have to form a list of (key, line) pairs, where
key is extracted from the line.

Any help would be appreciated!

Alternative: instead of sorting and then filtering duplicates, filter duplicates
and then sort the reduced list. Assuming records are in date order from
earlier to later, insert them into a dict with PID as key and entire record
as value, and later records will replace earlier records with same key
(PID). Then re-sort d.values() by date. Variation: if you cannot get dates
stored properly for easy sorting, store line numbers with records so you
can sort by line number instead of fiddling with nasty dates. Something
like (incomplete and untested):

d = {}
for pair in enumerate(file('whatever')):
    d[getpid(pair[1])] = pair # getpid might be inline expression
uniqs = d.values()
uniqs.sort()
new = [pair[1] for pair in uniqs]
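
A filled-in version of the line-number variation might look like the following Python 3 sketch. The function name unique_in_file_order is invented, and line.split()[0] stands in for the getpid placeholder, assuming the PID is the first whitespace-separated field as in the sample log.

```python
def unique_in_file_order(lines):
    """Keep the last record per PID, then restore original file order."""
    d = {}
    for lineno, line in enumerate(lines):
        if not line.strip():
            continue  # ignore blank lines
        pid = line.split()[0]  # stands in for the getpid() placeholder
        d[pid] = (lineno, line)  # later records replace earlier ones
    # Sorting the (lineno, line) pairs puts survivors back in file order.
    return [line for lineno, line in sorted(d.values())]


log = [
    "1234 williamstim 01AUG03 7:44:31",
    "2348 williamstim 02AUG03 14:11:20",
    "23 jonesjimbo 07AUG03 15:25:00",
    "2348 williamstim 17AUG03 9:13:55",
    "748 jonesjimbo 13OCT03 14:10:05",
    "23 jonesjimbo 14OCT03 23:01:23",
    "748 jonesjimbo 14OCT03 23:59:59",
]

for line in unique_in_file_order(log):
    print(line)
```

Since the file is assumed to be in date order, sorting by line number gives the same result as sorting by date without any date parsing.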

Terry J. Reedy


Jul 18 '05 #6
Wow - thanks for all of your great suggestions. I did neglect to
mention that the log file is appended to over time, so the values are
already in a time-sequenced sort going in, thus allowing the use of a
dictionary as suggested by David and others. This is what I wound up
using:

sourceFile = open(r'C:\_sandbox\SASAdmin\Python\ServerAdmin\SignOnLog.txt')

# output file for testing only
logFile = open(r'C:\_sandbox\SASAdmin\Python\ServerAdmin\test.txt', 'w')

piddict = {}
for line in sourceFile:
    pid,username,date,time = line.split()
    piddict[pid] = (username,date,time)

pidlist = piddict.keys()
pidlist.sort()
for pid in pidlist:
    username,date,time = piddict[pid]
    # next line seems amateurish, but that is what I am!
    logFile.write(pid + " " + username + " " + date + " " + time + "\n")

More background:

I will next merge this log file to process identifiers running on a
server, so I can identify "who-started-what-process-when." In Perl I
do it this way:
$pattern=sas; ## name of application I am searching for

# Use PSLIST.EXE to list processes on the server
open(PIDLIST, "pslist |") or die "Can not run the PSLIST program: $!\n";

while(<PIDLIST>)
{
    $output .=$_;
    if (/$pattern/i)
    {
        ## collect pids that match pattern into an array, splitting on white spaces
        @taskList=split(/\s+/, $_);

        ## Check each value in the Server task list with each row in the log file
        foreach $proc_val ( @fl )
        {
            chomp($proc_val); ## Remove new line characters at the end.
            @log=split(/\s+/, $proc_val);

            if ( $log[0] eq $taskList[1])
            {
                # print">>>>No matches in log Files!!<<<<<<<<<<< \n"; # debug
                print "$taskList[0] $log[0] $log[1] $log[2] $taskList[5] $log[3] $taskList[8] \n";
                $foundIt=1;
            }
        }
    }
}
close(PIDLIST);

So now it's more reading to see how to do this in Python!
Thanks again for all your help!
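
As a starting point, the matching part of the Perl loop could be sketched in Python 3 roughly as below. This is only a sketch under assumptions: match_processes is a made-up name, the process-list lines are invented sample text (not real PSLIST output), and only the process name and PID columns are used because the meaning of the other columns is not given here. Unlike the Perl, it indexes the log by PID first, so each process is paired with the last log row for that PID instead of every matching row.

```python
import re


def match_processes(pslist_lines, log_lines, pattern="sas"):
    """Pair each process whose line matches pattern with its log row by PID."""
    wanted = re.compile(pattern, re.IGNORECASE)

    # Index log rows by PID (first field); later rows replace earlier ones.
    log_by_pid = {}
    for row in log_lines:
        fields = row.split()
        if fields:
            log_by_pid[fields[0]] = fields

    matches = []
    for line in pslist_lines:
        if not wanted.search(line):
            continue
        task = line.split()
        if len(task) < 2:
            continue  # no PID column on this line
        log = log_by_pid.get(task[1])  # task[1] is assumed to be the PID
        if log is not None and len(log) >= 4:
            matches.append((task[0], log[0], log[1], log[2], log[3]))
    return matches


# Invented process-list lines: name in column 0, PID in column 1.
ps = ["Name Pid Pri", "sas 2348 8", "notepad 23 8", "SAS 999 8"]
log = ["2348 williamstim 17AUG03 9:13:55", "23 jonesjimbo 14OCT03 23:01:23"]

for match in match_processes(ps, log):
    print(" ".join(match))
```

The process-list lines could come from running the external program with the subprocess module and splitting its output into lines; that part is left out here so the example stays runnable without PSLIST installed.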

Taylor
Jul 18 '05 #7
Nova's Taylor wrote:
This is what I wound up using:

Could I suggest part of my suggestion again? See below:

piddict = {}
for line in sourceFile:
    pid,username,date,time = line.split()
    piddict[pid] = (username,date,time)

Here you are splitting the whole thing, and storing a Python
tuple rather than the original "line" contents...

pidlist = piddict.keys()
pidlist.sort()
for pid in pidlist:
    username,date,time = piddict[pid]
    # next line seems amateurish, but that is what I am!
    logFile.write(pid + " " + username + " " + date + " " + time + "\n")


Here you are writing out something that is exactly equal
(if I read this all correctly) to the original line, but
having to split the tuple and append lots of strings together
again with spaces, the newline, etc.

Why not just store the original line and use it at the end:

for line in sourceFile:
    pid, _ = line.split(' ', 1)
    piddict[pid] = line

and later, use writelines as Christos suggested, without
even needing a loop:

logFile.writelines(piddict.values())

The difference in the writing part is that you are sorting by
pid, though I'm not clear why or if it's required. If it is,
you could still loop, but more simply:

for pid in pidlist:
    logFile.write(piddict[pid])

No splitting, no concatenating...

-Peter
Jul 18 '05 #8