Parsing a log file

CG
I am looking for a way to parse a simple log file to get the
information in a format that I can use. I would like to use Python,
but I am just beginning to learn how to use it. I am not a programmer,
but have done some simple modifications and revisions of scripts. I am
willing to attempt this on my own, if someone can point me in the right
direction (any example scripts that do similar things would be
helpful). This doesn't have to be Python, but I need a cross-platform
solution (e.g. Perl or some other kind of script). I just wanted to
try Python because I like the concept of it.

Here is my scenario:
I have a program that connects and disconnects to a server. It writes
a simple log file like this:

08-13-2005 13:19:37:564 Program: CONNECTED to 'Server'
08-13-2005 15:40:08:313 Program: DISCONNECTED from 'Server'
08-13-2005 15:45:39:234 Program: CONNECTED to 'Server'
08-13-2005 15:55:18:113 Program: DISCONNECTED from 'Server'
08-13-2005 16:30:57:264 Program: CONNECTED to 'Server'
08-13-2005 16:59:46:417 Program: DISCONNECTED from 'Server'
08-13-2005 17:10:33:264 Program: CONNECTED to 'Server'
08-13-2005 18:25:26:316 Program: DISCONNECTED from 'Server'
08-13-2005 18:58:13:564 Program: CONNECTED to 'Server'
08-13-2005 19:29:10:715 Program: DISCONNECTED from 'Server'

What I basically want to do is end up with a text file that can be
easily imported into a database with a format like this (or I guess it
could be written in a SQL script form that could write directly to a
database like MySQL):

Connect_Date  Connect_Time  Disconnect_date  Disconnect_time  User
------------  ------------  ---------------  ---------------  -------
08-13-2005    13:19:37      08-13-2005       15:40:08         John
08-13-2005    15:45:39      08-13-2005       15:55:18         John
08-13-2005    16:30:57      08-13-2005       16:59:46         John
08-13-2005    17:10:33      08-13-2005       18:25:26         John
08-13-2005    18:58:13      08-13-2005       19:29:10         John

Here are some notes about this:
* the username would come from the log file name (e.g.
John_Connect.log)
* I don't need the fractions of seconds in the timestamps
* I only need date, time, and connect or disconnect, the other info is
not important
* If it is possible to calculate the elapsed time between Connect and
Disconnect and create a new field with that data, that would help (but
I can easily do that with SQL queries)
* This log file layout seems to be consistent
* There may not be a "disconnect" statement if the log file is read
while connected, so the next time it would have to insert the
disconnect information. The file will be read quite regularly, so this
is very likely.
* This would eventually need to be done without intervention (maybe
every 5 minutes).

I am open to other ideas or existing programs and am flexible about the
final solution.

Thanks,
Clint

Aug 13 '05 #1
Well, you have described your problem nicely. One thing that's missing
is how to deal with incorrect input (for example, missing connect or
disconnect messages).

Furthermore, you can now:
a) try to find somebody who writes it for you. How you motivate that
person is another question.
b) try to hack some solution yourself. Start with doing the Python
tutorial?

Andreas

Aug 14 '05 #2
CG wrote:
[snip]
What I basically want to do is end up with a text file that can be
easily imported into a database with a format like this (or I guess it
could be written in a SQL script form that could write directly to a
database like Mysql):

Connect_Date  Connect_Time  Disconnect_date  Disconnect_time  User
------------  ------------  ---------------  ---------------  -------
08-13-2005    13:19:37      08-13-2005       15:40:08         John

[snip]

* I don't need the fractions of seconds in the timestamps

(1) Famous last words.
(2) What do you gain by throwing information away? Nothing! On input,
record what you are given. You can always round/truncate on output.

* I only need date, time, and connect or disconnect, the other info is
not important

Think about date and time as ONE piece of info. Use a "datetime" object
in Python, not a "date" and a "time". Same story with the columns in
your database.

* If it is possible to calculate the elapsed time between Connect and
Disconnect and create a new field with that data, that would help (but
I can easily do that with SQL queries)

and you will be able to do that even more easily if you use one "datetime".
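
As a rough illustration of the single-"datetime" idea (a sketch only, not code from this thread): the format string below is an assumption inferred from the sample log lines, and parse_stamp is a hypothetical helper name. The milliseconds are kept on input and only dropped when printing.

```python
from datetime import datetime

# Assumed layout, inferred from the sample log: MM-DD-YYYY HH:MM:SS:mmm
LOG_FORMAT = "%m-%d-%Y %H:%M:%S:%f"


def parse_stamp(text):
    """Parse one log timestamp into a single datetime, keeping the milliseconds."""
    return datetime.strptime(text, LOG_FORMAT)


connected = parse_stamp("08-13-2005 13:19:37:564")
disconnected = parse_stamp("08-13-2005 15:40:08:313")

# Subtracting two datetimes gives a timedelta, i.e. the elapsed time.
elapsed = disconnected - connected

# Truncate only on output; isoformat() produces an ISO 8601 style string.
print(connected.isoformat(sep=" ", timespec="seconds"))
print(elapsed)
```

With one datetime per event, the elapsed-time column is a plain subtraction, and the same value can be written out in whatever format the database expects.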

A couple of quick silly questions: What do you do if servers are in
different timezones? What if "John" connects before a daylight saving
change and disconnects afterwards? Any chance of your using ISO standard
format for representing dates?
Aug 14 '05 #3
CG
Thanks Andreas,

In your first paragraph, you ask about incorrect input. I guess it is
possible, but without that information, my collection of the data is
useless, so I really don't know what I would do with that.

As for the other stuff, I can hack the data in other ways, such as with
VBA and MSAccess, which I am more familiar with, but I am trying to
move to Linux and want to do it right the first time. I figure Perl is
the more common language for this kind of stuff, but I did want to try
to learn some Python while I am at it. I have started the tutorial,
but as a businessman I am short on time; if I had an example script
that did a similar thing, I could learn by working from it (I am
looking for something similar now).

I do live in a low-labor cost country, so I can hire someone to do it
for a small amount of money, but Python people are a little harder to
find.

Thanks for the comments,
Clint

Aug 14 '05 #4
CG
John,

Your comments are very helpful. I will take the datetime stamp as the
way to go. I don't have a need to throw away the time info, it is

You said:
What do you do if servers are in different timezones?

This is all in-house, in a country without daylight saving time, so it
would not be an issue.

You also said:
Any chance of your using ISO standard format for representing dates?

I think I have very little control over the actual logfile data. I
seem to be able to control what info it collects, but I don't think I
can change the formatting.

Thanks,
Clint

Aug 14 '05 #5
I am similarly not a programmer, but am trying to learn Python to do
tasks like this. I would read through the regular expressions
tutorial. You could probably easily read in all the lines of the log
file, and then split them up by " " (spaces).

If you're right about the lines all being consistent, that should
easily handle each line.
From there you could almost certainly drop off the trailing
milliseconds on the timestamps and do the simple data manipulation
you'd like.
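
A minimal sketch of the regular-expression approach, assuming every line looks like the samples in the original post. The pattern and the name parse_line are illustrative guesses, not something taken from the program that writes the log.

```python
import re

# Assumed pattern, inferred from the sample lines in the original post.
LINE_RE = re.compile(
    r"(?P<date>\d{2}-\d{2}-\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}):(?P<ms>\d{3}) "
    r"Program: (?P<op>CONNECTED|DISCONNECTED) (?:to|from) '(?P<server>[^']*)'"
)


def parse_line(line):
    """Return (date, time, op) for a log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # The milliseconds are captured separately, so they are easy to drop here.
    return m.group("date"), m.group("time"), m.group("op")


print(parse_line("08-13-2005 13:19:37:564 Program: CONNECTED to 'Server'"))
print(parse_line("not a log line"))
```

Plain line.split() would also work on these lines; the regular expression mainly helps by rejecting lines that do not have the expected shape.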

Here are a couple of links:

http://www.amk.ca/python/howto/regex/
http://gnosis.cx/publish/programming...pressions.html

HTH

googleboy

Aug 14 '05 #6
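Completely untested:

    #!/usr/bin/env python3
    """Turn a connect/disconnect log on stdin into SQL INSERT statements."""

    import sys
    from datetime import datetime

    user = sys.argv[1]  # user name, e.g. "John" for John_Connect.log

    starttime = None  # CONNECTED time still waiting for its DISCONNECTED
    for line in sys.stdin:
        flds = line.split()
        if not flds:
            continue  # skip blank lines
        # Fields: date, time, "Program:", operation, "to"/"from", server name
        datestr, timestr, prog, op, to, sname = flds
        month, day, year = [int(x) for x in datestr.split("-")]
        hour, minute, sec, ms = [int(x) for x in timestr.split(":")]
        timestamp = datetime(year, month, day, hour, minute, sec)
        if op == "CONNECTED":
            assert starttime is None, "CONNECTED twice in a row"
            starttime = timestamp
        elif op == "DISCONNECTED":
            assert starttime is not None, "DISCONNECTED without CONNECTED"
            # %s formats a datetime as 'YYYY-MM-DD HH:MM:SS'.
            # The quoting is naive; fine for trusted input only.
            sql = "insert into data (start, end, user) values ('%s', '%s', '%s');"
            print(sql % (starttime, timestamp, user))
            starttime = None  # ready for the next CONNECTED
        else:
            raise AssertionError("%r is not a valid line" % line)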
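
The script above asserts that every CONNECTED is followed by a DISCONNECTED. One possible variation for the case raised in the original post, where the log is read while a session is still open, is sketched below. The helper names (user_from_path, pair_sessions) and the "<User>_Connect.log" naming rule are assumptions made for illustration.

```python
import os
from datetime import datetime


def user_from_path(path):
    """'logs/John_Connect.log' -> 'John' (assumes <User>_Connect.log naming)."""
    return os.path.basename(path).split("_", 1)[0]


def pair_sessions(lines):
    """Pair CONNECTED/DISCONNECTED lines into (start, end) tuples.

    A CONNECTED with no matching DISCONNECTED is returned with end=None,
    so a later run can fill in the missing end time.
    """
    sessions = []
    start = None
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a log line of the expected shape
        stamp = datetime.strptime(
            fields[0] + " " + fields[1], "%m-%d-%Y %H:%M:%S:%f"
        )
        op = fields[3]
        if op == "CONNECTED":
            if start is not None:
                sessions.append((start, None))  # earlier connect never closed
            start = stamp
        elif op == "DISCONNECTED" and start is not None:
            sessions.append((start, stamp))
            start = None
        # A DISCONNECTED with no open session is silently ignored here.
    if start is not None:
        sessions.append((start, None))  # still connected when the log was read
    return sessions


sample = [
    "08-13-2005 13:19:37:564 Program: CONNECTED to 'Server'",
    "08-13-2005 15:40:08:313 Program: DISCONNECTED from 'Server'",
    "08-13-2005 15:45:39:234 Program: CONNECTED to 'Server'",
]
user = user_from_path("John_Connect.log")
for start, end in pair_sessions(sample):
    print(user, start, end if end is not None else "(open)")
```

Whether an unmatched line should be skipped, logged, or treated as an error is a design decision the thread leaves open; this sketch records open sessions and skips the rest.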


Aug 14 '05 #7
