How to do a report on a .txt log file

alexphd
19
I have a .txt log file and most of it is crap. But there are parts that show when a user logs in and at what time they logged in. Below is a portion of the log file. For example, "mwoelk" is a user logging in and "dcurtin" is another user logging in. So far I have created a Python app that counts how many times a user logged in, but I'm a little clueless on how to pull out when the user logged in. Any help on what I could do would help a lot.

172.16.9.206 - mwoelk [01/Feb/2008:04:32:12 -0500] "GET /controller?method=getUser HTTP/1.0" 200 305
172.16.9.166 - - [01/Feb/2008:04:57:38 -0500] "HEAD /images/DCI.gif HTTP/1.1" 200 -
172.16.9.166 - - [01/Feb/2008:04:57:38 -0500] "HEAD /eagent.jnlp HTTP/1.1" 200 -
172.16.9.166 - - [01/Feb/2008:04:57:38 -0500] "HEAD /jh.jnlp HTTP/1.1" 200 -
172.16.9.166 - - [01/Feb/2008:04:57:38 -0500] "HEAD /smack.jar HTTP/1.1" 200 -
172.16.9.166 - - [01/Feb/2008:04:57:38 -0500] "HEAD /jh.jar HTTP/1.1" 200 -
172.16.9.166 - - [01/Feb/2008:04:57:39 -0500] "HEAD /images/DCI.gif HTTP/1.1" 200 -
172.16.9.166 - noone [01/Feb/2008:04:57:40 -0500] "GET /controller?method=getNode&name=S14000068 HTTP/1.0" 200 499
172.16.9.166 - - [01/Feb/2008:04:57:40 -0500] "GET /help/helpset.hs HTTP/1.1" 200 547
172.16.9.166 - - [01/Feb/2008:04:57:43 -0500] "GET /help/map.jhm HTTP/1.1" 200 59650
172.16.9.162 - dcurtin [01/Feb/2008:00:19:16 -0500] "GET /controller?method=getUser HTTP/1.0" 200 307

Here is what I have done so far to count the frequency of a user logging in.

    # Read the whole log into memory
    log_file = open("localhost_access_log.2008-02-01.txt", "r")
    text = log_file.read()
    log_file.close()

    # Split on whitespace; this counts every token, not just user names
    word_list = text.lower().split(None)

    word_freq = {}
    for word in word_list:
        word_freq[word] = word_freq.get(word, 0) + 1

    keys = sorted(word_freq.keys())
    for word in keys:
        print("%-10s %d" % (word, word_freq[word]))

Feb 22 '08 #1
bvdet
2,851 Expert Mod 2GB
alexphd,

That's pretty good work you have done so far. The data is actually ordered very well for parsing. Since the log in time is enclosed in brackets, you can find the log in times by getting the index of the brackets (using the string method index()) and slicing the string ("the_string"[start:end]). You can also do it with a regular expression. While we're at it, why not get the user name at the same time? Let's assume the user name can only contain alphanumeric characters.
    import re

    fn = 'access_log.txt'
    # group 1: user name (alphanumeric only), group 2: text inside the brackets
    pattLog = re.compile(r'([a-zA-Z0-9]+) \[(.+)\]')
    fileList = open(fn).readlines()
    logdict = {}
    for item in fileList:
        m = pattLog.search(item)
        if m:
            # map each user name to the list of that user's log in times
            logdict.setdefault(m.group(1), []).append(m.group(2))

    for key in logdict:
        n = len(logdict[key])
        print('User %s logged in %d time%s:\n%s\n' %
              (key, n, ['','s'][n > 1 or 0], '\n'.join(logdict[key])))
Output:
>>> User mwoelk logged in 2 times:
01/Feb/2008:04:32:12 -0500
03/Feb/2008:12:01:10 -0500

User noone logged in 2 times:
01/Feb/2008:04:57:40 -0500
02/Feb/2008:14:00:40 -0500

User dcurtin logged in 1 time:
01/Feb/2008:00:19:16 -0500

>>>
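
The index()-and-slice approach mentioned above could look roughly like the sketch below. It is written for Python 3 (the snippets in this thread are Python 2), and `parse_line` is a hypothetical helper name, not something from the thread. It assumes the user name is always the third whitespace-separated field and that "-" in that position means no user was logged.

```python
def parse_line(line):
    """Return (user, timestamp) for a log line, or None if no user is present."""
    fields = line.split()
    if len(fields) < 3 or fields[2] == '-':
        return None
    start = line.index('[') + 1   # first character after the opening bracket
    end = line.index(']', start)  # position of the closing bracket
    return fields[2], line[start:end]


# Sample line taken from the log excerpt in the first post
sample = ('172.16.9.206 - mwoelk [01/Feb/2008:04:32:12 -0500] '
          '"GET /controller?method=getUser HTTP/1.0" 200 305')
print(parse_line(sample))
```

Note that index() raises ValueError when a bracket is missing, so a malformed line would need a try/except around the call.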

HTH :)
Feb 22 '08 #2
alexphd
19
Okay, I see what you're doing, but I have a few questions. What exactly does logdict = {} do? Basically, I mean this block of code that you wrote:

    for item in fileList:
        m = pattLog.search(item)
        if m:
            logdict.setdefault(m.group(1), []).append(m.group(2))

I'm a little bit confused.

Also, if I wanted to find the most frequently recurring user, would I add a counter to that for loop you created? Or would it be different? Sorry if I'm asking a lot of questions; I've just started to learn Python and I'm trying to understand the syntax fully.
Feb 22 '08 #3
bvdet
2,851 Expert Mod 2GB
The keys in logdict are the user names and the values are lists of the log in times. The count of the number of log ins is the length of each log in list.
    >>> logdict
    {'mwoelk': ['01/Feb/2008:04:32:12 -0500', '03/Feb/2008:12:01:10 -0500'], 'noone': ['01/Feb/2008:04:57:40 -0500', '02/Feb/2008:14:00:40 -0500'], 'dcurtin': ['01/Feb/2008:00:19:16 -0500']}
    >>>
From Python documentation:
a.setdefault(k[, x]) returns a[k] if k in a, else x (also setting it)
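
To make the setdefault() line less mysterious, here is a small sketch (Python 3) that shows it next to the longer if-based form it replaces. The timestamps are the ones from the logdict dump above; the `user` and `stamp` variable names are only for illustration.

```python
logdict = {}  # an empty dictionary: no users seen yet

# First sighting of 'mwoelk': the key is missing, so setdefault() stores the
# empty list under it and returns that same list, which append() then fills.
logdict.setdefault('mwoelk', []).append('01/Feb/2008:04:32:12 -0500')

# Second sighting: the key exists, so setdefault() just returns the stored list.
logdict.setdefault('mwoelk', []).append('03/Feb/2008:12:01:10 -0500')

# The same thing written out the long way
user, stamp = 'dcurtin', '01/Feb/2008:00:19:16 -0500'
if user not in logdict:
    logdict[user] = []
logdict[user].append(stamp)

print(logdict)
```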

To determine the user with the most log ins:
    freqList = [[len(logdict[key]), key] for key in logdict]
    freqList.sort()

    print(freqList)
    print('The user that logged in the most times is %s.' % (freqList[-1][1]))

Output:

>>> [[1, 'dcurtin'], [2, 'mwoelk'], [4, 'noone']]
The user that logged in the most times is noone.
>>>
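
As a variation on the sorting idea, the sketch below (Python 3) wraps the ranking in a function. `top_users` and its `exclude` parameter are hypothetical names, not part of the thread; `exclude` is there on the assumption that a placeholder account such as "noone" may need to be left out of the ranking. The sample data is the logdict dump shown earlier in this post.

```python
def top_users(logdict, n=3, exclude=()):
    """Return up to n (user, login_count) pairs, most logins first."""
    counts = [(user, len(times)) for user, times in logdict.items()
              if user not in exclude]
    # Sort by count descending, then by name so ties come out in a stable order.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:n]


logdict = {
    'mwoelk': ['01/Feb/2008:04:32:12 -0500', '03/Feb/2008:12:01:10 -0500'],
    'noone': ['01/Feb/2008:04:57:40 -0500', '02/Feb/2008:14:00:40 -0500'],
    'dcurtin': ['01/Feb/2008:00:19:16 -0500'],
}
print(top_users(logdict))
print(top_users(logdict, n=1, exclude=('noone',)))
```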
Feb 22 '08 #4
alexphd
19
I actually did something very similar, except I wanted to display the top three users. So I reversed the sort and sliced the list. Below is my code.

    freqList = [[len(logdict[key]), key] for key in logdict]
    freqList.sort(reverse=True)

    print(freqList)
    # print('The user that logged in the most times is %s.' % (freqList[-2][1]))

    # Note: [1:4] skips the first (highest) entry; use [:3] to include it
    print('The user that logged in the most times is %s.' % (freqList[1:4]))

Right now I'm trying to display who logged in during a certain time frame. So, let's say I want to see who logged in from 8:00 to 10:00 and who logged in from 12:00 to 16:00, etc. Any idea how that can be done?
Feb 22 '08 #5
bvdet
2,851 Expert Mod 2GB
Check out the datetime module. It supports mathematical and comparison operations and is ideal for your application.
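
A minimal sketch of what that could look like for one timestamp from this log, assuming Python 3 (where strptime() understands the %z offset) and an English locale for the %b month abbreviation. The 8:00 to 10:00 window is the one from the question; the variable names are made up for the example.

```python
from datetime import datetime, time

stamp = '01/Feb/2008:04:57:40 -0500'

# Parse day/month/year, the time of day, and the UTC offset
logged_in = datetime.strptime(stamp, '%d/%b/%Y:%H:%M:%S %z')

# Comparison: is the time of day inside the 8:00 to 10:00 window?
start, end = time(8, 0), time(10, 0)
in_window = start <= logged_in.time() <= end

# Arithmetic: subtracting two datetimes gives a timedelta
later = datetime.strptime('01/Feb/2008:15:57:40 -0500', '%d/%b/%Y:%H:%M:%S %z')
gap = later - logged_in

print(logged_in.isoformat(), in_window, gap)
```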
Feb 22 '08 #6
alexphd
19
I added a count of how many users logged in that day; now I want to narrow it down to how many users are logging in every three hours. I got the count for the day by doing what's below, which works because the log file only covers one day.

    count = 0
    for key in logdict:
        count += 1  # ends up equal to len(logdict)

    print('%s users logged in today' % (count))

But I am having trouble doing the three-hour part. I tried the datetime module, but I can't figure it out. I tried to do something like this:

    datetime.datetime.fromtimestamp(mod_time)

What do you think?
Feb 22 '08 #7
bvdet
2,851 Expert Mod 2GB
Actually the time module can be used to compare time objects to see if a specific time falls in a range. Example:
    import time

    d1 = '01/Feb/2008:04:57:40 -0500'
    d2 = '01/Feb/2008:15:57:40 -0500'

    def time_comp(upper, lower, d):
        # upper and lower format %H:%M:%S
        tu = time.strptime(upper, '%H:%M:%S')
        tl = time.strptime(lower, '%H:%M:%S')
        # parse d
        # example string: '01/Feb/2008:04:57:40 -0500'
        tm = time.strptime(d.split()[0].split(':',1)[1], '%H:%M:%S')
        if tl <= tm <= tu:
            return True
        return False

    print(time_comp('16:00:00', '10:00:00', d1))
    print(time_comp('16:00:00', '10:00:00', d2))

    if time_comp('16:00:00', '10:00:00', d2):
        print('User logged in during the target time.')
    else:
        print('Out of range')

    if time_comp('16:00:00', '10:00:00', d1):
        print('User logged in during the target time.')
    else:
        print('Out of range')
Output:

>>> False
True
User logged in during the target time.
Out of range
>>>
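
For the "every three hours" question, one possible approach is to round each login hour down to the start of its bucket instead of testing one range at a time. The sketch below assumes Python 3 and a logdict shaped like the one built earlier (user name mapped to a list of timestamp strings). `bucket_logins` is a hypothetical name, and the sample times come from the log excerpt in the first post. It ignores the date, which is only reasonable because the log covers a single day.

```python
from collections import defaultdict
from datetime import datetime


def bucket_logins(logdict, hours=3):
    """Map each bucket's starting hour to the set of users who logged in during it."""
    buckets = defaultdict(set)
    for user, stamps in logdict.items():
        for stamp in stamps:
            when = datetime.strptime(stamp, '%d/%b/%Y:%H:%M:%S %z')
            # Round the hour down to a multiple of `hours`, e.g. 04 -> 03
            buckets[when.hour - when.hour % hours].add(user)
    return dict(buckets)


logdict = {
    'mwoelk': ['01/Feb/2008:04:32:12 -0500'],
    'noone': ['01/Feb/2008:04:57:40 -0500'],
    'dcurtin': ['01/Feb/2008:00:19:16 -0500'],
}
for start, users in sorted(bucket_logins(logdict).items()):
    print('%02d:00-%02d:00  %d user(s): %s'
          % (start, start + 3, len(users), ', '.join(sorted(users))))
```

Using a set per bucket means a user who logs in twice in the same three hours is counted once; swap it for a list if every login should count.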
Feb 22 '08 #8
alexphd
19
One of the outputs printed was None. Do you know where that comes from?

Also, how can I make that work for my whole txt file?
Feb 23 '08 #9
bvdet
2,851 Expert Mod 2GB
I don't know what output you are referring to. What do you want to do to your whole txt file?
Feb 23 '08 #10
alexphd
19
I get this output when I run it.

"""
None
True
Out of range
User logged in during the target time.
"""

What I want it to do is interpret the whole text file, not just d1 and d2. I also want it to display how many users logged in during that target time. I think I could do that with just a for loop.

I hope that helps clarify what I was trying to say. Thanks for your help.
Feb 23 '08 #11
bvdet
2,851 Expert Mod 2GB
You must be running the code I suggested. I'm not sure why your output is None (it should be False). My intent was not to provide you with a solution, but to give you a function for comparing dates so you can implement your own solution. You need to put some effort into it. I am not here to write your program for you.
Feb 23 '08 #12
alexphd
19
Sorry about that. I was not trying to make you write my whole program; there were just some parts I was confused about. I think I figured it out, though. I actually ended up using the datetime module and a for loop to count how many users were in the target time. I got a few errors because I was trying to pass a list to the time function that you gave me.

Thanks for all your help again, I really appreciate it. I just started to learn Python and I'm sorry for asking so many questions.

Thanks again.
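
The final code was not posted, so the following is only a guess at the general shape of a datetime-plus-for-loop solution, written for Python 3. `users_in_window` and `LOG_PATTERN` are hypothetical names; the pattern follows the same idea as pattLog earlier in the thread. The function handles one line at a time, which avoids the error of passing a whole list of timestamps to a function that expects a single string.

```python
import re
from datetime import datetime, time

# User name followed by the bracketed timestamp
LOG_PATTERN = re.compile(r'([a-zA-Z0-9]+) \[(.+?)\]')


def users_in_window(lines, start, end):
    """Return the set of users with at least one login between start and end (inclusive)."""
    users = set()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # anonymous request ("- -") or a line without a timestamp
        when = datetime.strptime(match.group(2), '%d/%b/%Y:%H:%M:%S %z')
        if start <= when.time() <= end:
            users.add(match.group(1))
    return users


# Sample lines taken from the log excerpt in the first post
lines = [
    '172.16.9.206 - mwoelk [01/Feb/2008:04:32:12 -0500] "GET /controller?method=getUser HTTP/1.0" 200 305',
    '172.16.9.166 - - [01/Feb/2008:04:57:38 -0500] "HEAD /jh.jar HTTP/1.1" 200 -',
    '172.16.9.162 - dcurtin [01/Feb/2008:00:19:16 -0500] "GET /controller?method=getUser HTTP/1.0" 200 307',
]
found = users_in_window(lines, time(4, 0), time(6, 0))
print('%d user(s) logged in during the target time: %s' % (len(found), sorted(found)))
```

For the real log, an open file object can be passed in place of the list, since iterating over a file yields one line at a time.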
Feb 23 '08 #13
bvdet
2,851 Expert Mod 2GB
Sorry if I misunderstood you. I appreciate someone putting forth effort and getting results. It looks like you are getting there. Please ask if you have questions.
Feb 23 '08 #14
