
Simple log file parsing using Python

P: 3
I have just started learning Python and am trying to write a simple script to extract the IP address and the URL from a log file. The log file has around 600 entries and looks like this:
208.115.113.86 - - [08/Apr/2016:17:36:09 -0700] "GET /paper2003/0306AssElection.htm HTTP/1.1" 200 5551 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" "www.redlug.com"

My coding example is like this:

f = open("log.txt", 'r')

urlpattern = r'(%ref)'
urls = {}

totalCount = 0

entries = f.readlines()
i = 0
while (i != len(entries)):

if (not re.search(r'^#', entries[i])):

totalCount = totalCount + 1
match = re.search(urlpattern, entries[i])
if (match):
url = match.group(1)
if (url in urls.keys()):
urls[url] = urls[url] + 1
else:
urls[url] = 1
i = i + 1
f.close()

I get an error saying that url is not defined. Can anyone point out where I am going wrong and help me out?
Thanks
Apr 17 '17 #1
5 Replies


Expert 100+
P: 621
I cannot tell exactly what is going on because the post has no code tags or indentation, but what happens when there is no match (so url is never created)? Does the following if still try to execute, or is it indented under the previous if?
if (match):    ## not found
    url = match.group(1)  ## this line never executes

    ## indent this if, so it only executes if match
    if (url in urls.keys()):
    ## you can just use
    ## if url in urls:
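To make the indentation point concrete, here is a minimal sketch (Python 3) of the counting loop with the dictionary update nested under the match check. The pattern REQUEST_PATTERN, the helper name count_urls, and the sample lines are assumptions for illustration and are not part of the original script; the pattern simply captures the path from the quoted request.

```python
import re

# Assumed pattern: captures the path from a request such as "GET /page.htm HTTP/1.1".
REQUEST_PATTERN = re.compile(r'"[A-Z]+ (\S+) [^"]*"')


def count_urls(lines):
    """Count requested URLs, skipping comment lines that start with '#'."""
    urls = {}
    total_count = 0
    for line in lines:
        if line.startswith('#'):
            continue
        total_count += 1
        match = REQUEST_PATTERN.search(line)
        if match:
            # url is assigned and used only inside this block, so it can
            # never be referenced before it exists.
            url = match.group(1)
            if url in urls:
                urls[url] += 1
            else:
                urls[url] = 1
    return total_count, urls


# Hypothetical sample lines: a comment, a normal entry, and a line with no request.
sample = [
    '# comment line',
    '1.2.3.4 - - [08/Apr/2016:17:36:09 -0700] "GET /a.htm HTTP/1.1" 200 10 "-" "agent" "host"',
    'a line with no request in it',
]
total, counts = count_urls(sample)
print(total, counts)
```

Lines that do not match are still counted in the total but never touch url, which is what avoids the "not defined" error.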
Apr 18 '17 #2

P: 3
Sorry about the script; the formatting got messed up when I posted it. What I need to do is extract a list of IP addresses and the URL they accessed.
Apr 18 '17 #3

Expert 100+
P: 621
Start with something simple, since you don't yet know what is or is not being found by the code you posted.
for entry in entries:
    if urlpattern in entry:
        print urlpattern, entry
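Also, you can use defaultdict from the collections module, which adds a key with a default value the first time it is accessed, so you don't have to check whether the key is already in the dictionary. See also "Counter objects" at https://docs.python.org/2/library/collections.html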
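As a rough illustration of that suggestion, the sketch below (Python 3) tallies the same values with both defaultdict and Counter. The list found is made-up sample data standing in for URLs pulled from the log.

```python
from collections import Counter, defaultdict

# Hypothetical sample values standing in for URLs extracted from the log.
found = ['/a.htm', '/b.htm', '/a.htm']

# defaultdict(int) creates a missing key with the value 0 on first access,
# so no "if url in urls" check is needed.
by_default = defaultdict(int)
for url in found:
    by_default[url] += 1

# Counter builds the same tally directly from the iterable.
by_counter = Counter(found)

print(dict(by_default))
print(by_counter.most_common(1))
```

Either one replaces the if/else block that updates the dictionary; Counter also gives you most_common() for free if you want the most requested URLs.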
Apr 18 '17 #4

Expert 100+
P: 621
"What I need to do is extract a list of IP addresses and the URL they accessed."
And perhaps I am making this too complicated. Is this what you want?
url='208.115.113.86 - - [08/Apr/2016:17:36:09 -0700] "GET /paper2003/0306AssElection.htm HTTP/1.1" 200 5551 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" "www.redlug.com"'
split_url=url.split()
print split_url[0], "-->", split_url[-1]
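Note that the last field is the quoted host name rather than the requested page. If the page is wanted as well, something along these lines might work (Python 3). It is only a sketch: the regular expression is my assumption about the request format, based on the single sample line in the question.

```python
import re

line = ('208.115.113.86 - - [08/Apr/2016:17:36:09 -0700] '
        '"GET /paper2003/0306AssElection.htm HTTP/1.1" 200 5551 "-" '
        '"Mozilla/5.0 (compatible; DotBot/1.1; '
        'http://www.opensiteexplorer.org/dotbot, help@moz.com)" '
        '"www.redlug.com"')

fields = line.split()
ip = fields[0]                  # first whitespace-separated field
host = fields[-1].strip('"')    # last field, with the surrounding quotes removed

# Assumed pattern: the first quoted field looks like "METHOD /path PROTOCOL".
request = re.search(r'"[A-Z]+ (\S+) [^"]*"', line)
path = request.group(1) if request else None

print(ip, "-->", host, path)
```

Splitting on whitespace is enough for the IP and the host because they are the first and last fields; the regular expression is only needed for the path, since the user agent in the middle contains spaces.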
Apr 18 '17 #5

P: 3
Thanks dwblas. Works well.
Apr 19 '17 #6
