RE: Extract string from log file

From each line, separate out the URL and request parts. Split the request into key-value pairs, then use urllib to unquote each pair, as shown below.

import urllib

# Python 2 syntax; on Python 3 use urllib.parse.unquote and print().
line = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1"
words = line.split()
for word in words:
    # only the word that contains '?' carries a query string
    if word.find('?') >= 0:
        req = word[word.find('?') + 1:]   # everything after the '?'
        kwds = req.split('&')             # one key=value pair per '&'
        for kv in kwds:
            print urllib.unquote(kv)      # decode the %XX escapes

Output:
stat=v
c=F-Secure
v=1.1 Build 14231
s=av{Norton 360 (Symantec Corporation)+69;}sw{Norton 360 (Symantec Corporation)+69;}fw{Norton 360 (Symantec Corporation)+5;}v{Microsoft Windows XP+insecure;Microsoft Windows XP Professional+f;26027;26447;26003;22452;}
r=0.9496
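
For anyone running this on Python 3: `urllib.unquote` moved to `urllib.parse.unquote`, and `print` is a function. Below is a rough sketch of the same idea using `urllib.parse`. The request string is a shortened stand-in for the real one (the `s=` value in particular is cut down for illustration), so treat it as an assumption rather than the actual log data.

```python
from urllib.parse import urlsplit, parse_qsl, unquote

# Shortened, illustrative request string (not the full one from the log).
request = ("GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231"
           "&s=Windows%20XP+insecure&r=0.9496 HTTP/1.1")

# A request line has three whitespace-separated parts.
method, target, protocol = request.split()

# urlsplit separates the path from the query string.
parts = urlsplit(target)

# Same approach as the loop above: split on '&', then unquote each pair.
pairs = [unquote(kv) for kv in parts.query.split("&")]

# parse_qsl splits and decodes in one step, but it also turns '+' into a space.
params = dict(parse_qsl(parts.query))

for kv in pairs:
    print(kv)
print(params["s"])
```

Note the difference in the `s` value: `unquote` leaves the `+` alone, while `parse_qsl` treats it as an encoded space. Which one is right depends on how the client built the URL, so check against a value you know.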

good luck
Edwin

-----Original Message-----
From: py************************************************ **@python.org
[mailto:py***************************************** *********@python.org]
On Behalf Of jo*********@googlemail.com
Sent: Saturday, August 09, 2008 10:48 AM
To: py*********@python.org
Subject: Extract string from log file
203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] "GET /stat.gif?
stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton
%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton
%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton
%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows
%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f
%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1" 200 43
"http://dfstage1.f-secure.com/fshc/1.1/release/devbw/1.1.14231/
card.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;
SV1; .NET CLR 2.0.50727)"

Does anyone know how I can extract certain strings from this log file
using regular expressions in Python, or using XML? Could someone teach me?
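
Since the question asks about regular expressions, here is a minimal sketch of that route. It assumes the log is in the Apache-style "combined" layout, which the sample line resembles, and that each record sits on one line (the line above is wrapped by the mail client). The group names are my own labels, and the query string is shortened for readability.

```python
import re

# One pattern for the whole record; the group names are illustrative labels.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Sample record with a shortened query string (assumption: one record per line).
line = (
    '203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] '
    '"GET /stat.gif?stat=v&c=F-Secure&r=0.9496 HTTP/1.1" 200 43 '
    '"http://dfstage1.f-secure.com/fshc/1.1/release/devbw/1.1.14231/card.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"'
)

m = LOG_PATTERN.match(line)
if m:
    print(m.group("host"))     # client address
    print(m.group("time"))     # timestamp inside the brackets
    print(m.group("target"))   # path plus query string
    print(m.group("status"))   # HTTP status code
else:
    print("line did not match the assumed format")
```

Once `target` is isolated, the query string after the `?` can be split on `&` and unquoted. XML does not really apply here, since the log is plain text.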
--
http://mail.python.org/mailman/listinfo/python-list
The information contained in this message and any attachment may be
proprietary, confidential, and privileged or subject to the work
product doctrine and thus protected from disclosure. If the reader
of this message is not the intended recipient, or an employee or
agent responsible for delivering this message to the intended
recipient, you are hereby notified that any dissemination,
distribution or copying of this communication is strictly prohibited.
If you have received this communication in error, please notify me
immediately by replying to this message and deleting it and all
copies and backups thereof. Thank you.
Aug 9 '08 #1
On Aug 9, 11:22 pm, Edwin.Mad...@VerizonWireless.com wrote:
[snip: full quote of post #1]

Do you mind explaining further? Based on the source code that you gave
me, what will it output? Sorry, I am new to string extraction. I do
understand most of your Python code; the only thing I don't understand
is this part:
for word in words:
    if word.find('?') >= 0:
        req = word[word.find('?') + 1:]
        kwds = req.split('&')
        for kv in kwds:
            print urllib.unquote(kv)

What does this code do?
Also, is this code automatic? What I mean is, can it extract the
strings every time a new log file is output by the server?
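
To make both questions concrete, here is a hedged sketch (Python 3) that wraps the loop in a function, with a comment on each step, and applies it to every line it is given. The function names and the second sample line are made up for illustration; only the first line is modelled on the log in this thread, with a shortened query string.

```python
from urllib.parse import unquote


def decode_request_params(line):
    """Return the decoded 'key=value' strings found in one log line."""
    decoded = []
    for word in line.split():              # break the line on whitespace
        pos = word.find('?')               # -1 when the word has no '?'
        if pos >= 0:
            req = word[pos + 1:]           # keep only the text after '?'
            for kv in req.split('&'):      # one 'key=value' per '&'
                decoded.append(unquote(kv))  # %20 -> ' ', %7B -> '{', etc.
    return decoded


def decode_log(lines):
    """Yield the decoded parameters for each line that has a query string."""
    for line in lines:
        params = decode_request_params(line)
        if params:
            yield params


# Two sample records; the second one is hypothetical and has no query string.
sample = [
    '203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] '
    '"GET /stat.gif?stat=v&v=1.1%20Build%2014231 HTTP/1.1" 200 43',
    '203.114.10.66 - - [01/Aug/2008:05:41:22 +0300] '
    '"GET /index.html HTTP/1.1" 200 512',
]

for params in decode_log(sample):
    print(params)
```

An open file can be passed in place of `sample`, since iterating over a file object yields its lines. The code is not automatic by itself, though: it only runs when something starts it, so it would have to be run again (by hand or from a scheduled job) for each new log file.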
Aug 9 '08 #2
