
RE: Extract string from log file

From each line, separate out the URL and request parts. Split the request into key-value pairs, then use urllib to unquote each pair, as shown below.

import urllib
line = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1"
words = line.split()
for word in words:
    if word.find('?') >= 0:
        req = word[word.find('?') + 1:]
        kwds = req.split('&')
        for kv in kwds:
            print urllib.unquote(kv)

stat=v
c=F-Secure
v=1.1 Build 14231
s=av{Norton 360 (Symantec Corporation)+69;}sw{Norton 360 (Symantec Corporation)+69;}fw{Norton 360 (Symantec Corporation)+5;}v{Microsoft Windows XP+insecure;Microsoft Windows XP Professional+f;26027;26447;26003;22452;}
r=0.9496
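
For anyone on Python 3, the same idea might look roughly like the sketch below. There `print` is a function and `unquote` lives in `urllib.parse`. The helper name `query_pairs` and the shortened request line are made up for illustration; they are not part of the code above.

```python
from urllib.parse import unquote

def query_pairs(request_line):
    """Return the decoded key=value strings from a request line's query string."""
    pairs = []
    for word in request_line.split():
        if '?' in word:
            # Everything after the first '?' is the query string.
            query = word.split('?', 1)[1]
            # Parameters are separated by '&'; decode the %XX escapes in each one.
            pairs.extend(unquote(kv) for kv in query.split('&'))
    return pairs

# Shortened sample request line, assumed for illustration.
line = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&r=0.9496 HTTP/1.1"
for pair in query_pairs(line):
    print(pair)
```

Note that `unquote` leaves `+` untouched, which matches the output shown above. If the `+` signs in your data stand for spaces, `urllib.parse.unquote_plus` would decode them as well.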

Good luck,
Edwin

-----Original Message-----
From: py************************************************ **@python.org
[mailto:py***************************************** *********@python.org]
On Behalf Of jo*********@googlemail.com
Sent: Saturday, August 09, 2008 10:48 AM
To: py*********@python.org
Subject: Extract string from log file
203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1" 200 43 "http://dfstage1.f-secure.com/fshc/1.1/release/devbw/1.1.14231/card.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"

Does anyone know how I can extract certain strings from this log file
using regular expressions in Python, or using XML? Could someone teach me?
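
Since the question asks about regular expressions, here is one possible sketch. It assumes the line follows the Apache "combined" layout (host, two placeholder fields, a bracketed timestamp, the quoted request, status, size), which is a guess based on the sample. The pattern name `LOG_PATTERN`, the group names, and the shortened entry are all chosen here for illustration.

```python
import re

# Pattern for the layout the sample line appears to follow (an assumption).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

# Shortened version of the log entry, for readability.
entry = ('203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] '
         '"GET /stat.gif?stat=v&c=F-Secure&r=0.9496 HTTP/1.1" 200 43')

match = LOG_PATTERN.match(entry)
# groupdict() maps each named group to the text it captured.
fields = match.groupdict() if match else {}
print(fields.get('host'), fields.get('url'), fields.get('status'))
```

The `url` group still contains the percent-encoded query string, so it would need a decoding step afterwards if you want the readable values.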
--
http://mail.python.org/mailman/listinfo/python-list
The information contained in this message and any attachment may be
proprietary, confidential, and privileged or subject to the work
product doctrine and thus protected from disclosure. If the reader
of this message is not the intended recipient, or an employee or
agent responsible for delivering this message to the intended
recipient, you are hereby notified that any dissemination,
distribution or copying of this communication is strictly prohibited.
If you have received this communication in error, please notify me
immediately by replying to this message and deleting it and all
copies and backups thereof. Thank you.
Aug 9 '08 #1
1 Reply


Would you mind explaining further? Based on the source code you gave
me, what will it output? Sorry, I am new to string extraction. I do
understand your Python code; the only thing I don't understand is this
part:

for word in words:
    if word.find('?') >= 0:
        req = word[word.find('?') + 1:]
        kwds = req.split('&')
        for kv in kwds:
            print urllib.unquote(kv)

What does this code do?
Also, is this code automatic? What I mean is, can it extract the
string every time a new log file is output by the server?
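
To illustrate both questions, here is a commented Python 3 sketch of what that loop does, wrapped in functions so it can be applied to every line of a file. Nothing in it is automatic: the script would have to be re-run (or scheduled) whenever a new log file appears. The function names and the file name `access.log` are hypothetical, and the sample line is shortened.

```python
from urllib.parse import unquote

def extract_params(line):
    """Return {key: decoded value} for the query string found in one log line."""
    params = {}
    for word in line.split():            # break the line on whitespace
        pos = word.find('?')             # find() gives -1 when there is no '?'
        if pos >= 0:
            req = word[pos + 1:]         # keep only the text after the '?'
            for kv in req.split('&'):    # one "key=value" chunk per '&'
                key, _, value = kv.partition('=')
                params[key] = unquote(value)   # %20 becomes ' ', %7B becomes '{', ...
    return params

def extract_from_lines(lines):
    """Yield the parameters of every line that contains a query string."""
    for line in lines:
        params = extract_params(line)
        if params:
            yield params

# Stand-in for a real file. With a file you could pass open('access.log')
# instead ('access.log' is a hypothetical name).
sample = [
    '203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] '
    '"GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&r=0.9496 HTTP/1.1" 200 43',
]
results = list(extract_from_lines(sample))
print(results)
```

Returning a dictionary instead of printing makes it easier to pick out one particular value, for example `results[0]['v']`.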
Aug 9 '08 #2
