From each line, separate out the URL and request parts. Split the request into key-value pairs, then use urllib to unquote each pair, as shown below:
import urllib
line = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1"
words = line.split()
for word in words:
    if word.find('?') >= 0:
        req = word[word.find('?') + 1:]
        kwds = req.split('&')
        for kv in kwds:
            print urllib.unquote(kv)
Output:
stat=v
c=F-Secure
v=1.1 Build 14231
s=av{Norton 360 (Symantec Corporation)+69;}sw{Norton 360 (Symantec Corporation)+69;}fw{Norton 360 (Symantec Corporation)+5;}v{Microsoft Windows XP+insecure;Microsoft Windows XP Professional+f;26027;26447;26003;22452;}
r=0.9496
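Since the question mentions regular expressions, here is a rough sketch of the same idea for a full log line, written for Python 3, where unquote lives in urllib.parse. The pattern, the helper name extract_params, and the shortened sample line are my own assumptions for illustration; the pattern only assumes the request appears as a quoted "METHOD target PROTOCOL" field, so adjust it to your actual log format.

```python
import re
from urllib.parse import unquote

# Matches the quoted request field, e.g. "GET /path?query HTTP/1.1".
# This pattern is an assumption about the log format, not a full log parser.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) (?P<protocol>[^"\s]+)"')

def extract_params(log_line):
    """Return the decoded query parameters of the first request in log_line."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return {}
    # Everything after the first '?' in the request target is the query string.
    _path, _sep, query = match.group("target").partition("?")
    params = {}
    for pair in query.split("&"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # unquote() decodes %XX escapes but leaves '+' untouched.
        params[unquote(key)] = unquote(value)
    return params

# Shortened version of the log line from the question (the long "s" field is omitted).
sample = ('203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] '
          '"GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&r=0.9496 HTTP/1.1" 200 43')
params = extract_params(sample)
print(params["v"])  # prints: 1.1 Build 14231
```

Returning a dict makes it easy to pick out one field by name, but it keeps only the last value if a key repeats. If you also want '+' decoded as a space, urllib.parse.unquote_plus does that.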
good luck
Edwin
-----Original Message-----
From: py**************************************************@python.org
[mailto:py**************************************************@python.org]
On Behalf Of jo*********@googlemail.com
Sent: Saturday, August 09, 2008 10:48 AM
To: py*********@python.org
Subject: Extract string from log file
203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] "GET /stat.gif?
stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton
%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton
%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton
%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows
%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f
%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1" 200 43
"http://dfstage1.f-secure.com/fshc/1.1/release/devbw/1.1.14231/
card.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;
SV1; .NET CLR 2.0.50727)"
Does anyone know how I can extract certain strings from this log file
using regular expressions in Python or using XML? Can anyone teach me?