Hi,
We have had some problems with our mail server over the last few weeks.
Some messages were not delivered and we wanted to know why.
But looking through the logfile is a time-consuming process.
So I want to write a parser that analyses the logs and converts them to XML.
But I have never written a parser before, and now I'm sitting in front
of the logfile trying to write the grammar for pyparsing.
First of all I need to know whether it is possible to turn that kind of
info into XML.
Here is an excerpt of the logfile lines I'm interested in:
Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:22 mailrelay spamd[1364]: spamd: processing message
<20*************************@mforward2.dtag.de> for nobody:65534
Sep 18 04:15:25 mailrelay spamd[1364]: spamd: result: Y 15 -
BAYES_99,DATE_IN_PAST_03_06,DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_DSN,DNS_FROM_RFC_POST,DNS_FROM_RFC_WHOIS,FORGED_MUA_OUTLOOK,SPF_SOFTFAIL
scantime=3.1,size=8086,user=nobody,uid=65534,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=55277,mid=<20*************************@mforward2.dtag.de>,bayes=1,autolearn=no
Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD:
to=<SP********@OUR-MAILSERVER.mail.com>, relay=10.49.0.7[10.49.0.7],
delay=1, status=sent (250 2.6.0
<20*************************@mforward2.dtag.de> Queued mail for delivery)
They are filtered by "message-id", so all these lines above have
something to do with the message
"20*************************@mforward2.dtag.de".
The original logfile is about 25 MB, so of course I can't post all of
the lines ;-)
Looking at these lines I realized that there are "Queue IDs":
755387301
DA1431965E
EF90720AD
Filtering the log for these IDs results in the following lines:
Sep 18 02:15:11 mailrelay postfix/smtpd[10841]: 755387301:
client=unknown[194.25.242.123]
Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:22 mailrelay postfix/qmgr[11082]: 755387301:
from=<se****@mail.net.mx>, size=8152, nrcpt=7 (queue active)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301:
to=<re*******@mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301:
to=<re*******@mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301:
to=<re*******@mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301:
to=<re*******@mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: 755387301: removed
Sep 18 04:15:25 mailrelay postfix/pickup[13175]: DA1431965E: uid=65534
from=<nobody>
Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: DA1431965E:
from=<no****@OUR-MAILSERVER.mail.com>, size=11074, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E:
to=<SP********@OUR-MAILSERVER.mail.com>, relay=localhost[127.0.0.1],
delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: DA1431965E: removed
Sep 18 04:15:25 mailrelay postfix/smtpd[11704]: EF90720AD:
client=localhost[127.0.0.1]
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E:
to=<SP********@OUR-MAILSERVER.mail.com>, relay=localhost[127.0.0.1],
delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD:
from=<no****@OUR-MAILSERVER.mail.com>, size=11263, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD:
to=<SP********@OUR-MAILSERVER.mail.com>, relay=10.49.0.7[10.49.0.7],
delay=1, status=sent (250 2.6.0
<20*************************@mforward2.dtag.de> Queued mail for delivery)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD: removed
All this work was done on the command line with grep...
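To get away from grep, each Postfix line could first be split into its fixed fields. Here is a minimal sketch, assuming plain Python `re` rather than pyparsing and assuming every log entry sits on a single line (the excerpts above are only wrapped by my mail client). The pattern `LINE_RE`, the helper `split_line` and the sample address are my own placeholders, not anything Postfix defines:

```python
import re

# Assumed layout of a Postfix syslog line:
#   <timestamp> <host> postfix/<service>[<pid>]: <queue ID>: <rest>
LINE_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"postfix/(?P<service>[\w-]+)\[(?P<pid>\d+)\]: "
    r"(?P<queue_id>[0-9A-F]+): "
    r"(?P<rest>.*)$"
)


def split_line(line):
    """Return the fields of one Postfix log line as a dict, or None."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line; the address is made up for illustration.
sample = ("Sep 18 04:15:22 mailrelay postfix/qmgr[11082]: 755387301: "
          "from=<sender@example.net>, size=8152, nrcpt=7 (queue active)")
fields = split_line(sample)
print(fields["queue_id"], fields["service"])
```

The spamd lines have no queue ID and would need a second pattern of their own.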
Is it possible to parse this big logfile only ONCE and extract all this
info into XML?
Like this:
<message id="20*************************@mforward2.dtag.de">
  <timestamp>Sep 18 04:15:26</timestamp>
  <from>se****@mail.net.mx</from>
  <to>re*******@mail.com</to>
  <to>re*******@mail.com</to>
  <to>re*******@mail.com</to>
  <to>re*******@mail.com</to>
  <queueID>EF90720AD</queueID>
  <queueID>DA1431965E</queueID>
  <queueID>755387301</queueID>
  <spamd>
    <score>15</score>
    <filtered>yes</filtered>
    <sendto>SP********@OUR-MAILSERVER.mail.com</sendto>
  </spamd>
</message>
The goal is to provide a web interface where we can see whether
messages were filtered as spam (or deleted by our virus scanner).
Is it possible? Or do I have to scan / parse the file more than once?
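To make the question more concrete, this is roughly the single-pass idea I have in mind: collect the fields per queue ID while reading, and only merge queue IDs that share a message-id once the file has been read. It is a minimal sketch using `re` and `xml.etree.ElementTree` instead of pyparsing; the names `collect` and `to_xml`, the `<messages>` root element and the sample addresses are my own assumptions, and the spamd lines are not handled at all:

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Queue ID and remainder of a Postfix line (assumes one entry per line).
QUEUE_RE = re.compile(r"postfix/[\w-]+\[\d+\]: ([0-9A-F]+): (.*)$")
FIELD_RES = {
    "message-id": re.compile(r"message-id=<([^>]*)>"),
    "from": re.compile(r"from=<([^>]*)>"),
    "to": re.compile(r"to=<([^>]*)>"),
}


def collect(lines):
    """Read the log once and gather (field, value) pairs per queue ID."""
    queues = defaultdict(list)
    for line in lines:
        match = QUEUE_RE.search(line)
        if not match:
            continue
        queue_id, rest = match.groups()
        for name, regex in FIELD_RES.items():
            found = regex.search(rest)
            if found:
                queues[queue_id].append((name, found.group(1)))
    return queues


def to_xml(queues):
    """Merge queue IDs that share a message-id into one <message> element."""
    root = ET.Element("messages")
    messages = {}
    for queue_id, fields in queues.items():
        msg_id = dict(fields).get("message-id")
        if msg_id is None:
            continue  # no cleanup line was seen for this queue ID
        if msg_id not in messages:
            messages[msg_id] = ET.SubElement(root, "message", id=msg_id)
        ET.SubElement(messages[msg_id], "queueID").text = queue_id
        for name, value in fields:
            if name != "message-id":
                ET.SubElement(messages[msg_id], name).text = value
    return root


# Hypothetical sample lines; message-id and addresses are made up.
sample = [
    "Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301: message-id=<msg1@example.org>",
    "Sep 18 04:15:22 mailrelay postfix/qmgr[11082]: 755387301: from=<sender@example.net>, size=8152, nrcpt=7 (queue active)",
    "Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<rcpt@example.com>, relay=procmail, delay=14, status=sent (filter)",
    "Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E: message-id=<msg1@example.org>",
]
root = to_xml(collect(sample))
print(ET.tostring(root, encoding="unicode"))
```

The per-queue buffering is there because the smtpd `client=` line for a queue ID shows up before the cleanup line that reveals its message-id, so the grouping can only happen at the end. Everything stays in memory, which I hope is acceptable for a 25 MB file.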
Andi
--
Mozilla Thunderbird 1.5.0.7
Arch Linux