Hi,
We had some problems with our mailserver over the last few weeks.
Some messages were not delivered and we wanted to know why.
But looking through the logfile is a time-consuming process.
So I wanted to write a parser to analyse the logs and output them as XML.
But I have never written a parser before, and now I'm sitting in front
of the logfile trying to write the grammar for pyparsing.
First of all I need to know if it is possible to parse that kind of info
into XML.
Here is an excerpt of the logfile lines I'm interested in:
Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:22 mailrelay spamd[1364]: spamd: processing message
<20*************************@mforward2.dtag.de> for nobody:65534
Sep 18 04:15:25 mailrelay spamd[1364]: spamd: result: Y 15 -
BAYES_99,DATE_IN_PAST_03_06,DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_DSN,DNS_FROM_RFC_POST,DNS_FROM_RFC_WHOIS,FORGED_MUA_OUTLOOK,SPF_SOFTFAIL
scantime=3.1,size=8086,user=nobody,uid=65534,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=55277,mid=<20*************************@mforward2.dtag.de>,bayes=1,autolearn=no
Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD:
to=<SP********@OUR-MAILSERVER.mail.com>, relay=10.49.0.7[10.49.0.7],
delay=1, status=sent (250 2.6.0
<20*************************@mforward2.dtag.de> Queued mail for delivery)
They are filtered by "message-id", so all these lines above have
something to do with the message
"20*************************@mforward2.dtag.de".
The original logfile is about 25 MB, so I can't post all of the
lines of course ;-)
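To make the spamd part concrete, here is a minimal sketch of how the "spamd: result:" line from the excerpt could be pulled apart. It uses the standard `re` module rather than pyparsing, and it assumes that in the real logfile the whole entry sits on one physical line (flag, score, test names, then the key=value list). The names `SPAMD_RE` and `parse_spamd_result` are made up for this example, and the pattern is only checked against the one line shape shown above.

```python
import re

# Assumed layout of the payload, taken from the excerpt:
#   spamd: result: Y 15 - TEST_A,TEST_B key1=val1,key2=val2,...
SPAMD_RE = re.compile(
    r"spamd: result: (?P<flag>\S) (?P<score>-?\d+) - "
    r"(?P<tests>\S+) (?P<fields>\S+)\s*$"
)


def parse_spamd_result(line):
    """Return a dict for a spamd result line, or None if the line does not match."""
    m = SPAMD_RE.search(line)
    if m is None:
        return None
    # Split "key=value" items on the first "=" only, so values may contain "=".
    fields = dict(item.partition("=")[::2] for item in m.group("fields").split(","))
    return {
        "spam": m.group("flag") == "Y",
        "score": int(m.group("score")),
        "tests": m.group("tests").split(","),
        "fields": fields,
    }
```

The "mid" entry in the returned fields would then give the message-id (still wrapped in angle brackets) to match against the postfix lines.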
Looking at these lines I realized that there are "Queue IDs":
755387301
DA1431965E
EF90720AD
Filtering the log for these IDs results in the following lines:
Sep 18 02:15:11 mailrelay postfix/smtpd[10841]: 755387301:
client=unknown[194.25.242.123]
Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:22 mailrelay postfix/qmgr[11082]: 755387301:
from=<se****@mail.net.mx>, size=8152, nrcpt=7 (queue active)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301:
to=<re*******@mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301:
to=<re*******@mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301:
to=<re*******@mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301:
to=<re*******@mail.com>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: 755387301: removed
Sep 18 04:15:25 mailrelay postfix/pickup[13175]: DA1431965E: uid=65534
from=<nobody>
Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: DA1431965E:
from=<no****@OUR-MAILSERVER.mail.com>, size=11074, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E:
to=<SP********@OUR-MAILSERVER.mail.com>, relay=localhost[127.0.0.1],
delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: DA1431965E: removed
Sep 18 04:15:25 mailrelay postfix/smtpd[11704]: EF90720AD:
client=localhost[127.0.0.1]
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD:
message-id=<20*************************@mforward2.dtag.de>
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E:
to=<SP********@OUR-MAILSERVER.mail.com>, relay=localhost[127.0.0.1],
delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD:
from=<no****@OUR-MAILSERVER.mail.com>, size=11263, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD:
to=<SP********@OUR-MAILSERVER.mail.com>, relay=10.49.0.7[10.49.0.7],
delay=1, status=sent (250 2.6.0
<20*************************@mforward2.dtag.de> Queued mail for delivery)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD: removed
All this work was done on the command line with grep...
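What I imagine instead of repeated grep runs is roughly the following: read every line once, file each postfix entry under its queue ID, and remember which message-id each queue ID belongs to. This is only a sketch under assumptions: it uses plain `re` instead of pyparsing, it assumes one log entry per physical line in the real file (the excerpts above are wrapped by my mail client), and `LINE_RE`, `MSGID_RE` and `group_by_queue_id` are names I made up.

```python
import re
from collections import defaultdict

# Assumed shape of a postfix entry:
#   "Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301: message-id=<...>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ "
    r"postfix/(?P<daemon>\w+)\[\d+\]: "
    r"(?P<qid>[0-9A-F]+): (?P<rest>.*)$"
)
MSGID_RE = re.compile(r"message-id=<([^>]*)>")


def group_by_queue_id(lines):
    """Single pass: collect entries per queue ID and map queue ID -> message-id."""
    entries = defaultdict(list)
    msgid_of = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a postfix queue entry (e.g. spamd); ignored in this sketch
        qid = m.group("qid")
        entries[qid].append((m.group("ts"), m.group("daemon"), m.group("rest")))
        found = MSGID_RE.search(m.group("rest"))
        if found:
            msgid_of[qid] = found.group(1)
    return entries, msgid_of
```

Called as `group_by_queue_id(open("mail.log"))` (the filename is a placeholder), this would iterate over the file line by line, so the 25 MB log would not have to be loaded as a single string.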
Is it possible to parse this big logfile only ONCE and extract all this
info into XML?
Like this:
<message id="20*************************@mforward2.dtag.de">
<timestamp>Sep 18 04:15:26</timestamp>
<from>se****@mail.net.mx</from>
<to>re*******@mail.com</to>
<to>re*******@mail.com</to>
<to>re*******@mail.com</to>
<to>re*******@mail.com</to>
<queueID>EF90720AD</queueID>
<queueID>DA1431965E</queueID>
<queueID>755387301</queueID>
<spamd>
<score>15</score>
<filtered>yes</filtered>
<sendto>SP********@OUR-MAILSERVER.mail.com</sendto>
</spamd>
</message>
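Producing that structure from already-collected data looks like the easy part. Here is a small sketch using `xml.etree.ElementTree` from the standard library; the function name `message_to_xml` and the dict keys ("id", "timestamp", "from", "to", "queue_ids", "score", "filtered", "sendto") are my own invention for illustration, mirroring the element names in the XML above.

```python
import xml.etree.ElementTree as ET


def message_to_xml(msg):
    """Build a <message> element from a dict of collected fields and return it as text."""
    root = ET.Element("message", id=msg["id"])
    ET.SubElement(root, "timestamp").text = msg["timestamp"]
    ET.SubElement(root, "from").text = msg["from"]
    for rcpt in msg["to"]:
        ET.SubElement(root, "to").text = rcpt
    for qid in msg["queue_ids"]:
        ET.SubElement(root, "queueID").text = qid
    spamd = ET.SubElement(root, "spamd")
    ET.SubElement(spamd, "score").text = str(msg["score"])
    ET.SubElement(spamd, "filtered").text = "yes" if msg["filtered"] else "no"
    ET.SubElement(spamd, "sendto").text = msg["sendto"]
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(root, encoding="unicode")
```

ElementTree escapes special characters in text and attributes itself, which matters here because addresses in the log are wrapped in angle brackets.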
The goal of this is to provide a web interface where we can see whether
messages were filtered as spam (or deleted by our virus scanner).
Is it possible? Or do I have to scan / parse the file more than once?
Andi
--
Mozilla Thunderbird 1.5.0.7
Arch Linux