
Help on Email Parsing

Hey,
I have been trying to parse emails, but I could not find any
examples or snippets of parsing emails in Python in the
documentation. Google did not help much either.
I am trying to understand the 'email' module and the
functions described there for parsing email, but it seems
difficult.
Can anyone help me locate some pointers or
snippets on this issue?
Thanks a ton,
Dont

__________________________________
Do you Yahoo!?
Yahoo! Mail SpamGuard - Read only the mail you want.
http://antispam.yahoo.com/tools

Jul 18 '05 #1
3 Replies


On Mon, 23 Feb 2004 00:47:17 -0800, dont bother wrote:
I have been trying to parse emails:

But I could not find any examples or snippets of parsing emails in
python from the documentation.


Here is a simple program (a bit of a hack) I wrote to count the number of
messages in a mailbox for each day (used for counting spam). It may be of
some use to you, although it doesn't actually parse the message body,
only the headers.

Jeremy

# Released under the GPL (version 2 or greater)
# Copyright (C) 2003 Jeremy Sanders

import mailbox
import string
import email
import email.Utils
import time
import sys

# open passed mailbox filename
# (yes - we need checking of this)
fp = open(sys.argv[1], 'r')

# open mailbox from file
mbox = mailbox.PortableUnixMailbox(fp)

secsinday = 86400
counts = {}

# get current time
nowtime = time.time()

# iterate over mail messages
while 1:
    # get next message
    msg = mbox.next()
    # exit if we've looked at the last one
    if msg is None:
        break

    # get received header
    received = msg.get('received')
    # skip messages with no received header
    if received is None:
        continue

    # get unix time of email
    date_rfind = string.rfind(received, ';')
    date = received[date_rfind+1:]
    pd = email.Utils.parsedate( string.strip(date) )

    # skip messages we can't parse the date on
    if pd is None:
        continue

    # get time between now and received date in message
    unixtime = time.mktime(pd)
    day = int( (unixtime-nowtime) / secsinday)

    # increment counter for day
    # (using a dict allows us to parse the messages only once)
    if day not in counts:
        counts[day] = 0
    counts[day] += 1

# sort days into numerical order
daylist = counts.keys()
daylist.sort()

# print out counts
for d in daylist:
    print d, counts[d]
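
For readers on Python 3, where mailbox.PortableUnixMailbox, email.Utils and the print statement no longer exist, the same idea might look roughly like the sketch below. It is not Jeremy's code: the function name count_by_day and its now parameter are made up for illustration, and it assumes the mailbox is in mbox format.

```python
import mailbox
import time
from collections import Counter
from email.utils import mktime_tz, parsedate_tz

SECONDS_PER_DAY = 86400


def count_by_day(mbox_path, now=None):
    """Return a Counter mapping 'whole days ago' to the number of messages.

    count_by_day and its parameters are hypothetical names for this sketch.
    """
    now = time.time() if now is None else now
    counts = Counter()
    # create=False: fail instead of silently creating an empty mailbox file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            received = msg.get('Received')
            # Skip messages with no Received header.
            if received is None:
                continue
            # The timestamp follows the last ';' in a Received header.
            date_text = str(received).rpartition(';')[2].strip()
            parsed = parsedate_tz(date_text)
            # Skip messages whose date cannot be parsed.
            if parsed is None:
                continue
            days_ago = int((now - mktime_tz(parsed)) // SECONDS_PER_DAY)
            counts[days_ago] += 1
    finally:
        box.close()
    return counts
```

Two deliberate differences from the program above: the day numbers are positive for messages in the past (the original prints negative values), and mktime_tz honours the timezone offset in the header, whereas time.mktime treats the parsed date as local time. Something like "for day, n in sorted(count_by_day('spam.mbox').items()): print(day, n)" would print the table, where 'spam.mbox' is a placeholder path.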
Jul 18 '05 #2

dont bother wrote:
Hey,
I have been trying to parse emails:
But I could not find any examples or snippets of
parsing emails in python from the documentation.
Google did not help me much too.
I am trying to understand the module 'email' and the
functions described there to parse email but seems
difficult.
Can anyone help me in locating some pointers or
snippets on this issue.


This script extracts one or more images from an email
message whose filename is given as an argument.

Hope this helps.

"""Extracts all images from given rfc822-compliant email message.
A quick hack by deelan

python extract.py filename
"""

# good MIME's
mimes = 'image/gif', 'image/jpeg', 'image/png'

import email

def main(filename):
    f = file(filename, 'r')
    m = email.message_from_file(f)
    f.close()

    # loop thru message body and look for JPEG, GIF and PNG images
    images = [(part.get_filename(), part.get_payload(decode=True))
              for part in m.get_payload() if part.get_type() in mimes]

    for name, data in images:
        print 'writing', name, '...'
        f = file(name, 'wb')
        f.write(data)
        f.close()

    print 'done %d image(s).' % len(images)

if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print __doc__
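
On Python 3 the file() builtin, the print statement and Message.get_type() are gone, so a rough equivalent might look like the sketch below. It is an adaptation rather than deelan's script: the function name extract_images, the out_dir parameter and the fallback file names are assumptions for illustration. It also uses walk() instead of iterating over get_payload(), so images nested inside inner multipart containers are found and a single-part message does not break the loop.

```python
import email
import os
from email import policy

IMAGE_TYPES = {'image/gif', 'image/jpeg', 'image/png'}


def extract_images(message_path, out_dir='.'):
    """Write every GIF, JPEG and PNG part to out_dir; return the paths.

    extract_images and out_dir are hypothetical names for this sketch.
    """
    with open(message_path, 'rb') as fp:
        msg = email.message_from_binary_file(fp, policy=policy.default)

    written = []
    # walk() yields the message itself and then every nested part.
    for index, part in enumerate(msg.walk()):
        if part.get_content_type() not in IMAGE_TYPES:
            continue
        data = part.get_payload(decode=True)
        if data is None:
            continue
        # Drop any directory components supplied by the sender, and fall
        # back to a generated name when the part carries no filename.
        name = os.path.basename(part.get_filename() or '')
        if not name:
            name = 'image-%d.%s' % (index, part.get_content_subtype())
        target = os.path.join(out_dir, name)
        with open(target, 'wb') as out:
            out.write(data)
        written.append(target)
    return written
```

Calling extract_images('message.eml', 'images') would then return the list of files written, assuming 'message.eml' holds one raw message and the 'images' directory already exists; both names are placeholders.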

--
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#me> a foaf:Person ; foaf:nick "deelan" ;
foaf:weblog <http://www.deelan.com/> .
Jul 18 '05 #3


"dont bother" <do*************@yahoo.com> wrote in message
news:ma**************************************@python.org...
Hey,
I have been trying to parse emails:
But I could not find any examples or snippets of
parsing emails in python from the documentation.
Google did not help me much too.
I am trying to understand the module 'email' and the
functions described there to parse email but seems
difficult.
Can anyone help me in locating some pointers or
snippets on this issue.
Thanks a Ton
Dont

You may want to study the MIME format a
bit first. It's not a particularly simple format.
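
To give a feel for why MIME is not simple, here is a small Python 3 sketch that builds a message with a plain-text body, an HTML alternative and an attached image, then lists the nested content types. The helper name describe_structure and the sample message are invented for illustration; they are not from the email documentation.

```python
from email.message import EmailMessage


def describe_structure(msg, depth=0):
    """Return one indented content-type line per part (hypothetical helper)."""
    lines = ['  ' * depth + msg.get_content_type()]
    if msg.is_multipart():
        # For multipart messages get_payload() is a list of sub-messages.
        for part in msg.get_payload():
            lines.extend(describe_structure(part, depth + 1))
    return lines


# A made-up sample message: text body, HTML alternative, one attachment.
sample = EmailMessage()
sample['Subject'] = 'Report'
sample.set_content('Plain text body')
sample.add_alternative('<p>HTML body</p>', subtype='html')
sample.add_attachment(b'GIF89a', maintype='image', subtype='gif',
                      filename='dot.gif')

structure = describe_structure(sample)
print('\n'.join(structure))
```

Even this tiny message ends up as a tree (a mixed container holding an alternative container plus the image), which is why code that only looks at the top-level payload often misses parts.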

The final example in the email documentation
seems to be fairly straightforward. The line:

msg = email.message_from_file(fp)

does everything and leaves the result in
memory as objects.

Of course, this is the *new* email package
that is in 2.2.3 and later. I don't believe the
old one was particularly easy to work with.
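
As a minimal Python 3 sketch of what email.message_from_file gives you, the snippet below parses a message from a file-like object and then reads headers and text parts from the resulting objects. The sample message, the summarize helper and the use of policy.default are assumptions made for this illustration, not something taken from the documentation example mentioned above.

```python
import email
import io
from email import policy

# A made-up message; in practice fp would be an open file.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Hello
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

Just a short test message.
"""


def summarize(msg):
    """Return (subject, sender, [(content type, text), ...]) for text parts."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == 'text':
            parts.append((part.get_content_type(), part.get_content()))
    return msg['Subject'], msg['From'], parts


fp = io.StringIO(RAW)
# policy.default selects the newer EmailMessage API (get_content and so on).
msg = email.message_from_file(fp, policy=policy.default)
subject, sender, parts = summarize(msg)
print(subject, sender, parts)
```

Headers are looked up like dictionary keys (case-insensitively), and walk() visits every part, so the same helper should work for multipart messages as well.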

John Roth


Jul 18 '05 #4
