Extract zip file from email attachment

Hi all,

I'm trying to extract a zip file (containing an XML file) from an email
so I can process it, but I keep running into brick walls.
I've been googling and reading all afternoon and can't figure
it out.

Here is what I have so far.

import email
from poplib import POP3

p = POP3("mail.server.com")
print p.getwelcome()
# authentication, etc.
print p.user("USER")
print p.pass_("PASS")
print "This mailbox has %d messages, totaling %d bytes." % p.stat()
msg_list = p.list()
print msg_list
if not msg_list[0].startswith('+OK'):
    # Handle error
    exit(1)

for msg in msg_list[1]:
    msg_num, _ = msg.split()
    resp = p.retr(msg_num)
    if resp[0].startswith('+OK'):
        #print resp, '=======================\n'
        #extract message body and attachment.
        parsed_msg = email.message_from_string('\n'.join(resp[1]))
        payload = parsed_msg.get_payload(decode=True)
        print payload  # doesn't seem to work
    else:
        pass  # Deal with error retrieving message.

How do I:
a) retrieve the body of the email into a string so I can do some
processing? (I can get at the header attributes without any trouble)
b) retrieve the zip file attachment, and unzip into a string for xml
processing?

Thanks so much for your help!
Erik

Apr 5 '07 #1
7 Replies


Hi,

some weeks ago I wrote some code to extract attachments from emails.
It's not that long, so maybe it could be of help for you:

-------------------------------------------

#!/usr/bin/env python

import poplib
import email
import os
import sys

#
# attsave.py
# Check emails at PROVIDER for attachments and save them to SAVEDIR.
#

PROVIDER = "pop.YourMailProvider.de"
USER = "YourUserName"
PASSWORD = "YourPassword"

SAVEDIR = "/home/YourUserDirectory"


def saveAttachment(mstring):

    filenames = []
    attachedcontents = []

    msg = email.message_from_string(mstring)

    for part in msg.walk():

        fn = part.get_filename()

        if fn is not None:
            filenames.append(fn)
            # decode=True undoes the transfer encoding (e.g. base64)
            attachedcontents.append(part.get_payload(decode=True))

    for i in range(len(filenames)):
        fp = open(os.path.join(SAVEDIR, filenames[i]), "wb")
        fp.write(attachedcontents[i])
        print 'Found and saved attachment "' + filenames[i] + '".'
        fp.close()


try:
    client = poplib.POP3(PROVIDER)
except Exception:
    print "Error: Provider not found."
    sys.exit(1)

client.user(USER)
client.pass_(PASSWORD)

anzahl_mails = len(client.list()[1])  # number of messages in the mailbox

for i in range(anzahl_mails):
    # POP3 message numbers start at 1
    lines = client.retr(i + 1)[1]
    mailstring = "\n".join(lines)
    saveAttachment(mailstring)

client.quit()

-------------------------------------------

See you

H.
Apr 6 '07 #2
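
For readers on Python 3, the walk-and-filename approach in the script above could be sketched roughly as follows. This is only an illustration, not hlubenow's code: the helper name collect_attachments and the demo message are made up for the example, and stripping directory components from the attachment name is an extra precaution the original script does not take.

```python
import os
from email import message_from_bytes
from email.message import EmailMessage


def collect_attachments(raw_bytes):
    """Return a list of (safe_filename, decoded_bytes) for every attachment."""
    msg = message_from_bytes(raw_bytes)
    found = []
    for part in msg.walk():
        name = part.get_filename()
        if name is None:
            continue  # body parts and multipart containers have no filename
        # Keep only the final path component so a hostile name such as
        # "../../etc/passwd" cannot escape the save directory.
        safe_name = os.path.basename(name.replace("\\", "/"))
        found.append((safe_name, part.get_payload(decode=True)))
    return found


# Build a small test message instead of contacting a real POP3 server.
demo = EmailMessage()
demo["Subject"] = "report"
demo.set_content("See attachment.")
demo.add_attachment(b"<data/>", maintype="application", subtype="xml",
                    filename="../report.xml")

attachments = collect_attachments(demo.as_bytes())
```

Each returned pair can then be written to disk with open(os.path.join(SAVEDIR, safe_name), "wb"), as the script does.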

On Apr 5, 8:00 pm, hlubenow <hluben...@gmx.net> wrote:
> some weeks ago I wrote some code to extract attachments from emails.
> It's not that long, so maybe it could be of help for you:
> [snip attsave.py]

Thanks H!

I'm now able to get the name of the zip file, and the contents (is it
still encoded?).

I now need to be able to unzip the zip file into a string and get the
body of the email into a string.

Here is my updated code:
p = POP3("mail.**********.com")
print p.getwelcome()
# authentication, etc.
print p.user("USER")
print p.pass_("PASS")
print "This mailbox has %d messages, totaling %d bytes." % p.stat()
msg_list = p.list()
print msg_list
if not msg_list[0].startswith('+OK'):
    # Handle error in listings
    exit(1)

for msg in msg_list[1]:
    msg_num, _ = msg.split()
    resp = p.retr(msg_num)
    if resp[0].startswith('+OK'):
        #print resp, '=======================\n'
        parsed_msg = email.message_from_string('\n'.join(resp[1]))
        for part in parsed_msg.walk():
            fn = part.get_filename()
            if fn is not None:
                fileObj = StringIO.StringIO()
                fileObj.write( part.get_payload() )
                #attachment = zlib.decompress(part.get_payload())
                #print zipfile.is_zipfile(fileObj)
                attachment = zipfile.ZipFile(fileObj)
                print fn, '\n', attachment
        payload = parsed_msg.get_payload(decode=True)
        print payload

    else:
        pass  # Deal with error retrieving message.

I get this error:
Traceback (most recent call last):
  File "wa.py", line 208, in <module>
    attachment = zipfile.ZipFile(fileObj)
  File "/usr/lib/python2.5/zipfile.py", line 346, in __init__
    self._GetContents()
  File "/usr/lib/python2.5/zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "/usr/lib/python2.5/zipfile.py", line 378, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

Is the zip file still encoded? Or am I passing in the wrong arguments
to the zipfile module?

Thanks for your help!
Erik

Apr 6 '07 #3

erikcw wrote:

> resp = p.retr(msg_num)
> if resp[0].startswith('+OK'):

You don't have to check this; errors are transformed into exceptions.

> fileObj = StringIO.StringIO()

cStringIO is faster.

> fileObj.write( part.get_payload() )

You have to reset the file pointer to the beginning with fileObj.seek(0);
otherwise ZipFile will not be able to read the contents.

--
Gabriel Genellina
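
To illustrate the file-pointer remark: after a write, an in-memory file object is positioned at its end, so a plain read() returns nothing until the pointer is rewound. The sketch below uses Python 3's io.BytesIO as a stand-in for the StringIO object in the thread, and the tiny archive is invented for the demo. Whether ZipFile itself depends on the rewind is disputed further down the thread, so treat this only as a demonstration of the pointer behaviour.

```python
import io
import zipfile

# Create a tiny zip archive in memory to stand in for the attachment.
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as zf:
    zf.writestr("data.xml", "<data/>")
zip_bytes = source.getvalue()

buf = io.BytesIO()
buf.write(zip_bytes)
position_after_write = buf.tell()   # pointer now sits at the end
read_without_seek = buf.read()      # nothing left to read from here

buf.seek(0)                         # rewind to the start
read_after_seek = buf.read()        # full contents are available again
```

Passing the bytes straight to the constructor, io.BytesIO(zip_bytes), avoids the issue because the pointer then starts at position 0.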

Apr 6 '07 #4
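
On the point that errors are transformed into exceptions: poplib raises poplib.error_proto when the server answers with an error response, so the '+OK' check can be replaced by a try/except. A rough Python 3 sketch follows; retrieve_lines and FakeClient are hypothetical names used only so the example runs without a mail server.

```python
import poplib


def retrieve_lines(client, msg_num):
    """Return the message lines, or None if the server rejects the request."""
    try:
        # retr() returns (response, lines, octets) on success.
        _response, lines, _octets = client.retr(msg_num)
    except poplib.error_proto as exc:
        print("Could not retrieve message %s: %s" % (msg_num, exc))
        return None
    return lines


class FakeClient:
    """Hypothetical stand-in for poplib.POP3 so the sketch runs offline."""

    def retr(self, msg_num):
        if msg_num != 1:
            raise poplib.error_proto(b"-ERR no such message")
        return b"+OK 2 lines", [b"Subject: hi", b""], 13


ok_lines = retrieve_lines(FakeClient(), 1)
missing = retrieve_lines(FakeClient(), 99)
```

With a real poplib.POP3 connection the same function would be called as retrieve_lines(p, msg_num).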

Hi Gabriel,

I added fileObj.seek(0) on the line directly after
fileObj.write( part.get_payload() ) and I'm still getting the
following error.

Traceback (most recent call last):
  File "wa.py", line 209, in <module>
    attachment = zipfile.ZipFile(fileObj)
  File "/usr/lib/python2.5/zipfile.py", line 346, in __init__
    self._GetContents()
  File "/usr/lib/python2.5/zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "/usr/lib/python2.5/zipfile.py", line 378, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

Could the file-like object still be MIME-encoded or something?

Thanks!
Erik

Apr 6 '07 #5

> Could the file like object still be encoded in MIME or something?

Yes, it is. You don't need to seek(0).
Try this:

decoded = email.base64mime.decode(part.get_payload())
fileObj.write(decoded)

-Basilisk96

Apr 7 '07 #6
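
A small Python 3 sketch of why the undecoded payload is rejected as "not a zip file": the attachment travels as base64 text, and only the decoded bytes carry the zip structure. The archive here is built in memory purely for the demonstration, and the standard base64 module stands in for the email.base64mime call above.

```python
import base64
import io
import zipfile

# Stand-in for the attachment: a small zip archive built in memory.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data.xml", "<data/>")
raw_zip = archive.getvalue()

# What an undecoded payload looks like: base64 text, as carried in the mail.
encoded = base64.encodebytes(raw_zip)

# The base64 text is not a zip archive; the decoded bytes are.
encoded_is_zip = zipfile.is_zipfile(io.BytesIO(encoded))
decoded_is_zip = zipfile.is_zipfile(io.BytesIO(base64.decodebytes(encoded)))
```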


Basilisk96 wrote:

> Yes it is. You don't need to seek(0).
> Try this:
>
> decoded = email.base64mime.decode(part.get_payload())
> fileObj.write(decoded)

Or better:

decoded = part.get_payload(decode=True)
fileObj.write(decoded)
fileObj.seek(0)
zip = zipfile.ZipFile(fileObj)
zip.printdir()

Apr 7 '07 #8
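
Putting the thread's answers together, both parts of the original question (body into a string, zipped XML into a string) might look like this in Python 3. It is a sketch under assumptions, not the code anyone in the thread ran: the function name body_and_zipped_files, the demo message, the rule "first text/plain part without a filename is the body", and UTF-8 for the archive members are all choices made for the example.

```python
import io
import zipfile
from email import message_from_bytes
from email.message import EmailMessage


def body_and_zipped_files(raw_bytes):
    """Return (body_text, {member_name: text}) for one raw message."""
    msg = message_from_bytes(raw_bytes)
    body_text = None
    unzipped = {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        data = part.get_payload(decode=True)  # undo base64/quoted-printable
        filename = part.get_filename()
        if filename is None:
            # First text/plain part without a filename is treated as the body.
            if body_text is None and part.get_content_type() == "text/plain":
                charset = part.get_content_charset() or "utf-8"
                body_text = data.decode(charset, errors="replace")
        elif filename.lower().endswith(".zip"):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for name in archive.namelist():
                    # Assumes the archived files are UTF-8 text.
                    unzipped[name] = archive.read(name).decode("utf-8")
    return body_text, unzipped


# Offline demo: build a message with a zipped XML attachment.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as zf:
    zf.writestr("report.xml", "<report><item>1</item></report>")

demo = EmailMessage()
demo["Subject"] = "Daily report"
demo.set_content("Report attached.")
demo.add_attachment(zip_buffer.getvalue(), maintype="application",
                    subtype="zip", filename="report.zip")

body, files = body_and_zipped_files(demo.as_bytes())
```

With poplib under Python 3, retr() returns the message as a list of byte strings, so b"\n".join(lines) would supply the raw_bytes argument.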
