
How can I get text of the body (payload) of an email?

P: n/a
Hello,

I need to get the text of the body (the payload) of an email.

As I understand it, an email has headers at the top, then a blank line,
then the body of the message.

I want to get the text of the body - every character from the new line
after the headers until the end of the message.

My objective is to do an SHA hash on the body text so the get_payload
method isn't what I am after.

Can anyone suggest a convenient way to get access to the raw message
payload?

Thanks in advance for your help.

Andrew Stuart

Jul 18 '05 #1
6 Replies


P: n/a
> Can anyone suggest a convenient way to get access to the raw message
> payload?


body = message.split('\r\n\r\n', 1)[1]
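
A minimal sketch of the same split in present-day Python 3, assuming the raw message is available as bytes. The helper name `raw_body` and the sample messages are illustrative and do not come from the thread. It accepts either CRLF or bare LF around the blank line and hashes the body bytes unchanged:

```python
import hashlib
import re


def raw_body(raw: bytes) -> bytes:
    """Return everything after the first blank line, or b"" if there is none."""
    # The blank line may be written as CRLF CRLF or as LF LF.
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    return parts[1] if len(parts) == 2 else b""


# Hypothetical sample messages, one per line-ending style.
crlf_message = b"Subject: hi\r\nTo: a@example.com\r\n\r\nline one\r\nline two\r\n"
lf_message = b"Subject: hi\nTo: a@example.com\n\nline one\nline two\n"

crlf_digest = hashlib.sha1(raw_body(crlf_message)).hexdigest()
lf_digest = hashlib.sha1(raw_body(lf_message)).hexdigest()
print(crlf_digest)
print(lf_digest)
```

The two digests differ because the body bytes differ in their line endings, so both sides of a comparison need to agree on one convention before hashing.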

- Josiah

Jul 18 '05 #2

P: n/a
"andrew blah" <an***********@xse.com.au> writes:
> Can anyone suggest a convenient way to get access to the raw message
> payload?


If you're using the mailbox module, the body text is what you get
from message.fp.read() where message is an rfc822 message object
from reading the mailbox. Is that what you wanted to know?
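
The rfc822 module no longer exists in Python 3, so the following is only a rough present-day analogue rather than the call described above. It assumes an mbox-format mailbox and uses `mailbox.mbox.get_bytes()` to fetch the stored message bytes; the file name and sample message are made up for the illustration:

```python
import hashlib
import mailbox
import os
import tempfile

# Hypothetical sample message; mbox files store messages with LF line endings.
raw = b"Subject: hi\nTo: a@example.com\n\nbody line\n"

with tempfile.TemporaryDirectory() as tmp:
    box = mailbox.mbox(os.path.join(tmp, "demo.mbox"))
    try:
        key = box.add(raw)
        box.flush()
        # Message bytes as stored, without the mbox "From " separator line.
        stored = box.get_bytes(key)
    finally:
        box.close()

# Everything after the first blank line is the body.
headers, separator, body = stored.partition(b"\n\n")
body_digest = hashlib.sha1(body).hexdigest()
print(body_digest)
```

Because the mailbox module hands back LF line endings, a hash computed this way will not match one computed over the CRLF form of the same message.
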
Jul 18 '05 #3

P: n/a
I'm puzzled. Josiah suggested that this would allow me to get the
payload of an email message.

body = message.split('\r\n\r\n', 1)[1]

As I understand it, the headers of an email are terminated by a blank
line, after which comes the message payload. A blank line being
represented by \r\n\r\n

After trying Josiah's above suggestion on many emails and failing to
get it to work, I found that in fact the following works:

self.raw_data.split('\n\n', 1)[1]

But this doesn't agree with my understanding of the RFC822 email
format, which is that the blank line should be represented by \r\n\r\n

Can anyone suggest where my understanding is wrong?
Thanks

Andrew Stuart

Jul 18 '05 #4

P: n/a
andrew blah wrote:
> I want to get the text of the body - every character from the new line
> after the headers until the end of the message.
>
> My objective is to do an SHA hash on the body text so the get_payload
> method isn't what I am after.


Funny, I recently undertook the same task. Here's my solution:

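import email.Iterators
import sha

msg = email.message_from_string(foo)  # foo holds the full message text
x = sha.new()
# Feed every body line to the hash; headers and sub-part headers are skipped
for line in email.Iterators.body_line_iterator(msg):
    x.update(line)
hash = x.digest()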

This very cool iterator returns every body line, but skips all the headers,
including the headers present in each sub-part of the email. If you only
want plain text parts, you might combine this iterator with
email.Iterators.typed_subpart_iterator().
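
For readers on Python 3, a hedged sketch of the same idea: the module is now spelled `email.iterators` and the `sha` module has been replaced by `hashlib`. The sample multipart message is invented for the illustration, and encoding each line as UTF-8 before hashing is an assumption of this sketch, not something the thread specifies:

```python
import email
import email.iterators
import hashlib

# Hypothetical two-part message used only to exercise the iterator.
raw = (
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part\n"
    "--XX--\n"
)

msg = email.message_from_string(raw)
digest = hashlib.sha1()
lines = []
for line in email.iterators.body_line_iterator(msg):
    lines.append(line)
    # hashlib needs bytes; the UTF-8 encoding here is an assumption.
    digest.update(line.encode("utf-8"))

print(lines)
print(digest.hexdigest())
```

Note that this hashes the parsed payload lines, so MIME boundaries and sub-part headers are left out. The result is therefore not the same as a hash over every raw byte after the first blank line.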

Jeffrey
Jul 18 '05 #5

P: n/a
> I'm puzzled. Josiah suggested that this would allow me to get the
> payload of an email message.
>
> body = message.split('\r\n\r\n', 1)[1]
>
> As I understand it, the headers of an email are terminated by a blank
> line, after which comes the message payload. A blank line being
> represented by \r\n\r\n
>
> After trying Josiah's above suggestion on many emails and failing to
> get it to work, I found that in fact the following works:
>
> self.raw_data.split('\n\n', 1)[1]
>
> But this doesn't agree with my understanding of the RFC822 email
> format, which is that the blank line should be represented by \r\n\r\n
>
> Can anyone suggest where my understanding is wrong?
> Thanks

Your understanding isn't wrong, but somehow you are acquiring emails
with line-feed-only line endings. One possible cause is opening a file
in text mode and getting universal line-ending support (which drops the
'\r'). Another is that some other processing step strips it out (I don't
use the email package, so I don't know what it may or may not be doing).

A known method of normalizing line endings for data that could come from
anywhere is through the use of regular expressions:

email = re.sub('(\r\n|\r|\n)', '\r\n', email_with_ambiguous_line_endings)

If you know your data to be good on disk, perhaps it would be better to
open files as 'rb' to make sure that universal line ending support is
not used.
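
Both suggestions can be sketched in Python 3 as follows. The helper name `normalize_to_crlf`, the file name, and the sample data are illustrative assumptions. The first part rewrites every line ending as CRLF; the second compares a binary read with a text-mode read of the same file:

```python
import os
import re
import tempfile


def normalize_to_crlf(data: bytes) -> bytes:
    """Rewrite CRLF, bare CR and bare LF line endings as CRLF."""
    # CRLF is listed first so it is matched as one unit, not as CR then LF.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


normalized = normalize_to_crlf(b"a\nb\r\nc\rd")

# Write a CRLF file, then read it back in binary mode and in text mode.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "message.eml")
    with open(path, "wb") as f:
        f.write(b"Subject: hi\r\n\r\nbody\r\n")
    with open(path, "rb") as f:
        as_bytes = f.read()  # bytes exactly as stored on disk
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()  # universal newlines turn CRLF into LF

print(normalized)
print(as_bytes)
print(repr(as_text))
```

The text-mode read shows why a split on '\r\n\r\n' can fail on data that was CRLF on disk: by the time the string reaches the program, only '\n' is left.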

- Josiah

Jul 18 '05 #6

P: n/a
"andrew blah" <an***********@xse.com.au> wrote in message news:<10**********************@f14g2000cwb.googlegroups.com>...
> I need to get the text of the body (the payload) of an email.
> As I understand it, an email has headers at the top, then a blank line,
> then the body of the message.
> I want to get the text of the body - every character from the new line
> after the headers until the end of the message.
[headers]
[blank line]
[body]

You explained how to do it ;)
> I want to get the text of the body - every character from the new line
> after the headers until the end of the message.
If you just find the first blank line then the next line is the start
of the email body ;)

import poplib

Mail = poplib.POP3('mail.yourserver.net')
Mail.user('username')
Mail.pass_("userpass")
# just get the first message
MyMessage = Mail.retr(1)
FullText = ""
PastHeaders = 0
# MyMessage[1] is the list of message lines, without line terminators
for MsgLine in MyMessage[1]:
    if PastHeaders == 0:
        # the first empty line marks the end of the headers
        if len(MsgLine) == 0:
            PastHeaders = 1
    else:
        FullText += MsgLine + '\n'
Mail.quit()
print FullText

This is from Python 2.1 Bible (Dave Brueck, Stephen Tanner) ;)
That book is still an awesome reference today!
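
The header-skipping loop can be tried without a mail server. The sketch below is for Python 3, where `poplib.POP3.retr()` returns the message lines as bytes with the line terminators already removed. The helper name `body_from_pop_lines` and the sample list are assumptions made for the illustration, and rejoining with CRLF is a choice of this sketch, not something the original code does:

```python
import hashlib


def body_from_pop_lines(lines, eol=b"\r\n"):
    """Join the lines that follow the first empty line, restoring line endings."""
    try:
        blank = lines.index(b"")  # the first empty line ends the headers
    except ValueError:
        return b""  # no blank line, so no body
    return b"".join(line + eol for line in lines[blank + 1:])


# Stand-in for the second element of the tuple returned by retr().
retr_lines = [b"Subject: hi", b"To: a@example.com", b"", b"line one", b"", b"line two"]

body = body_from_pop_lines(retr_lines)
body_digest = hashlib.sha1(body).hexdigest()
print(body)
print(body_digest)
```

Only the first empty line is treated as the header terminator, so empty lines inside the body are kept.
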
> My objective is to do an SHA hash on the body text so the get_payload
> method isn't what I am after.
> Can anyone suggest a convenient way to get access to the raw message
> payload?
> Thanks in advance for your help.

HTH,
M.E.Farmer :)
Jul 18 '05 #7

This discussion thread is closed
