way to extract only the message from pop3

Hello All,

Using poplib in Python I can extract only the headers using .top().
Is there a way to extract only the message text, without the headers?

That is, remove fields like the ones below:
"
Return-Path:
X-Original-To:
Received: from [
by (Postfix) with ESMTP id B32382613C
for Tue, 3 Apr 2007 09:54:28 -0300 (BRT)
Date: Tue, 03 Apr 2007 09:52:15 -0300
From: <@>
To:
Subject: test
Message-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.24.02 [en]
X-UIDL: !Dn!!HKT!!/k
Status: RO
"
and only get this:

this is a text message..
...

Thanks

Apr 3 '07 #1
On Apr 3, 2:36 pm, "flit" <superf...@gmail.com> wrote:
> Using poplib in Python I can extract only the headers using .top().
> Is there a way to extract only the message text, without the headers?
> [...]

I found a tutorial on parsing email that should help you:

http://www.devshed.com/c/a/Python/Py...Email-Parsing/

Also see the email module:

http://www.python.org/doc/2.3.5/lib/module-email.html

Mike

Apr 3 '07 #2
ky******@gmail.com wrote:
> I found a tutorial on parsing email that should help you:
>
> http://www.devshed.com/c/a/Python/Py...Email-Parsing/
>
> Also see the email module:
>
> http://www.python.org/doc/2.3.5/lib/module-email.html
>
> Mike

Well, I couldn't get anywhere with that material some time ago, especially
with that tutorial. I had more luck after looking at the code that user
"rogen" wrote in his first posting here:

http://www.python-forum.de/topic-7507.html

It's really not beautiful code, but as soon as you understand what he does,
you're almost there.

See You

H.
Apr 3 '07 #3
"flit" <su*******@gmail.com> wrote:
>
> Using poplib in Python I can extract only the headers using .top().
> Is there a way to extract only the message text, without the headers?

Only by using Python code. The other responses gave you good pointers
toward that path, but I thought I would point out the reason why.

The POP3 protocol is surprisingly primitive. There are only two commands
to fetch a message: RETR, which fetches the headers and the entire message,
and TOP, which fetches the headers and the first N lines of the message.
The key point is that both commands fetch the headers.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Apr 4 '07 #4
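
Since both RETR and TOP return the headers, the filtering has to happen on the client. A minimal sketch of that step, assuming Python 3 (where poplib returns each line as a bytes object without its line ending). The function name split_headers_and_body and the sample lines are illustrative and do not come from the thread; the list stands in for the second element of what client.retr(n) returns.

```python
def split_headers_and_body(lines):
    """Split a RETR-style list of lines at the first blank line.

    Returns (header_lines, body_lines). The blank separator line itself
    is dropped.
    """
    try:
        blank = lines.index(b"")
    except ValueError:
        # No blank line at all: the message consists of headers only.
        return lines, []
    return lines[:blank], lines[blank + 1:]


# Stand-in for client.retr(1)[1]; a real call needs a live POP3 server.
sample_lines = [
    b"Date: Tue, 03 Apr 2007 09:52:15 -0300",
    b"Subject: test",
    b'Content-Type: text/plain; charset="US-ASCII"',
    b"",
    b"this is a text message..",
    b"...",
]

headers, body = split_headers_and_body(sample_lines)
# Join with CRLF, the line ending used on the wire.
body_text = b"\r\n".join(body).decode("ascii")
print(body_text)
```

This only separates the raw body from the headers. It does not undo base64 or quoted-printable encoding and knows nothing about multipart messages; the email module handles those cases.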
Yep, you are right.
I made a filter to get the data I want from the message.
It's not the most beautiful code, but it works. :)

On Apr 4, 4:11 am, Tim Roberts <t...@probo.com> wrote:
> "flit" <superf...@gmail.com> wrote:
> > Using poplib in Python I can extract only the headers using .top().
> > Is there a way to extract only the message text, without the headers?
>
> Only by using Python code. The other responses gave you good pointers
> toward that path, but I thought I would point out the reason why.
>
> The POP3 protocol is surprisingly primitive. There are only two commands
> to fetch a message: RETR, which fetches the headers and the entire message,
> and TOP, which fetches the headers and the first N lines of the message.
> The key point is that both commands fetch the headers.
> --
> Tim Roberts, t...@probo.com
> Providenza & Boekelheide, Inc.

Apr 4 '07 #5
flit wrote:
> Hello All,
>
> Using poplib in Python I can extract only the headers using .top().
> Is there a way to extract only the message text, without the headers?

As mentioned before, you should use module "email":

------------------------------------

#!/usr/bin/env python

import poplib
import email
import os
import sys
import string

PROVIDER = "pop.YourMailProvider.de"
USER = "YourUserName"
PASSWORD = "YourPassword"

try:
    client = poplib.POP3(PROVIDER)
except:
    print "Error: Provider not found."
    sys.exit(1)

client.user(USER)
client.pass_(PASSWORD)

nrof_mails = len(client.list()[1])

for i in range(nrof_mails):
    lines = client.retr(i + 1)[1]
    mailstring = string.join(lines, "\n")
    msg = email.message_from_string(mailstring)

    for part in msg.walk():
        blockit = 0

        if part.get_content_maintype() == "text" and blockit == 0:
            blockit = 1
            mycontent = part.get_payload()
            mycontent = mycontent.decode("quopri_codec")
            print mycontent
            print

client.quit()

------------------------------------

See You

H.
Apr 5 '07 #6
flit wrote:
> Hello All,
>
> Using poplib in Python I can extract only the headers using .top().
> Is there a way to extract only the message text, without the headers?

As mentioned before, you should use module "email":

------------------------------------

#!/usr/bin/env python

import poplib
import email
import os
import sys
import string

PROVIDER = "pop.YourMailProvider.de"
USER = "YourUserName"
PASSWORD = "YourPassword"

try:
    client = poplib.POP3(PROVIDER)
except:
    print "Error: Provider not found."
    sys.exit(1)

client.user(USER)
client.pass_(PASSWORD)

nrof_mails = len(client.list()[1])

for i in range(nrof_mails):
    lines = client.retr(i + 1)[1]
    mailstring = string.join(lines, "\n")
    # reset once per message, so only the first text part is printed
    blockit = 0

    msg = email.message_from_string(mailstring)

    for part in msg.walk():

        if part.get_content_maintype() == "text" and blockit == 0:
            blockit = 1
            mycontent = part.get_payload()
            mycontent = mycontent.decode("quopri_codec")
            print mycontent
            print

client.quit()

------------------------------------

See You

H.
Apr 5 '07 #7
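
The script above is Python 2 and needs a live mailbox. A minimal Python 3 sketch of the same idea (return the first text part of a message) is given here under some assumptions: the function name first_text_body and the sample message are illustrative, and the line list stands in for client.retr(n)[1]. Instead of decoding quoted-printable by hand, it relies on get_payload(decode=True), which undoes whatever Content-Transfer-Encoding the part declares.

```python
import email


def first_text_body(raw_lines):
    """Return the decoded text of the first text/* part, or None."""
    msg = email.message_from_bytes(b"\r\n".join(raw_lines))
    for part in msg.walk():
        # Multipart containers and attachments are skipped here.
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "ascii"
        return payload.decode(charset, errors="replace")
    return None


# Illustrative message; a real one would come from client.retr(n)[1].
sample_lines = [
    b"Subject: test",
    b"MIME-Version: 1.0",
    b'Content-Type: text/plain; charset="iso-8859-1"',
    b"Content-Transfer-Encoding: quoted-printable",
    b"",
    b"caf=E9 au lait",
]

print(first_text_body(sample_lines))
```

Returning from inside the loop plays the role of the blockit flag: the walk stops at the first text part.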
On Thu, 05 Apr 2007 15:09:18 -0300, Collin Stocks <co**********@gmail.com>
wrote:
> message=whole_message[len(headers):None]
>
> You can omit the word None: it is just there for clarity purposes.

Uhm... I can't find any usage of slices including an explicit None in
code.google.com (except on the Python test suite), and really I don't
consider that to be more readable than whole_message[len(headers):]
But of course this is just a stylistic issue.

--
Gabriel Genellina

Apr 5 '07 #8
On 05/04/07, Collin Stocks <co**********@gmail.com> wrote:
> On 3 Apr 2007 12:36:10 -0700, flit <su*******@gmail.com> wrote:
> > Hello All,
> >
> > Using poplib in Python I can extract only the headers using .top().
> > Is there a way to extract only the message text, without the headers?
>
> so get two strings: only headers, and the whole message.
> find the length of the headers, and chop that off the beginning of the whole
> message:
> message=whole_message[len(headers):None]

This way you have to perform two downloads: the headers and the whole
message. Then you join both into strings and subtract one from the
other by slicing or other means.

(Other means? body = whole_message.replace(headers, '') or maybe not! :) )

The body starts after the first blank line following the headers; in
practice that is simply the first blank line in the message. This is a
good starting point for something simple like my earlier suggestion:

msg = '\r\n'.join( M.retr(i+1)[1] ) # retrieve the email into string
hdrs,body = msg.split('\r\n\r\n',1) # split it into hdrs & body

If the original poster required the body to be separated from the
headers (and I received a private reply from the OP to my original
post that suggested it probably was), then splitting a joined whole
message at the first blank line is sufficient and only requires one
download, without using the email module.

If the OP required just the text parts extracted from the message, then
it gets a bit trickier. The email module is the way to go, but not
quite how a previous poster used it.

Consider an email that was routed through my (Python) SMTP servers and
filters today:

Content: ['text/plain', 'text/html', 'message/delivery-status',
'text/plain', 'text/plain', 'text/plain', 'unknown', 'message/rfc822',
'text/plain', 'text/html']

Is text/html a text part or an html part for this exercise ? :)

You need to walk the parts and use something like

# part.get_content_maintype() requires a further call
# to get_content_subtype() , so use
# part.get_content_type() instead.

required = ['text/plain', 'text/tab-separated-values']
for part in EMAIL_OBJ.walk():
    text_parts = []
    if part.get_content_type() in required:
        text_parts.append(part)

print ('\r\n' + '='*76 +'\r\n').join(text_parts)
# print all the text parts separated by a line of '='
# end

Whether you use the email module or not, you need to join the
retrieved message into a string. You can use \n but if you plan to
push the text back out in an email '\r\n' is required for the SMTP
sending part. Your client may or may not convert \n to \r\n at
sending time :)

HTH :)

--

Tim Williams
Apr 5 '07 #9
On 06/04/07, Tim Williams <ti*@tdw.net> wrote:
> Content: ['text/plain', 'text/html', 'message/delivery-status',
> 'text/plain', 'text/plain', 'text/plain', 'unknown', 'message/rfc822',
> 'text/plain', 'text/html']

I should explain that this was the content in a single email.

> # part.get_content_maintype() requires a further call
> # to get_content_subtype() , so use
> # part.get_content_type() instead.
>
> required = ['text/plain', 'text/tab-separated-values']
> for part in EMAIL_OBJ.walk():
>     # text_parts = [] <== oops, this should be above the for.....
>     if part.get_content_type() in required:
>         text_parts.append(part)
>
> print ('\r\n' + '='*76 +'\r\n').join(text_parts)
> # print all the text parts separated by a line of '='
> # end
Apr 5 '07 #10
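
A minimal Python 3 sketch of the corrected loop, with the list created once above the for. One further change is an assumption on my part, not something the thread states: str.join() only accepts strings, so the sketch collects each part's decoded text instead of the part object itself. The name collect_text_parts and the demo message are illustrative.

```python
from email.message import EmailMessage

WANTED_TYPES = ("text/plain", "text/tab-separated-values")
SEPARATOR = "\r\n" + "=" * 76 + "\r\n"


def collect_text_parts(msg, wanted=WANTED_TYPES):
    """Return the decoded text of every part whose full type is wanted."""
    texts = []  # created once, above the loop
    for part in msg.walk():
        if part.get_content_type() not in wanted:
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "ascii"
        texts.append(payload.decode(charset, errors="replace"))
    return texts


# Illustrative message: plain text, an HTML alternative, and a TSV attachment.
demo = EmailMessage()
demo["Subject"] = "test"
demo.set_content("plain body")
demo.add_alternative("<p>html body</p>", subtype="html")
demo.add_attachment("a\tb\n", subtype="tab-separated-values")

parts = collect_text_parts(demo)
print(SEPARATOR.join(parts))
```

Matching on get_content_type() rather than get_content_maintype() is what keeps the text/html alternative out of the result.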
