Problem with parsing email message with extraneous MIME information

I am processing .eml email messages with the email module (Python 2.5). The
files were exported from an Outlook PST file, and I need to extract the
component parts of each email. In most instances this works fine: the message
is read in using message_from_file, is_multipart returns True, and I can
process each component and extract the message attachments.
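
The workflow described above can be sketched roughly as follows. This is a minimal illustration written for Python 3 rather than the Python 2.5 used in the post. The helper name extract_attachments and the in-memory sample message are hypothetical; a file exported to disk would be read with email.message_from_file (or message_from_binary_file) instead.

```python
import email
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_attachments(msg):
    """Return (filename, bytes) pairs for every part that carries a filename."""
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold other parts, not data of their own
        filename = part.get_filename()
        if filename:
            # decode=True undoes base64 / quoted-printable transfer encoding
            found.append((filename, part.get_payload(decode=True)))
    return found


# Build a small sample message in memory (stand-in for an exported .eml file).
sample = MIMEMultipart()
sample["Subject"] = "report"
sample.attach(MIMEText("see attached"))
blob = MIMEApplication(b"\x00\x01data")
blob.add_header("Content-Disposition", "attachment", filename="a.bin")
sample.attach(blob)

parsed = email.message_from_bytes(sample.as_bytes())
attachments = extract_attachments(parsed)
print(parsed.is_multipart(), attachments)
```

Parts of type message/rfc822 report is_multipart() as True, so this sketch skips them as containers and only collects leaf parts that have a filename.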

I am, however, running into a problem with email messages that contain emails
forwarded as attachments. These emails have some additional encapsulated
header information from each of the forwarded emails. When I process the
files, is_multipart returns False, the content-type is reported as text/plain,
and the payload includes the whole message body from 'This message is in MIME
format' through to the end.

For example:

<email header>
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.
------_=_NextPart_000_01C43634.1A06A235
------_=_NextPart_001_01C43634.1A06A235
------_=_NextPart_001_01C43634.1A06A235
------_=_NextPart_001_01C43634.1A06A235--
------_=_NextPart_000_01C43634.1A06A235
<attached message header>
------_=_NextPart_002_01C43634.1A06A235
------_=_NextPart_003_01C43634.1A06A235
------_=_NextPart_003_01C43634.1A06A235
------_=_NextPart_003_01C43634.1A06A235--
------_=_NextPart_002_01C43634.1A06A235
------_=_NextPart_002_01C43634.1A06A235--
------_=_NextPart_000_01C43634.1A06A235
Mime-Version: 1.0
Content-Type: multipart/mixed;
boundary="------------m.182DA3C.BE6A21A3"
<rest of the message body>

If I remove the section of the email from 'This message is in MIME format'
through to 'Mime-Version: 1.0', the message is processed correctly (i.e.
is_multipart returns True, Content-Type is multipart/mixed, etc.).
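
The manual workaround described above could be automated along these lines. This is only a heuristic sketch, under the assumption that the stray block always starts at the first non-header line and ends at a repeated Mime-Version line. The names strip_stray_block and HEADER_LINE and the shortened sample message are hypothetical and not part of the email module.

```python
import email
import re

# A header line starts with "From ", with "Name:", or with folding whitespace.
HEADER_LINE = re.compile(r"^(From |[!-9;-~]+:|[ \t])")


def strip_stray_block(raw):
    """Drop text wedged into the header block, up to the repeated Mime-Version."""
    lines = raw.splitlines(keepends=True)
    start = None
    for i, line in enumerate(lines):
        if line.strip() == "":
            return raw  # reached the normal blank separator: nothing to repair
        if not HEADER_LINE.match(line):
            start = i  # first line that cannot be a header
            break
    if start is None:
        return raw
    for end in range(start, len(lines)):
        if lines[end].lower().startswith("mime-version:"):
            # remove the stray lines and the duplicate Mime-Version header
            return "".join(lines[:start] + lines[end + 1:])
    return raw  # no second header block found: leave the text untouched


broken = (
    "Subject: fwd\n"
    "MIME-Version: 1.0\n"
    "X-Mailer: Internet Mail Service (5.5.2448.0)\n"
    "This message is in MIME format.\n"
    "------_=_NextPart_000_01C43634.1A06A235\n"
    "Mime-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ--\n"
)

repaired = email.message_from_string(strip_stray_block(broken))
print(repaired.is_multipart(), repaired.get_content_type())
```

Because the function discards whatever sits in the stray block, any attached-message headers found there are lost, so it is worth keeping the original file alongside the repaired text.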

Could anybody tell me whether the above message header breaks the conventions
for email messages, or is it just something that is not handled correctly by
the email module?

I would appreciate any feedback from anyone else who has experienced such
problems or could provide hints to a reliable solution.

Thanks,
Steve
Dec 21 '07 #1
On Fri, 21 Dec 2007 10:22:53 -0300, Steven Allport <sa******@altirium.com>
wrote:
> If I remove the section of the email from the 'This is in MIME format'
> through to Mime-Version: 1.0 the message is processed correctly. (ie.
> is_multipart = True , Content-Type = multipart/mixed etc.)

Is this an actual message fragment? It can't be, or else it's broken. Headers
are separated from the message body by one blank line. At the very least there
should be a blank line before "This message is in MIME...".
And do all those xxx_NextPart_xxx lines actually appear one after the other?
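
A small experiment illustrates the point about the blank line. This is a sketch in Python 3 with made-up, shortened sample messages (WELL_FORMED and NO_SEPARATOR are hypothetical). It only shows how the parser treats a non-header line that appears before any blank line.

```python
import email

BODY = (
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ--\n"
)

# Headers, one blank line, then the multipart body.
WELL_FORMED = (
    "Subject: ok\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="XYZ"\n'
    "\n" + BODY
)

# A line of plain text sits in the header block with no blank line before it.
NO_SEPARATOR = (
    "Subject: broken\n"
    "MIME-Version: 1.0\n"
    "This message is in MIME format.\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="XYZ"\n'
    "\n" + BODY
)

good = email.message_from_string(WELL_FORMED)
bad = email.message_from_string(NO_SEPARATOR)

for name, msg in (("well formed", good), ("no separator", bad)):
    print(name, msg.is_multipart(), msg.get_content_type())
```

In the second message the parser ends the header block at the first line that is not a header, so the later Content-Type line becomes part of the body and the default type text/plain applies.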

> Could anybody tell me if the above message header breaks the conventions
> for email messages or is it just some that is not handled correctly by the
> email module.

Could you post, or better leave available somewhere, a complete message
(as originally exported by Outlook, before any processing)?

--
Gabriel Genellina

Dec 27 '07 #2
I think I faced the same problem quite some time back. In our case, due to
some settings in Microsoft Outlook, forwarded emails were also arriving as
attachments to the email.

Such an attachment has the same format as an email itself, so to get
information out of the attachment we needed to treat it as an email and
parse it.

Can you send me a full email file? It would help to analyze the problem.

sandip
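
Treating an attachment as an email in its own right might look like the sketch below. It assumes Python 3 and assumes the forwarded messages arrive as message/rfc822 parts; the names iter_forwarded and make and the sample subjects are hypothetical.

```python
import email
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def iter_forwarded(msg):
    """Yield each message attached as message/rfc822, at any nesting depth."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # For message/rfc822 the payload is a list holding one Message.
            if isinstance(payload, list) and payload:
                yield payload[0]


def make(subject, *children):
    """Hypothetical helper: build a multipart message with attached messages."""
    msg = MIMEMultipart()
    msg["Subject"] = subject
    msg.attach(MIMEText("body of " + subject))
    for child in children:
        msg.attach(MIMEMessage(child))
    return msg


# An outer email that forwards an email, which itself forwards another one.
outer = make("outer", make("inner", make("innermost")))
parsed = email.message_from_string(outer.as_string())
subjects = [m["Subject"] for m in iter_forwarded(parsed)]
print(subjects)
```

Since walk() descends into message/rfc822 parts, messages attached to attached messages are found without explicit recursion. If an attachment arrives as raw bytes under some other content type, its decoded payload could be passed to email.message_from_bytes instead.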

Dec 27 '07 #3
Thanks for the response.

The section of the email is an actual message fragment. The first blank line
that appears in the message is immediately after the first

' boundary="------------m.182DA3C.BE6A21A3"'

There are no blank lines prior to this in the message.

In the example, which was snipped from an actual exported message, there
is a set of 5 _NextPart_ lines followed by the message header for the first
attached message, then a set of 7 _NextPart_ lines followed by the message
header for the second attached message. In total there are 6 sets of
_NextPart_ lines, as some of the attached messages also contained messages as
attachments.

Unfortunately it is not possible for me to post the message or leave it
anywhere for you, and so far I have been unable to recreate a test message of
a similar format. I will endeavour to do so and, if I can, will let you know
where it is.
Jan 2 '08 #4
