
How do we best parse incoming email messages?

Hi,

We find ourselves in the unenviable position of creating an email
reader. May I ask how we best parse incoming messages? Ideally we would
point the parser at an email stored on a POP3 server, grab the email
body's byte streams, and get back an array of AttachmentFile collections
(filename, size, MIME type), as well as an HTMLBody and a TextBody.

Both HTMLBody and TextBody would be filled if it's a multipart/alternative.

Only HTMLBody would be filled if it's HTML only.

Only TextBody would be filled if it's a text/plain message.

Of course it's never this simple, with text encoding to deal with.
Ideally everything ends up looking like the Unicode String objects .NET
programmers are familiar with.
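
As a rough illustration of the shape described above, here is a minimal sketch in Python (not .NET) using only the standard library's `email` package. The names `AttachmentFile`, `ParsedEmail` and `parse_email` are made up for this example to mirror the wording of the question. The parser decodes each text part to a Unicode string using the charset declared in the message, so a message that declares an unknown or wrong charset would still need extra handling.

```python
from dataclasses import dataclass, field
from email import message_from_bytes, policy
from typing import List, Optional


@dataclass
class AttachmentFile:
    # Hypothetical record mirroring "(filename, size, MIME type)".
    filename: Optional[str]
    size: int
    mime_type: str


@dataclass
class ParsedEmail:
    text_body: Optional[str] = None
    html_body: Optional[str] = None
    attachments: List[AttachmentFile] = field(default_factory=list)


def parse_email(raw: bytes) -> ParsedEmail:
    """Parse one raw RFC 822/MIME message into bodies and attachments."""
    # policy.default gives EmailMessage objects with get_body() and
    # iter_attachments(), and decodes text parts to str.
    msg = message_from_bytes(raw, policy=policy.default)
    parsed = ParsedEmail()

    # Each body is looked up separately, so both are filled for
    # multipart/alternative and only one for single-format messages.
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    if text_part is not None:
        parsed.text_body = text_part.get_content()
    if html_part is not None:
        parsed.html_body = html_part.get_content()

    for part in msg.iter_attachments():
        # get_payload(decode=True) undoes base64/quoted-printable; it
        # returns None for nested messages, hence the fallback.
        payload = part.get_payload(decode=True) or b""
        parsed.attachments.append(
            AttachmentFile(part.get_filename(), len(payload),
                           part.get_content_type())
        )
    return parsed
```

Calling `parse_email(raw_bytes)` on a text/plain message leaves `html_body` as `None`, and on an HTML-only message leaves `text_body` as `None`.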

We are wondering if we should be studying webmail solutions in the PHP
space to see how they parse it. Or perhaps you know of a truly commercial
COM/.NET component.

We are working with an email deployment partner at the moment and we are
having a lot of trouble parsing emails. Any help would be greatly
appreciated.

Also, for bounces and error messages, what's a good component that will
catch all signatures (for most email servers)? It's a whole different
(perhaps bigger) can of worms.
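
For the bounces that follow the standard delivery status notification format (multipart/report with a message/delivery-status part, RFC 3464), a sketch along these lines may be a starting point. It is Python, uses only the standard library, and the function name `failed_recipients` is invented for the example. It does nothing for servers that send free-form bounce text, which is where the can of worms really starts.

```python
from email import message_from_string, policy


def failed_recipients(raw: str):
    """Return (address, action, status) tuples from a standard DSN."""
    msg = message_from_string(raw, policy=policy.default)
    results = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        # The stdlib parser exposes each header block of the status
        # part as a nested message object.
        blocks = part.get_payload()
        if not isinstance(blocks, list):
            continue
        for block in blocks:
            recipient = block.get("Final-Recipient")
            if recipient is None:
                continue  # per-message block, no recipient fields
            # The value looks like "rfc822; user@example.com".
            address = str(recipient).split(";", 1)[-1].strip()
            action = str(block.get("Action", "")).strip()
            status = str(block.get("Status", "")).strip()
            results.append((address, action, status))
    return results
```

A status starting with `5.` is a permanent failure and one starting with `4.` is a temporary one, so the third element can drive whether an address is retried or dropped.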

Any advice would be greatly appreciated. Thank you ahead of time!

Best regards,
-- Li-fan Chen
Apr 25 '06 #1
Sounds to me like you need a MIME parser for email messages fetched over
POP3. I've seen several open-source POP3 client implementations out there;
all you need to do is tune up those Google or MSN search key phrases.
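
To show how the POP3 side and the MIME side fit together, here is a small sketch in Python using the standard `poplib` and `email` modules. The helper names `lines_to_message` and `fetch_all` are invented for this example, and the host, user and password are whatever your own server needs. It assumes the server accepts SSL connections on the default port.

```python
import poplib
from email import message_from_bytes, policy


def lines_to_message(lines):
    """Join the line list returned by POP3 RETR and parse it as MIME."""
    return message_from_bytes(b"\r\n".join(lines), policy=policy.default)


def fetch_all(host, user, password):
    """Download and parse every message in a POP3 mailbox."""
    conn = poplib.POP3_SSL(host)  # assumes SSL on the default port
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        # POP3 message numbers start at 1; retr() returns
        # (response, list_of_byte_lines, octets).
        return [lines_to_message(conn.retr(i)[1])
                for i in range(1, count + 1)]
    finally:
        conn.quit()
```

Nothing is deleted from the server here, since `dele()` is never called.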
Peter

--
Co-founder, Eggheadcafe.com developer portal:
http://www.eggheadcafe.com
UnBlog:
http://petesbloggerama.blogspot.com


Apr 25 '06 #2
Li-fan Chen <ob********@hotmail.com> wrote:
We find ourselves in the unenviable position of creating an email
reader. May I ask how we best parse incoming messages? Ideally we would
point the parser at an email stored on a POP3 server, grab the email
body's byte streams, and get back an array of AttachmentFile collections
(filename, size, MIME type), as well as an HTMLBody and a TextBody.


I hate POP3! It seems bad to alienate your non-POP3 audience.

I wrote an email archiver and reader. To get the emails, (1) I scanned
all available MAPI messages. This covers all the messages in Outlook
that are in a local PST, all the ones that are available through
Exchange, and all the ones that are available offline. I told the user
about the ones that weren't available offline. (2) I scanned through
available Outlook Express messages. This covers all the ones that were
available offline. (3) I allowed power users to download email folders
in Berkeley mail format, the Unix standard (basically just concatenated
RFC 822 messages).
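
For option (3), the Berkeley format is commonly known as mbox, and reading it takes little code. The sketch below is in Python and uses the standard `mailbox` module; the function name `list_subjects` is only for illustration, and this is not how the archiver described here was written.

```python
import mailbox


def list_subjects(mbox_path):
    """Return the Subject header of every message in an mbox file."""
    # create=False makes a missing file an error instead of
    # silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # Messages are split on the "From " separator lines.
        return [msg["Subject"] for msg in box]
    finally:
        box.close()
```

Each item yielded by the mailbox is a regular message object, so it can be handed to the same MIME parsing code used for mail fetched from a server.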

I read up on the RFC specs about multipart/alternative &c.
Unfortunately MAPI doesn't expose the MIME structure of its emails.
Unfortunately it wraps HTML up inside an RTF. Fortunately I was able
to read the RTF and extract out the original HTML.

Some of my code is available online. It's in C++ only, but it may give
you ideas. (C++ let me write very efficient finite-state machines to
parse the email structure, letting me chew through several megabytes
of email per second.)

http://www.wischik.com/lu/programmer/mapi_utils.html
http://www.wischik.com/lu/programmer/dbx_utils.html
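
To give a flavour of the finite-state-machine approach without the C++, here is a toy two-state version in Python. It is not taken from the code linked above; the names `State` and `split_message` are made up, and it only separates headers from body and joins folded header lines. A real parser would add states for MIME boundaries and encoded content.

```python
from enum import Enum, auto


class State(Enum):
    HEADERS = auto()
    BODY = auto()


def split_message(raw: str):
    """Split a raw message into a header list and a body string."""
    state = State.HEADERS
    headers = []      # list of (name, value) pairs, in order
    body_lines = []
    for line in raw.splitlines():
        if state is State.HEADERS:
            if line == "":
                # The first blank line ends the header section.
                state = State.BODY
            elif line[0] in " \t" and headers:
                # Folded header: append to the previous value.
                name, value = headers[-1]
                headers[-1] = (name, value + " " + line.strip())
            else:
                name, _, value = line.partition(":")
                headers.append((name.strip(), value.strip()))
        else:
            body_lines.append(line)
    return headers, "\n".join(body_lines)
```

Because each line is looked at once and the only memory is the current state, this style of parser scales linearly with the size of the mailbox.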

--
Lucian
Apr 25 '06 #3
