
Standard module for parsing emails?

Is there a standard library for parsing emails that can cope with the
different way email clients quote?
Jul 30 '08 #1
Phillip B Oldham wrote:
> Is there a standard library for parsing emails that can cope with the
> different way email clients quote?
AFAIK not - as unfortunately that's something the user can configure, and
thus no atrocity is unimaginable. Hard to write a module for that...

All you can try is to apply a heuristic like "if there are lines all
starting with a certain prefix that contains non-alphanumeric characters".
But then if the user configures to quote using

XX

you're doomed...

Diez
Jul 30 '08 #2
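
The heuristic Diez describes could be sketched roughly as below. This is only an illustration under stated assumptions: the name guess_quote_prefix, the regular expression, and the min_lines threshold are all made up for the example, and a "prefix" is assumed to be a run of non-alphanumeric characters at the start of a line.

```python
import re
from collections import Counter

# Assumed shape of a quote prefix: optional indentation, then one or more
# non-alphanumeric, non-space marker characters.
PREFIX_RE = re.compile(r"^\s*([^\w\s]+)")


def guess_quote_prefix(body, min_lines=2):
    """Return the most common leading marker, or None if nothing repeats."""
    counts = Counter()
    for line in body.splitlines():
        match = PREFIX_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    if not counts:
        return None
    prefix, seen = counts.most_common(1)[0]
    # A marker seen only once is more likely punctuation than quoting.
    return prefix if seen >= min_lines else None
```

As the post says, an alphanumeric prefix such as "XX" defeats this: the function returns None for a body quoted that way.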
Phillip B Oldham <ph************@gmail.com> writes:
> Is there a standard library for parsing emails that can cope with
> the different way email clients quote?
"Cope with" in what sense? i.e., what would the behaviour of such a
library be? What would it do?

Note also that it's not merely the mail client that does the quoting;
frequently the user composing the message will have a heavy hand in
how the quoted material appears.

--
\ “Time flies like an arrow. Fruit flies like a banana.” —Groucho |
`\ Marx |
_o__) |
Ben Finney
Jul 30 '08 #3
Phillip B Oldham wrote:
> Is there a standard library for parsing emails that can cope with the
> different way email clients quote?
What do you mean by "quote" here?
1. Encoding utf8/latin1 to ascii
2. The prefix on quoted text, like your text above in my mail

Thomas
--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
Jul 30 '08 #4
On Jul 30, 2:36 pm, Thomas Guettler <h...@tbz-pariv.de> wrote:
> What do you mean with "quote" here?
> 2. Prefix of quoted text like your text above in my mail
Basically, just be able to parse an email into its actual and "quoted"
parts - lines which have been prefixed to indent from a previous
email.

Most clients use ">" which is easy to check for, but I've seen some
which use "|" and some which *don't* quote at all. It's causing us
nightmares in parsing responses to system-generated emails. I was
hoping someone might've seen the problem previously and released some
code.
Jul 30 '08 #5
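
A minimal sketch of the split described above, assuming the body is already plain text and that only the two markers mentioned in the post (">" and "|") count as quoting. The names split_reply and QUOTE_MARKERS are hypothetical, not from any library.

```python
QUOTE_MARKERS = (">", "|")  # markers mentioned in the thread; extend as needed


def split_reply(body, markers=QUOTE_MARKERS):
    """Split a plain-text body into (new_lines, quoted_lines)."""
    new_lines, quoted_lines = [], []
    for line in body.splitlines():
        # str.startswith accepts a tuple, so one call checks every marker.
        if line.lstrip().startswith(markers):
            quoted_lines.append(line)
        else:
            new_lines.append(line)
    return new_lines, quoted_lines
```

Replies from clients that do not mark quoted text at all pass through as "new" lines, so this covers only the easy cases.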
If there isn't a standard library for parsing emails, is there one for
connecting to a pop/imap resource and reading the mailbox?
Jul 30 '08 #6
On Wednesday 30 July 2008 17:15:07, Phillip B Oldham wrote:
> If there isn't a standard library for parsing emails, is there one for
> connecting to a pop/imap resource and reading the mailbox?
Both ship with Python: the email module and poplib, both very well
documented in the official docs (with examples and all).

The email module is rather easy to use and really powerful, but you'll need to
handle for yourself the many ways email clients compose a message, and broken PHP
webmail clients that don't respect the RFCs (notably about encoding)...

--
_____________

Maric Michaud
Jul 30 '08 #7
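
For illustration, here is roughly how the two modules fit together, written against the current Python 3 standard library rather than the 2008 one. The function names, and the host, user, and password parameters, are placeholders for the example; error handling is kept to a minimum.

```python
import email
import poplib
from email import policy


def fetch_raw_messages(host, user, password):
    """Download every message from a POP3-over-SSL mailbox as raw bytes."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        # retr() returns (response, list_of_line_bytes, octets).
        return [b"\r\n".join(conn.retr(i)[1]) for i in range(1, count + 1)]
    finally:
        conn.quit()


def plain_text_body(raw_bytes):
    """Parse a raw message and return (subject, text/plain body)."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    text = part.get_content() if part is not None else ""
    return msg["Subject"], text
```

The email module takes care of headers, MIME structure, and character sets; deciding which lines of the returned text are quoted is still left to the caller.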
Phillip B Oldham wrote:
> If there isn't a standard library for parsing emails, is there one for
> connecting to a pop/imap resource and reading the mailbox?

The search [1] yielded these results:
1) http://docs.python.org/lib/module-email.html
2) http://www.devshed.com/c/a/Python/Py...Email-Parsing/

I have used the email module very successfully.

Also you can try the following to connect to mailboxes:
1) poplib
2) imaplib (smtplib is for sending mail, not for reading a mailbox)

For parsing the mails I would recommend pyparsing.
[1] http://www.google.com/search?client=...utf-8&oe=utf-8

Regards

Nicolaas

--

The three things to remember about Llamas:
1) They are harmless
2) They are deadly
3) They are made of lava, and thus nice to cuddle.
Jul 30 '08 #8
On Wednesday 30 July 2008 17:55:35, Aspersieman wrote:
> For parsing the mails I would recommend pyparsing.
Why? The email module is a great parser IMO.

--
_____________

Maric Michaud
Jul 30 '08 #9
On Jul 30, 3:11 pm, Phillip B Oldham <phillip.old...@gmail.com> wrote:
> On Jul 30, 2:36 pm, Thomas Guettler <h...@tbz-pariv.de> wrote:
> > What do you mean with "quote" here?
> > 2. Prefix of quoted text like your text above in my mail
>
> Basically, just be able to parse an email into its actual and "quoted"
> parts - lines which have been prefixed to indent from a previous
> email.
>
> Most clients use ">" which is easy to check for, but I've seen some
> which use "|" and some which *don't* quote at all. It's causing us
> nightmares in parsing responses to system-generated emails. I was
> hoping someone might've seen the problem previously and released some
> code.
The problem is that sometimes lines might start with ">" for other
reasons, eg text copied from an interactive Python session, which
could occur in ... um ... _this_ newsgroup. :-)
Jul 30 '08 #10
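
One way to soften the false positive mentioned above is to treat the interpreter prompt as a special case. This is a sketch only: looks_quoted and PROMPT_RE are hypothetical names, and the rule assumes a prompt is exactly ">>>" followed by a space or the end of the line.

```python
import re

# Assumption: an interpreter prompt is ">>>" followed by a space or nothing.
PROMPT_RE = re.compile(r"^\s*>>>( |$)")


def looks_quoted(line):
    """True if the line starts with '>' and is not a Python prompt line."""
    stripped = line.lstrip()
    return stripped.startswith(">") and not PROMPT_RE.match(line)
```

The trade-off is that text quoted three levels deep and written as ">>> text" is now misread as a prompt, so this only moves the ambiguity rather than removing it.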
Maric Michaud wrote:
> On Wednesday 30 July 2008 17:55:35, Aspersieman wrote:
> > For parsing the mails I would recommend pyparsing.
>
> Why? The email module is a great parser IMO.
He is talking about parsing the *content*, not the email envelope and a
possible MIME body.

Diez
Jul 30 '08 #11
On Wednesday 30 July 2008 19:25:31, Diez B. Roggisch wrote:
> Maric Michaud wrote:
> > On Wednesday 30 July 2008 17:55:35, Aspersieman wrote:
> > > For parsing the mails I would recommend pyparsing.
> > Why? The email module is a great parser IMO.
>
> He is talking about parsing the *content*, not the email envelope and a
> possible MIME body.
Yes? I don't know what the OP wants to do with the content, but if it's just
filtering the lines beginning with a '>', pyparsing might be a bit
heavyweight.

--
_____________

Maric Michaud

Jul 30 '08 #12
On Wed, 30 Jul 2008 07:11:45 -0700, Phillip B Oldham wrote:
> Most clients use ">" which is easy to check for, but I've seen some
> which use "|" and some which *don't* quote at all. It's causing us
> nightmares in parsing responses to system-generated emails. I was hoping
> someone might've seen the problem previously and released some code.
My sympathies.

I've even seen clients that prefix new (unquoted) text with the quote
character ">".

Well, possibly it's not the mail client, but the user. Who knows?

I will sometimes quote text like this:

[quote]
Something quoted.
[end quote]

But I'm writing for a human audience, not for a program.

The simple answer is that you can catch 90% of cases by checking for ">",
and another 1% by checking for "|". If the email contains HTML, I have
found that quoted text is sometimes in another colour. As for the rest,
well, sometimes even human beings can't easily determine what's quoted
and what isn't. Good luck getting a program to do it.

(Percentages are plucked out of thin air. YMMV.)
--
Steven
Jul 31 '08 #13
On Thu, 31 Jul 2008 02:25:37 +0000, Steven D'Aprano wrote:
> On Wed, 30 Jul 2008 07:11:45 -0700, Phillip B Oldham wrote:
> > Most clients use ">" which is easy to check for, but I've seen some
> > which use "|" and some which *don't* quote at all. It's causing us
> > nightmares in parsing responses to system-generated emails. I was
> > hoping someone might've seen the problem previously and released some
> > code.
>
> My sympathies.
>
> I've even seen clients that prefix new (unquoted) text with the quote
> character ">".

Well, this is a new one I've never seen before: found on the python-dev
mailing list, somebody who (apparently) marks quoted text by inserting a
bare quote character on an otherwise empty line after each line of text,
similar to this:

I've even seen clients that prefix new (unquoted) text with the quote
>
character ">".
>
The user in question seems to be using gmail. I suspect a PEBCAK error.

--
Steven
Jul 31 '08 #14
