Is there a standard library for parsing emails that can cope with the
different way email clients quote?
Phillip B Oldham wrote:
Is there a standard library for parsing emails that can cope with the
different way email clients quote?
AFAIK not - as unfortunately that's something the user can configure, and
thus no atrocity is unimaginable. Hard to write a module for that...
All you can try is to apply a heuristic like "if there are lines all
starting with a certain prefix that contains non-alphanumeric characters,
treat them as quoted".
But then if the user configures to quote using
XX
you're doomed...
Diez
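A rough sketch of the heuristic Diez describes, for illustration only. The function name guess_quote_prefix and the min_lines threshold are made up here, not taken from any library. It counts leading runs of non-alphanumeric characters and reports the most frequent one; as Diez points out, a prefix such as XX defeats it.

```python
import re
from collections import Counter

# Leading run of characters that are neither alphanumeric nor whitespace.
_PREFIX_RE = re.compile(r"^\s*([^\w\s]+)")


def guess_quote_prefix(body, min_lines=2):
    """Return the most common non-alphanumeric line prefix, or None.

    Hypothetical helper: min_lines is an arbitrary threshold chosen for
    this sketch, not a value from the thread.
    """
    counts = Counter()
    for line in body.splitlines():
        match = _PREFIX_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    if not counts:
        return None
    prefix, seen = counts.most_common(1)[0]
    return prefix if seen >= min_lines else None
```

Note that this treats ">" and ">>" as different prefixes, so nested quoting would need extra handling.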
Phillip B Oldham <ph************@gmail.com> writes:
Is there a standard library for parsing emails that can cope with
the different way email clients quote?
"Cope with" in what sense? i.e., what would the behaviour of such a
library be? What would it do?
Note also that it's not merely the mail client that does the quoting;
frequently the user composing the message will have a heavy hand in
how the quoted material appears.
--
\ “Time flies like an arrow. Fruit flies like a banana.” —Groucho |
`\ Marx |
_o__) |
Ben Finney
Phillip B Oldham schrieb:
Is there a standard library for parsing emails that can cope with the
different way email clients quote?
What do you mean by "quote" here?
1. Encode utf8/latin1 to ascii
2. Prefix of quoted text like your text above in my mail
Thomas
--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
On Jul 30, 2:36 pm, Thomas Guettler <h...@tbz-pariv.de> wrote:
What do you mean by "quote" here?
2. Prefix of quoted text like your text above in my mail
Basically, just be able to parse an email into its actual and "quoted"
parts - lines which have been prefixed to indent from a previous
email.
Most clients use ">" which is easy to check for, but I've seen some
which use "|" and some which *don't* quote at all. It's causing us
nightmares in parsing responses to system-generated emails. I was
hoping someone might've seen the problem previously and released some
code.
If there isn't a standard library for parsing emails, is there one for
connecting to a pop/imap resource and reading the mailbox?
On Wednesday 30 July 2008 17:15:07, Phillip B Oldham wrote:
If there isn't a standard library for parsing emails, is there one for
connecting to a pop/imap resource and reading the mailbox?
Both ship with Python: the email module and poplib, both very well
documented in the official docs (with examples and all).
The email module is rather easy to use and really powerful, but you'll need to
handle yourself the many ways email clients compose a message, and broken PHP
webmails that don't respect the RFCs (notably about encoding)...
--
_____________
Maric Michaud
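A minimal sketch of how the two standard-library modules mentioned above fit together. It assumes a modern Python 3 (the policy-based email API did not exist when this thread was written). The function names and the host, user and password parameters are placeholders invented for this example.

```python
import poplib
from email import message_from_bytes, policy


def fetch_message(host, user, password, index=1):
    """Fetch one raw message over POP3/SSL (not exercised here: needs a server)."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        _response, lines, _octets = conn.retr(index)
        return b"\r\n".join(lines)
    finally:
        conn.quit()


def plain_text_body(raw_bytes):
    """Parse a raw message and return (subject, plain-text body)."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    text = part.get_content() if part is not None else ""
    return msg["Subject"], text


# Usage with an in-memory message, so no network access is needed.
raw = (
    b"From: user@example.com\r\n"
    b"Subject: Re: ticket\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"> old text\r\n"
    b"new text\r\n"
)
subject, body = plain_text_body(raw)
```

The email module handles headers, MIME structure and character sets; deciding which lines of the body are quoted is still left to the caller.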
On Wednesday 30 July 2008 17:55:35, Aspersieman wrote:
For parsing the mails I would recommend pyparsing.
Why? The email module is a great parser IMO.
--
_____________
Maric Michaud
On Jul 30, 3:11 pm, Phillip B Oldham <phillip.old...@gmail.com> wrote:
On Jul 30, 2:36 pm, Thomas Guettler <h...@tbz-pariv.de> wrote:
What do you mean by "quote" here?
2. Prefix of quoted text like your text above in my mail
Basically, just be able to parse an email into its actual and "quoted"
parts - lines which have been prefixed to indent from a previous
email.
Most clients use ">" which is easy to check for, but I've seen some
which use "|" and some which *don't* quote at all. It's causing us
nightmares in parsing responses to system-generated emails. I was
hoping someone might've seen the problem previously and released some
code.
The problem is that sometimes lines might start with ">" for other
reasons, e.g. text copied from an interactive Python session, which
could occur in ... um ... _this_ newsgroup. :-)
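One possible way to reduce that particular false positive, sketched under the assumption that interpreter prompts appear as ">>>" or "..." followed by a space or end of line. Both function names are hypothetical. A deeply nested quote written as ">>> text" would be misread as a prompt, so this is only a partial fix.

```python
import re

# Matches the interactive interpreter prompts ">>> " and "... ".
_PROMPT_RE = re.compile(r"^(>>>|\.\.\.)( |$)")


def looks_like_python_prompt(line):
    """True if the line starts with an interactive Python prompt."""
    return bool(_PROMPT_RE.match(line))


def is_probably_quoted(line):
    """True for lines starting with '>' that are not interpreter prompts."""
    return line.startswith(">") and not looks_like_python_prompt(line)
```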
Maric Michaud wrote:
On Wednesday 30 July 2008 17:55:35, Aspersieman wrote:
>For parsing the mails I would recommend pyparsing.
Why? The email module is a great parser IMO.
He talks about parsing the *content*, not the email envelope and possible
mime-body.
Diez
On Wednesday 30 July 2008 19:25:31, Diez B. Roggisch wrote:
Maric Michaud wrote:
On Wednesday 30 July 2008 17:55:35, Aspersieman wrote:
For parsing the mails I would recommend pyparsing.
Why? The email module is a great parser IMO.
He talks about parsing the *content*, not the email envelope and possible
mime-body.
Yes? I don't know what the OP wants to do with the content, but if it's just
filtering the lines beginning with a '>', pyparsing might be a bit
overkill.
--
_____________
Maric Michaud
On Wed, 30 Jul 2008 07:11:45 -0700, Phillip B Oldham wrote:
Most clients use ">" which is easy to check for, but I've seen some
which use "|" and some which *don't* quote at all. It's causing us
nightmares in parsing responses to system-generated emails. I was hoping
someone might've seen the problem previously and released some code.
My sympathies.
I've even seen clients that prefix new (unquoted) text with the quote
character ">".
Well, possibly it's not the mail client, but the user. Who knows?
I will sometimes quote text like this:
[quote]
Something quoted.
[end quote]
But I'm writing for a human audience, not for a program.
The simple answer is that you can catch 90% of cases by checking for ">",
and another 1% by checking for "|". If the email contains HTML, I have
found that quoted text is sometimes in another colour. As for the rest,
well, sometimes even human beings can't easily determine what's quoted
and what isn't. Good luck getting a program to do it.
(Percentages are plucked out of thin air. YMMV.)
--
Steven
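The "simple answer" above, checking for ">" and "|", might look like the following sketch. The name split_quoted and the default prefix tuple are assumptions made for illustration. It does nothing for clients that don't mark quotes at all, or for HTML mail.

```python
# Quote markers named in the thread; extend the tuple for other clients.
QUOTE_PREFIXES = (">", "|")


def split_quoted(body, prefixes=QUOTE_PREFIXES):
    """Split a plain-text body into (new_lines, quoted_lines).

    A line counts as quoted if, ignoring leading whitespace, it starts
    with one of the given prefixes.
    """
    new_lines, quoted_lines = [], []
    for line in body.splitlines():
        if line.lstrip().startswith(prefixes):
            quoted_lines.append(line)
        else:
            new_lines.append(line)
    return new_lines, quoted_lines
```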
On Thu, 31 Jul 2008 02:25:37 +0000, Steven D'Aprano wrote:
On Wed, 30 Jul 2008 07:11:45 -0700, Phillip B Oldham wrote:
>Most clients use ">" which is easy to check for, but I've seen some which use "|" and some which *don't* quote at all. It's causing us nightmares in parsing responses to system-generated emails. I was hoping someone might've seen the problem previously and released some code.
My sympathies.
I've even seen clients that prefix new (unquoted) text with the quote
character ">".
Well, this is a new one I've never seen before: found on the python-dev
mailing list, somebody who (apparently) marks quoted text by inserting a
bare quote character on an otherwise empty line after each line of text,
similar to this:
I've even seen clients that prefix new (unquoted) text with the quote
>
character ">".
>
The user in question seems to be using gmail. I suspect a PEBCAK error.
--
Steven