I'm working on an app that's processing Usenet messages. I'm making a
connection to my NNTP feed and grabbing the headers for the groups I'm
interested in, saving the info to disk, and doing some post-processing.
I'm finding a few bizarre characters and I'm not sure how to handle them
pythonically.
One of the lines I'm finding this problem with contains:
    137050 Cleo and I have an anouncement! "Mlle. =?iso-8859-1?Q?Ana=EFs?="
    <no*@aol.com Sun, 21 Nov 2004 16:21:50 -0500
    <lm***************************@40tude.net 4478 69
    Xref: sn-us rec.pets.cats.community:137050
The interesting part is the string "=?iso-8859-1?Q?Ana=EFs?=".
An HTML rendering of this string should show "Anaïs".
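For reference, that string has the RFC 2047 "encoded-word" layout =?charset?encoding?encoded-text?=. Here the charset is iso-8859-1, Q selects a quoted-printable-style encoding (with "_" standing for a space), and =EF is the byte 0xEF, which is "ï" in Latin-1. The sketch below decodes one such word by hand, purely to show the structure. The function name decode_encoded_word is hypothetical, and it assumes a single, well-formed word with no surrounding text.

```python
import base64
import quopri


def decode_encoded_word(word: str) -> str:
    """Decode one RFC 2047 encoded-word of the form =?charset?enc?text?=.

    Illustrative only: handles a single, well-formed word and does no
    validation beyond the basic shape check below.
    """
    if not (word.startswith("=?") and word.endswith("?=")):
        raise ValueError("not an encoded-word: %r" % word)
    # Strip the =? and ?= delimiters, then split into the three fields.
    charset, encoding, text = word[2:-2].split("?", 2)
    if encoding.upper() == "Q":
        # header=True tells quopri to also turn "_" into a space,
        # as the Q encoding requires.
        raw = quopri.decodestring(text.encode("ascii"), header=True)
    elif encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        raise ValueError("unknown encoding: %r" % encoding)
    # The decoded bytes are in the declared charset.
    return raw.decode(charset)


print(decode_encoded_word("=?iso-8859-1?Q?Ana=EFs?="))  # Anaïs
```

In practice there's no need to write this yourself; the standard library's email package already knows the format.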
What I'm doing now is a brute-force substitution from the version in the
file to the HTML version. That's ugly. What's a better way to translate
that string? Or is my problem that I'm grabbing the headers from the NNTP
server incorrectly?
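A substitution table shouldn't be needed. Encoded-words are how RFC 2047 carries non-ASCII text in headers, so they appear verbatim in the raw header data and are meant to be decoded on the reading side. A minimal sketch using email.header.decode_header from the standard library follows. The helper name decode_subject is hypothetical, and it assumes the header value has already been read as a str.

```python
from email.header import decode_header


def decode_subject(raw: str) -> str:
    """Turn a header value containing RFC 2047 encoded-words into text."""
    parts = []
    # decode_header splits the value into (chunk, charset) pairs. With no
    # encoded-words it returns a single (str, None) pair; otherwise every
    # chunk is bytes, and the plain ASCII runs carry a charset of None.
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            try:
                # errors="replace" keeps one bad byte from aborting the run.
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Unknown charset label: fall back to Latin-1, which
                # accepts any byte sequence.
                chunk = chunk.decode("latin-1")
        parts.append(chunk)
    return "".join(parts)


subject = 'Cleo and I have an anouncement! "Mlle. =?iso-8859-1?Q?Ana=EFs?="'
print(decode_subject(subject))
# Cleo and I have an anouncement! "Mlle. Anaïs"
```

The often-quoted one-liner str(email.header.make_header(decode_header(raw))) also decodes this, but Header.__str__ adds a space when it switches from an encoded chunk back to plain ASCII text that doesn't begin with whitespace, so this subject would come out with a stray space before the closing quote. Joining the chunks directly avoids that.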