
Handling some isolated iso-8859-1 characters

I'm working on an app that's processing Usenet messages. I'm making a
connection to my NNTP feed and grabbing the headers for the groups I'm
interested in, saving the info to disk, and doing some post-processing.
I'm finding a few bizarre characters and I'm not sure how to handle them
pythonically.

One of the lines I'm finding this problem with contains:
137050 Cleo and I have an anouncement! "Mlle. =?iso-8859-1?Q?Ana=EFs?="
<no*@aol.com> Sun, 21 Nov 2004 16:21:50 -0500
<lm***************************@40tude.net> 4478 69 Xref:
sn-us rec.pets.cats.community:137050

The interesting patch is the string that reads "=?iso-8859-1?Q?Ana=EFs?=".
An HTML rendering of what this string should look like would be "Ana&iuml;s".

What I'm doing now is a brute-force substitution from the version in the
file to the HTML version. That's ugly. What's a better way to translate
that string? Or is my problem that I'm grabbing the headers from the NNTP
server incorrectly?

Jun 27 '08 #1
On Jun 4, 2:38 am, Daniel Mahoney <d...@catfolks.net> wrote:
> [...]
> The interesting patch is the string that reads "=?iso-8859-1?Q?Ana=EFs?=".
> An HTML rendering of what this string should look like would be "Ana&iuml;s".
>
> What I'm doing now is a brute-force substitution from the version in the
> file to the HTML version. That's ugly. What's a better way to translate
> that string?
>>> from email.Header import decode_header
>>> decode_header("=?iso-8859-1?Q?Ana=EFs?=")
[('Ana\xefs', 'iso-8859-1')]
>>> (s, e), = decode_header("=?iso-8859-1?Q?Ana=EFs?=")
>>> s
'Ana\xefs'
>>> e
'iso-8859-1'
>>> s.decode(e)
u'Ana\xefs'
>>> import unicodedata
>>> import htmlentitydefs
>>> for c in s.decode(e):
...     print ord(c), unicodedata.name(c)
...
65 LATIN CAPITAL LETTER A
110 LATIN SMALL LETTER N
97 LATIN SMALL LETTER A
239 LATIN SMALL LETTER I WITH DIAERESIS
115 LATIN SMALL LETTER S
>>> htmlentitydefs.codepoint2name[239]
'iuml'
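The session above is Python 2. For anyone on Python 3, a rough equivalent might look like the sketch below. It assumes the standard-library modules email.header and html.entities (the Python 3 home of htmlentitydefs); the helper names decode_rfc2047 and to_html_entities are made up for illustration and are not part of any library.

```python
from email.header import decode_header, make_header
import html.entities


def decode_rfc2047(raw):
    """Decode RFC 2047 encoded-words in a header value to a str.

    Hypothetical helper: decode_header() splits the value into
    (bytes, charset) chunks and make_header() joins them again.
    """
    return str(make_header(decode_header(raw)))


def to_html_entities(text):
    """Replace non-ASCII characters with HTML entities.

    Hypothetical helper: uses a named entity when one exists in
    html.entities.codepoint2name, else a numeric reference.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in html.entities.codepoint2name:
            out.append("&%s;" % html.entities.codepoint2name[code])
        else:
            out.append("&#%d;" % code)
    return "".join(out)


if __name__ == "__main__":
    name = decode_rfc2047("=?iso-8859-1?Q?Ana=EFs?=")
    print(to_html_entities(name))
```

This replaces the brute-force substitution with two steps: decode the header to text first, then map whatever non-ASCII characters remain to entities.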
Jun 27 '08 #2
On Tue, 03 Jun 2008 15:38:09 -0300, Daniel Mahoney <da*@catfolks.net>
wrote:
> [...]
> The interesting patch is the string that reads
> "=?iso-8859-1?Q?Ana=EFs?=".
> An HTML rendering of what this string should look like would be "Ana&iuml;s".
>
> What I'm doing now is a brute-force substitution from the version in the
> file to the HTML version. That's ugly. What's a better way to translate
> that string? Or is my problem that I'm grabbing the headers from the NNTP
> server incorrectly?

No, it's not you; those headers are encoded following RFC 2047
<http://www.faqs.org/ftp/rfc/rfc2047.txt>
Python already supports that format: use the email.header module,
see <http://docs.python.org/lib/module-email.header.html>
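To make the RFC 2047 format less mysterious: an encoded-word has the shape =?charset?encoding?payload?=, where the encoding is Q (a quoted-printable variant) or B (base64). The sketch below decodes a single encoded-word by hand, purely to show the structure. It is written for Python 3, the name decode_encoded_word and the simplified regular expression are assumptions for illustration, and real code is probably better off with email.header, which also handles mixed and multi-part headers.

```python
import base64
import quopri
import re

# Simplified pattern for one encoded-word: =?charset?Q-or-B?payload?=
# (an illustrative assumption, not the full RFC 2047 grammar)
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_encoded_word(word):
    """Decode a single RFC 2047 encoded-word; return other input unchanged."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return word
    charset, encoding, payload = match.groups()
    if encoding.upper() == "Q":
        # header=True also turns "_" into a space, as the Q encoding requires
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    else:
        raw = base64.b64decode(payload)
    return raw.decode(charset)


if __name__ == "__main__":
    print(decode_encoded_word("=?iso-8859-1?Q?Ana=EFs?="))
```

Here "=EF" is the byte 0xEF, which in iso-8859-1 is the letter i with a diaeresis.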

--
Gabriel Genellina

Jun 27 '08 #3
> No, it's not you; those headers are encoded following RFC 2047
> <http://www.faqs.org/ftp/rfc/rfc2047.txt>
> Python already supports that format: use the email.header module,
> see <http://docs.python.org/lib/module-email.header.html>

Excellent, that's exactly what I was looking for. Thanks!

Jun 27 '08 #4
> ...     print ord(c), unicodedata.name(c)
> ...
> 65 LATIN CAPITAL LETTER A
> 110 LATIN SMALL LETTER N
> 97 LATIN SMALL LETTER A
> 239 LATIN SMALL LETTER I WITH DIAERESIS
> 115 LATIN SMALL LETTER S

Looks like I need to explore the unicodedata module. Thanks!
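As a starting point for that exploration, here is a small Python 3 sketch of a few unicodedata calls (name, category, normalize, combining). The helpers describe and strip_accents are hypothetical names chosen for this example, and stripping accents is only one possible use, not something the thread itself asks for.

```python
import unicodedata


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (ord(ch), unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def strip_accents(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for row in describe("Ana\xefs"):
        print(row)
    print(strip_accents("Ana\xefs"))
```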
Jun 27 '08 #5
Daniel Mahoney wrote:
> The interesting patch is the string that reads "=?iso-8859-1?Q?Ana=EFs?=".
> An HTML rendering of what this string should look like would be "Ana&iuml;s".

There is a mention of email headers and Unicode at the end of this article:

http://mxm-mad-science.blogspot.com/...school-of.html

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science

Jun 27 '08 #6
