Bytes | Software Development & Data Engineering Community

Handling some isolated iso-8859-1 characters

I'm working on an app that's processing Usenet messages. I'm making a
connection to my NNTP feed and grabbing the headers for the groups I'm
interested in, saving the info to disk, and doing some post-processing.
I'm finding a few bizarre characters and I'm not sure how to handle them
pythonically.

One of the lines I'm finding this problem with contains:
137050 Cleo and I have an anouncement! "Mlle. =?iso-8859-1?Q?Ana=EFs?="
<no*@aol.com Sun, 21 Nov 2004 16:21:50 -0500
<lm************ *************** @40tude.net 4478 69 Xref:
sn-us rec.pets.cats.community:137050

The interesting patch is the string that reads "=?iso-8859-1?Q?Ana=EFs?=".
An HTML rendering of what this string should look like would be "Ana&iuml;s".
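To see what that string actually contains: RFC 2047 calls it an "encoded-word", laid out as =?charset?encoding?text?=. Here the charset is iso-8859-1, Q selects a quoted-printable-like encoding, and =EF is the hex byte 0xEF, which is "ï" in ISO-8859-1. The minimal Python 3 sketch below decodes one encoded-word by hand purely to illustrate that structure; the name decode_encoded_word is hypothetical, and for real headers the standard email.header module is the more robust choice, since it also copes with several encoded-words mixed with plain text.

```python
import base64
import binascii
import re

# Layout of an RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_encoded_word(word):
    """Decode one encoded-word such as '=?iso-8859-1?Q?Ana=EFs?=' to text.

    Hypothetical helper for illustration only: it handles a single word
    and ignores RFC 2231 language tags and other edge cases.
    """
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an RFC 2047 encoded-word: %r" % word)
    charset, encoding, payload = match.groups()
    if encoding in "Qq":
        # Q encoding: '=XX' is a hex byte and '_' stands for a space.
        raw = binascii.a2b_qp(payload, header=True)
    else:
        # B encoding: plain base64.
        raw = base64.b64decode(payload)
    # Interpret the raw bytes in the declared character set.
    return raw.decode(charset)


# decode_encoded_word("=?iso-8859-1?Q?Ana=EFs?=") returns "Ana\u00efs" ("Anaïs").
```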

What I'm doing now is a brute-force substitution from the version in the
file to the HTML version. That's ugly. What's a better way to translate
that string? Or is my problem that I'm grabbing the headers from the NNTP
server incorrectly?

Jun 27 '08 #1
On Jun 4, 2:38 am, Daniel Mahoney <d...@catfolks.net> wrote:
>>> from email.Header import decode_header
>>> decode_header("=?iso-8859-1?Q?Ana=EFs?=")
[('Ana\xefs', 'iso-8859-1')]
>>> (s, e), = decode_header("=?iso-8859-1?Q?Ana=EFs?=")
>>> s
'Ana\xefs'
>>> e
'iso-8859-1'
>>> s.decode(e)
u'Ana\xefs'
>>> import unicodedata
>>> import htmlentitydefs
>>> for c in s.decode(e):
...     print ord(c), unicodedata.name(c)
...
65 LATIN CAPITAL LETTER A
110 LATIN SMALL LETTER N
97 LATIN SMALL LETTER A
239 LATIN SMALL LETTER I WITH DIAERESIS
115 LATIN SMALL LETTER S
>>> htmlentitydefs.codepoint2name[239]
'iuml'
>>>
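The session above is Python 2: email.Header, htmlentitydefs and the print statement are gone in Python 3, where the modules are email.header and html.entities and decode_header returns bytes that still need decoding with the reported charset. A rough Python 3 equivalent might look like the sketch below. The helper to_html_entities is a hypothetical name; it goes one step further and produces the "Ana&iuml;s" form from the question, falling back to numeric character references for code points that have no named HTML 4 entity.

```python
import unicodedata
from email.header import decode_header
from html.entities import codepoint2name


def to_html_entities(text):
    """Replace characters that have an HTML 4 entity name (including &, <, >
    and ") with that entity, and any other non-ASCII character with a
    numeric character reference. Hypothetical helper for illustration."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])
        elif cp > 127:
            out.append("&#%d;" % cp)
        else:
            out.append(ch)
    return "".join(out)


# For encoded input, decode_header returns a list of (bytes, charset) pairs.
(raw, charset), = decode_header("=?iso-8859-1?Q?Ana=EFs?=")
text = raw.decode(charset)              # "Ana\u00efs"

for c in text:
    # Unicode names are plain ASCII, so this prints safely on any console.
    print(ord(c), unicodedata.name(c))

html = to_html_entities(text)           # "Ana&iuml;s"
```

If numeric references are acceptable instead of names, html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii") is a shorter alternative; it yields "Ana&#239;s".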
Jun 27 '08 #2
On Tue, 03 Jun 2008 15:38:09 -0300, Daniel Mahoney <da*@catfolks.net> wrote:
No, it's not you; those headers are formatted according to RFC 2047
<http://www.faqs.org/ftp/rfc/rfc2047.txt>
Python already has support for that format: use the email.header module,
see <http://docs.python.org/lib/module-email.header.html>
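A minimal Python 3 sketch of applying email.header to a whole header value: decode_header splits the value into chunks and make_header reassembles them, so str(make_header(decode_header(value))) yields plain text whether encoded-words are mixed with unencoded text or absent altogether. Header(...).encode() goes the other way and produces an RFC 2047 encoded-word. The wrapper name decode_mime_header is hypothetical.

```python
from email.header import Header, decode_header, make_header


def decode_mime_header(value):
    """Return a header value with any RFC 2047 encoded-words decoded to text."""
    # decode_header yields (chunk, charset) pairs; make_header joins them
    # back into a Header object whose str() is the decoded text.
    return str(make_header(decode_header(value)))


subject = "Cleo and I have an anouncement! Mlle. =?iso-8859-1?Q?Ana=EFs?="
decoded = decode_mime_header(subject)
# decoded == "Cleo and I have an anouncement! Mlle. Ana\u00efs"

# The reverse direction: encode text as an RFC 2047 encoded-word.
encoded = Header("Ana\u00efs", "iso-8859-1").encode()
# encoded is an ASCII string along the lines of "=?iso-8859-1?q?Ana=EFs?="
```

Going through make_header rather than decoding the pairs by hand also covers headers with no encoded-words, where decode_header returns a str chunk instead of bytes.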

--
Gabriel Genellina

Jun 27 '08 #3
> No, it's not you; those headers are formatted according to RFC 2047
> <http://www.faqs.org/ftp/rfc/rfc2047.txt>
> Python already has support for that format: use the email.header module,
> see <http://docs.python.org/lib/module-email.header.html>
Excellent, that's exactly what I was looking for. Thanks!

Jun 27 '08 #4
> ...     print ord(c), unicodedata.name(c)
> ...
> 65 LATIN CAPITAL LETTER A
> 110 LATIN SMALL LETTER N
> 97 LATIN SMALL LETTER A
> 239 LATIN SMALL LETTER I WITH DIAERESIS
> 115 LATIN SMALL LETTER S
Looks like I need to explore the unicodedata module. Thanks!
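For that exploration, a few starting points: unicodedata.name and unicodedata.lookup convert between characters and their Unicode names, and unicodedata.normalize("NFD", ...) splits a precomposed character such as "ï" into a base letter plus a combining mark. The helpers describe and strip_accents below are hypothetical illustrations written for Python 3; strip_accents is a lossy ASCII approximation, not something proposed in the thread.

```python
import unicodedata


def describe(text):
    """List (code point, Unicode name) pairs, like the loop quoted above."""
    return [(ord(c), unicodedata.name(c, "<unnamed>")) for c in text]


def strip_accents(text):
    """Drop combining marks after canonical decomposition (lossy)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


word = "Ana\u00efs"
pairs = describe(word)
# pairs[3] == (239, "LATIN SMALL LETTER I WITH DIAERESIS")
same = unicodedata.lookup("LATIN SMALL LETTER I WITH DIAERESIS")  # "\u00ef"
parts = [unicodedata.name(c) for c in unicodedata.normalize("NFD", "\u00ef")]
# parts == ["LATIN SMALL LETTER I", "COMBINING DIAERESIS"]
plain = strip_accents(word)             # "Anais"
```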
Jun 27 '08 #5
Daniel Mahoney wrote:
> The interesting patch is the string that reads "=?iso-8859-1?Q?Ana=EFs?=".
> An HTML rendering of what this string should look like would be "Ana&iuml;s".

There is a mention of email headers and Unicode at the end of this article:

http://mxm-mad-science.blogspot.com/...school-of.html

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science

Jun 27 '08 #6
