
UTF8 & HTMLParser

Hello all,

I'm writing a Python script which fetches an HTML page (using wget),
and then parses the retrieved page using a custom htmllib HTMLParser.

The page I fetch is encoded in UTF-8, and my text handler currently
looks like this:

    def handle_data(self, text):
        if self.inOption:
            self.currentName = text

However, I would like to convert the "text" (which is UTF-8) to
Latin-1. How do I do that? I've been trying to figure it out for some
time now, and I'm just getting frustrated. :-(

--
Kind Regards,
Jan Danielsson
Te audire non possum. Musa sapientum fixa est in aure.
Dec 1 '06 #1


Jan Danielsson wrote:
> Hello all,
>
> I'm writing a Python script which fetches an HTML page (using wget),
> and then parses the retrieved page using a custom htmllib HTMLParser.
>
> The page I fetch is encoded in UTF-8, and my text handler currently
> looks like this:
>
>     def handle_data(self, text):
>         if self.inOption:
>             self.currentName = text
>
> However, I would like to convert the "text" (which is UTF-8) to
> Latin-1. How do I do that? I've been trying to figure it out for some
> time now, and I'm just getting frustrated. :-(

I should have mentioned: the problem appears to be that I can't find
a way to make Python understand that "text" (the above argument) is
in fact already UTF-8.

--
Kind Regards,
Jan Danielsson
Te audire non possum. Musa sapientum fixa est in aure.
Dec 1 '06 #2

Jan Danielsson wrote:
> However, I would like to convert the "text" (which is utf8)
> to latin-1. How do I do that?

How about:

latin = unicode(text, 'utf-8').encode('iso-8859-1')

Please see help(u''.encode) for details about error handling. You
might also want to trap errors in a try-except statement.
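
The error handling mentioned above could be sketched roughly as follows. This is a minimal illustration written for Python 3, where the `unicode(text, 'utf-8')` call from the one-liner corresponds to `bytes.decode('utf-8')`. The function names `utf8_to_latin1` and `utf8_to_latin1_or_none` are hypothetical and not part of any library.

```python
def utf8_to_latin1(raw, errors="strict"):
    """Re-encode UTF-8 bytes as Latin-1 (ISO-8859-1) bytes.

    Hypothetical helper for illustration. `errors` is passed to the
    encode step only: "strict" raises, "replace" substitutes "?",
    "ignore" drops characters that Latin-1 cannot represent.
    """
    # Step 1: tell Python the input is UTF-8 by decoding it to text.
    decoded = raw.decode("utf-8")
    # Step 2: encode the text as Latin-1 using the chosen error policy.
    return decoded.encode("iso-8859-1", errors)


def utf8_to_latin1_or_none(raw):
    """Trap conversion errors with try/except and return None on failure."""
    try:
        return utf8_to_latin1(raw)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # UnicodeDecodeError: the input was not valid UTF-8.
        # UnicodeEncodeError: a character has no Latin-1 equivalent.
        return None
```

With `errors="replace"`, a character such as the euro sign, which Latin-1 lacks, comes out as `?` instead of raising an exception. Whether silent replacement is acceptable depends on what the converted text is used for.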

Cheers,

--
Klaus Alexander Seistrup
http://klaus.seistrup.dk/
Dec 1 '06 #3
