On Mon, 29 Jan 2007 00:05:24 -0300, Steven D'Aprano
<st***@REMOVEME.cybersource.com.au> wrote:
I have a string containing Latin-1 characters:
s = u"© and many more..."
I want to convert it to HTML entities:
result =>
"&copy; and many more..."
Module htmlentitydefs contains the tables you're looking for, but you need
a few transforms:
<code>
# -*- coding: iso-8859-15 -*-
from htmlentitydefs import codepoint2name

# Map each character to its entity, e.g. u'\xa9' -> u'&copy;'.
unichr2entity = dict((unichr(code), u'&%s;' % name)
                     for code, name in codepoint2name.iteritems()
                     if code != 38)  # exclude "&"; it is handled first below

def htmlescape(text, d=unichr2entity):
    # Escape "&" first, so the "&" of the entities added below is not re-escaped.
    if u"&" in text:
        text = text.replace(u"&", u"&amp;")
    for key, value in d.iteritems():
        if key in text:
            text = text.replace(key, value)
    return text

print '%r' % htmlescape(u'hello')
print '%r' % htmlescape(u'"©® áé&ö <²³>')
</code>
Output:
u'hello'
u'&quot;&copy;&reg; &aacute;&eacute;&amp;&ouml; &lt;&sup2;&sup3;&gt;'
The result is a unicode object, with all known entities replaced. Characters
that have no named entity are left unchanged - as the docs for htmlentitydefs say,
"the definition provided here contains all the entities defined by XHTML
1.0 that can be handled using simple textual substitution in the Latin-1
character set (ISO-8859-1)."
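For characters with no named entity, one possible workaround is to fall back to numeric character references. The sketch below is not part of the recipe above. It assumes Python 3, where htmlentitydefs is named html.entities, and the names entity_table and htmlescape_all are made up for illustration. It does the named replacements in a single pass with str.translate, then lets the xmlcharrefreplace error handler turn any remaining non-ASCII characters into decimal references.

```python
# Python 3 sketch (assumption: the recipe above targets Python 2).
from html.entities import codepoint2name

# Translation table: code point -> named entity, e.g. 169 -> '&copy;'.
# "&" (38) can stay in the table because translate() makes a single pass.
entity_table = dict((code, '&%s;' % name)
                    for code, name in codepoint2name.items())

def htmlescape_all(text):
    """Hypothetical helper: named entities first, numeric references otherwise."""
    named = text.translate(entity_table)
    # Any non-ASCII character still present has no named entity;
    # xmlcharrefreplace encodes it as a decimal reference such as '&#257;'.
    return named.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(htmlescape_all('"©® áé&ö <²³>'))
print(htmlescape_all('\u0101'))  # a with macron, not in codepoint2name
```

Because the table is applied in one pass, "&" needs no special ordering here. The return value is always plain ASCII, which may or may not be what you want.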
--
Gabriel Genellina