On May 30, 8:53 am, Tommy Nordgren <tommy.nordg...@comhem.se> wrote:
On 29 May 2007, at 17:52, Clodoaldo wrote:
I was looking for a function to transform a Unicode string into
HTML entities. Not only the usual HTML escaping, but all
characters.
As I didn't find one, I wrote my own:
# -*- coding: utf-8 -*-
from htmlentitydefs import codepoint2name

def unicode2htmlentities(u):
    htmlentities = list()
    for c in u:
        if ord(c) < 128:
            htmlentities.append(c)
        else:
            htmlentities.append('&%s;' % codepoint2name[ord(c)])
    return ''.join(htmlentities)

print unicode2htmlentities(u'São Paulo')
Is there a function like that in one of Python's built-in modules? If not,
is there a better way to do it?
Regards, Clodoaldo Pinto Neto
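
One possible answer to the question above, as a minimal sketch rather than a definitive implementation. It assumes Python 3, where the `htmlentitydefs` module is named `html.entities`. The function name `unicode_to_html_entities` is made up for this illustration. Unlike the posted version, it falls back to a numeric character reference when a code point has no named entity, so it does not raise `KeyError`. If numeric references alone are acceptable, the built-in `xmlcharrefreplace` error handler of `str.encode` does the job without any helper.

```python
from html.entities import codepoint2name


def unicode_to_html_entities(text):
    """Replace every non-ASCII character with an HTML entity.

    Hypothetical helper for illustration: uses a named entity when
    one exists, otherwise a decimal numeric character reference.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII is passed through unchanged (no &, <, > escaping here).
            parts.append(ch)
        elif cp in codepoint2name:
            parts.append('&%s;' % codepoint2name[cp])
        else:
            parts.append('&#%d;' % cp)
    return ''.join(parts)


# Named entities where available.
print(unicode_to_html_entities('São Paulo'))

# Built-in alternative: numeric character references only.
print('São Paulo'.encode('ascii', 'xmlcharrefreplace').decode('ascii'))
```

Note that neither variant escapes `&`, `<`, or `>`; that is still a separate step.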
In many cases, the need to use HTML/XHTML entities can be avoided by
generating UTF-8 encoded pages.
Sure. All my pages are UTF-8 encoded. The case I'm dealing with is an
email link whose subject has non-ASCII characters, as in:
<a href=mailto:ex*****@sample.com?subject=Dúvidas>Mail to</a>
Somehow, when the user clicks on the link, the subject reaches his email
client with the non-ASCII characters turned into garbage.
And before someone points out that I should not expose email addresses:
the email is only linked with the consent of the owner, and the source
is obfuscated to make it harder for a robot to harvest it.
Regards, Clodoaldo
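
For the mailto subject problem described above, HTML entities may not be the right tool: the value sits inside a URL, and the usual approach there is to percent-encode the UTF-8 bytes of the subject. The sketch below assumes Python 3 and uses a made-up helper name (`mailto_link`) and a placeholder address; whether a particular mail client decodes the subject correctly is not something this code can guarantee.

```python
from html import escape
from urllib.parse import quote


def mailto_link(address, subject, label):
    """Build an <a href="mailto:..."> element with a percent-encoded subject.

    Hypothetical helper for illustration. Assumes `address` is a plain
    ASCII address that needs no encoding of its own.
    """
    # quote() encodes the text as UTF-8 and percent-escapes every byte
    # that is not an unreserved URL character.
    href = 'mailto:%s?subject=%s' % (address, quote(subject, safe=''))
    # Escape for the HTML context and quote the attribute value.
    return '<a href="%s">%s</a>' % (escape(href, quote=True), escape(label))


print(mailto_link('user@example.com', 'Dúvidas', 'Mail to'))
# prints: <a href="mailto:user@example.com?subject=D%C3%BAvidas">Mail to</a>
```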