Unicode to HTML entities

I was looking for a function to transform a unicode string into
htmlentities. Not only the usual html escaping thing but all
characters.

As I didn't find one, I wrote my own:

# -*- coding: utf-8 -*-
from htmlentitydefs import codepoint2name

def unicode2htmlentities(u):

    htmlentities = list()

    for c in u:
        if ord(c) < 128:
            htmlentities.append(c)
        else:
            htmlentities.append('&%s;' % codepoint2name[ord(c)])

    return ''.join(htmlentities)

print unicode2htmlentities(u'São Paulo')

Is there a function like that in one of Python's built-in modules? If not,
is there a better way to do it?

Regards, Clodoaldo Pinto Neto

May 29 '07 #1
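
For readers on Python 3, here is a minimal sketch of the same named-entity idea. It assumes the `html.entities` module (the Python 3 home of `codepoint2name`) and adds one thing the posted code lacks: a numeric fallback for characters that have no named entity, which would otherwise raise a `KeyError`. The function name `unicode_to_html_entities` is made up for illustration.

```python
from html.entities import codepoint2name


def unicode_to_html_entities(text):
    """Replace every non-ASCII character with an HTML entity."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII passes through untouched (no &, <, > escaping here).
            parts.append(ch)
        elif cp in codepoint2name:
            # Named entity, e.g. 227 -> &atilde;
            parts.append('&%s;' % codepoint2name[cp])
        else:
            # No named entity known: fall back to a numeric reference.
            parts.append('&#%d;' % cp)
    return ''.join(parts)


print(unicode_to_html_entities('São Paulo'))  # S&atilde;o Paulo
```

Note that, like the original, this sketch does not escape `&`, `<` or `>` in the input.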
6 Replies


"Clodoaldo" <cl*************@gmail.com> wrote in message
news:11*********************@n15g2000prd.googlegroups.com...
> I was looking for a function to transform a unicode string into
> htmlentities.

>>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
'S&#227;o Paulo'
May 29 '07 #2
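
The same call still works on Python 3, with one difference worth knowing: `str.encode` returns `bytes`, so a decode step is needed if text is wanted. A small sketch, with the variable names `encoded` and `as_text` chosen only for illustration:

```python
# 'xmlcharrefreplace' turns each unencodable character into a numeric
# character reference instead of raising UnicodeEncodeError.
encoded = 'São Paulo'.encode('ascii', 'xmlcharrefreplace')
print(encoded)  # b'S&#227;o Paulo'

# The result is pure ASCII, so decoding back to str is always safe.
as_text = encoded.decode('ascii')
print(as_text)  # S&#227;o Paulo
```

The references are numeric (`&#227;`) rather than named (`&atilde;`); browsers treat both the same way.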

On May 29, 12:57 pm, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
> "Clodoaldo" <clodoaldo.pi...@gmail.com> wrote in message
> news:11*********************@n15g2000prd.googlegroups.com...
> > I was looking for a function to transform a unicode string into
> > htmlentities.
>
> >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
> 'S&#227;o Paulo'

That was a fast answer. I would never find that myself.

Thanks, Clodoaldo

May 29 '07 #3

Clodoaldo <cl*************@gmail.com> wrote:
> On May 29, 12:57 pm, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
> > "Clodoaldo" <clodoaldo.pi...@gmail.com> wrote in message
> > news:11*********************@n15g2000prd.googlegroups.com...
> > > I was looking for a function to transform a unicode string into
> > > htmlentities.
> >
> > >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
> > 'S&#227;o Paulo'
>
> That was a fast answer. I would never find that myself.

You might actually want:

>>> cgi.escape(u'São Paulo & Espírito Santo').encode('ascii', 'xmlcharrefreplace')
'S&#227;o Paulo &amp; Esp&#237;rito Santo'

as you have to be sure to escape any ampersands in your unicode
string before doing the encode.

May 30 '07 #4
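
On current Python versions `cgi.escape` is no longer available (it was removed in Python 3.8), and `html.escape` is the usual replacement. The sketch below shows the escape-then-encode order described above. The helper name `to_ascii_html` is hypothetical. Unlike `cgi.escape`, `html.escape` also escapes quote characters by default.

```python
import html


def to_ascii_html(text):
    """Escape markup characters first, then replace non-ASCII characters."""
    # Escaping must come first: doing it after the encode step would turn
    # the '&' of every generated '&#NNN;' reference into '&amp;'.
    escaped = html.escape(text)
    return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(to_ascii_html('São Paulo & Espírito Santo'))
# S&#227;o Paulo &amp; Esp&#237;rito Santo
```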

On 29 May 2007, at 17.52, Clodoaldo wrote:
> I was looking for a function to transform a unicode string into
> htmlentities. Not only the usual html escaping thing but all
> characters.
>
> As I didn't find one, I wrote my own:
>
> # -*- coding: utf-8 -*-
> from htmlentitydefs import codepoint2name
>
> def unicode2htmlentities(u):
>
>     htmlentities = list()
>
>     for c in u:
>         if ord(c) < 128:
>             htmlentities.append(c)
>         else:
>             htmlentities.append('&%s;' % codepoint2name[ord(c)])
>
>     return ''.join(htmlentities)
>
> print unicode2htmlentities(u'São Paulo')
>
> Is there a function like that in one of Python's built-in modules? If not,
> is there a better way to do it?
>
> Regards, Clodoaldo Pinto Neto

In many cases, the need to use HTML/XHTML entities can be avoided by
generating UTF-8 encoded pages.
------------------------------------------------------
"Home is not where you are born, but where your heart finds peace" -
Tommy Nordgren, "The dying old crone"
to************@comhem.se
May 30 '07 #5

On May 30, 8:53 am, Tommy Nordgren <tommy.nordg...@comhem.se> wrote:
> On 29 May 2007, at 17.52, Clodoaldo wrote:
> > I was looking for a function to transform a unicode string into
> > htmlentities. Not only the usual html escaping thing but all
> > characters.
> >
> > As I didn't find one, I wrote my own:
> >
> > # -*- coding: utf-8 -*-
> > from htmlentitydefs import codepoint2name
> >
> > def unicode2htmlentities(u):
> >     htmlentities = list()
> >     for c in u:
> >         if ord(c) < 128:
> >             htmlentities.append(c)
> >         else:
> >             htmlentities.append('&%s;' % codepoint2name[ord(c)])
> >     return ''.join(htmlentities)
> >
> > print unicode2htmlentities(u'São Paulo')
> >
> > Is there a function like that in one of Python's built-in modules? If not,
> > is there a better way to do it?
> >
> > Regards, Clodoaldo Pinto Neto
>
> In many cases, the need to use HTML/XHTML entities can be avoided by
> generating UTF-8 encoded pages.

Sure. All my pages are UTF-8 encoded. The case I'm dealing with is an
email link whose subject has non-ASCII characters, as in:

<a href=mailto:ex*****@sample.com?subject=Dúvidas>Mail to</a>

Somehow, when the user clicks on the link, the subject arrives in his email
client with the non-ASCII characters as garbage.

And before someone points out that I should not expose email addresses:
the email is only linked with the consent of the owner, and the source
is obfuscated to make it harder for a robot to harvest it.

Regards, Clodoaldo

May 30 '07 #6
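
For a `mailto:` link, HTML entities may not be the relevant layer: the subject sits inside a URL, where non-ASCII characters are normally percent-encoded as UTF-8 bytes. The sketch below shows that approach in Python 3. Whether it cures the garbage seen in this particular mail client is an assumption, since that depends on the client; the helper name `mailto_link` and the address `user@example.com` are made up for illustration.

```python
import html
from urllib.parse import quote


def mailto_link(address, subject, label):
    """Build an <a> tag whose mailto subject is percent-encoded UTF-8."""
    # quote() encodes to UTF-8 by default; safe='' also encodes '/', '&', '?'.
    href = 'mailto:%s?subject=%s' % (quote(address, safe='@'),
                                     quote(subject, safe=''))
    # Escape for the HTML layer last, and always quote the attribute value.
    return '<a href="%s">%s</a>' % (html.escape(href), html.escape(label))


print(mailto_link('user@example.com', 'Dúvidas', 'Mail to'))
# <a href="mailto:user@example.com?subject=D%C3%BAvidas">Mail to</a>
```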

On May 30, 4:25 am, Duncan Booth <duncan.bo...@invalid.invalid> wrote:
> Clodoaldo <clodoaldo.pi...@gmail.com> wrote:
> > On May 29, 12:57 pm, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
> > > "Clodoaldo" <clodoaldo.pi...@gmail.com> wrote in message
> > > news:11*********************@n15g2000prd.googlegroups.com...
> > > > I was looking for a function to transform a unicode string into
> > > > htmlentities.
> > >
> > > >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
> > > 'S&#227;o Paulo'
> >
> > That was a fast answer. I would never find that myself.
>
> You might actually want:
>
> >>> cgi.escape(u'São Paulo & Espírito Santo').encode('ascii', 'xmlcharrefreplace')
> 'S&#227;o Paulo &amp; Esp&#237;rito Santo'
>
> as you have to be sure to escape any ampersands in your unicode
> string before doing the encode.

I will do it. Thanks.

Regards, Clodoaldo.

May 30 '07 #7
