By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,767 Members | 2,114 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,767 IT Pros & Developers. It's quick & easy.

html escape sequences

P: n/a
Hi,

I'd like to replace html escape sequences, like &nbsp and &#39 with
single characters. Is there a dictionary defined somewhere I can use to
replace these sequences?

Thanks,

Will McGugan
Jul 18 '05 #1
Share this Question
Share on Google+
2 Replies


P: n/a
Will McGugan wrote:
I'd like to replace html escape sequences, like &nbsp and &#39 with
single characters. Is there a dictionary defined somewhere I can use to
replace these sequences?


How about this?

import re
from htmlentitydefs import name2codepoint

_entity_re = re.compile(r'&(?:(#)(\d+)|([^;]+));')

def _repl_func(match):
if match.group(1): # Numeric character reference
return unichr(int(match.group(2)))
else:
return unichr(name2codepoint[match.group(3)])

def handle_html_entities(string):
return _entity_re.sub(_repl_func, string)
Jul 18 '05 #2

P: n/a
Leif K-Brooks wrote:
Will McGugan wrote:
I'd like to replace html escape sequences, like &nbsp and &#39 with
single characters. Is there a dictionary defined somewhere I can use
to replace these sequences?

How about this?

import re
from htmlentitydefs import name2codepoint

_entity_re = re.compile(r'&(?:(#)(\d+)|([^;]+));')

def _repl_func(match):
if match.group(1): # Numeric character reference
return unichr(int(match.group(2)))
else:
return unichr(name2codepoint[match.group(3)])

def handle_html_entities(string):
return _entity_re.sub(_repl_func, string)


muchas gracias!

Will McGugan
Jul 18 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.