Dave wrote:
> How can I translate this:
>
> &#103;&#105;
>
> to this:
>
> "gi"
the easiest way is to run it through an HTML or XML parser (depending on
what the source is). or you could use something like this:

    import re

    def fix_charrefs(text):
        def fixup(m):
            text = m.group(0)
            try:
                if text[:3] == "&#x":
                    # hexadecimal reference, e.g. &#x67;
                    return unichr(int(text[3:-1], 16))
                else:
                    # decimal reference, e.g. &#103;
                    return unichr(int(text[2:-1]))
            except ValueError:
                # not a numeric reference (e.g. a named entity)
                pass
            return text # leave as is
        return re.sub(r"&#?\w+;", fixup, text)

    >>> fix_charrefs("&#103;&#105;")
    'gi'
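
For readers on Python 3, here is a rough sketch of the same idea. It assumes Python 3, where unichr is gone and chr covers the full Unicode range; the name fix_charrefs_py3 is made up for this illustration. The standard library's html.unescape is shown next to it for comparison. Unlike the regex version, html.unescape also converts named entities such as &amp;, which may or may not be what you want.

```python
import html
import re

def fix_charrefs_py3(text):
    # Hypothetical Python 3 port of fix_charrefs: chr() replaces unichr().
    def fixup(m):
        ref = m.group(0)
        try:
            if ref[:3] == "&#x":
                return chr(int(ref[3:-1], 16))  # hexadecimal reference
            return chr(int(ref[2:-1]))          # decimal reference
        except ValueError:
            return ref  # not numeric (e.g. &amp;): leave as is
    return re.sub(r"&#?\w+;", fixup, text)

print(fix_charrefs_py3("&#103;&#105;"))   # numeric references decoded
print(fix_charrefs_py3("&amp; &#x67;"))   # named entity left untouched
print(html.unescape("&amp; &#x67;"))      # stdlib: decodes both kinds
```

If the input is a whole HTML or XML document rather than a loose string, a real parser is still the safer route, as noted above.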
also see:
http://effbot.org/zone/re-sub.htm#strip-html
> I've tried urllib.unencode and it doesn't work.
those are HTML/XML character references, not encoded URL characters.
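
To make the difference concrete, here is a small sketch, assuming Python 3's urllib.parse.unquote and html.unescape (neither is named in the question). URL decoding only understands %XX escapes, so it passes character references through unchanged, and the reverse holds for the HTML decoder.

```python
import html
from urllib.parse import unquote

url_encoded = "%67%69"          # URL percent-encoding
char_refs = "&#103;&#105;"      # HTML/XML character references

# Each decoder handles its own format...
print(unquote(url_encoded))
print(html.unescape(char_refs))

# ...and leaves the other format untouched.
print(unquote(char_refs))
print(html.unescape(url_encoded))
```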
</F>