unescaping xml escape codes

I'm working with strings that contain xml escape codes, such as '&#48;'
and need a way in python to unescape these back to their ascii
representation, such as '&' but can't seem to find a python method for
this. I tried xml.sax.saxutils.unescape(s), but while it works with
'&amp;', it doesn't work with '&#48;' and other numeric codes. Any
suggestions on how to decode the numeric xml escape codes such as this?
Thanks.

--
To reply to me directly, please remove "_NoSpam_" from my email address
Jul 18 '05 #1
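For reference, the behavior described in the question can be reproduced with a short sketch. It is written for Python 3, and the sample string is an illustrative assumption rather than the poster's data. By default, xml.sax.saxutils.unescape only translates &amp;, &lt; and &gt;, so numeric character references pass through unchanged:

```python
from xml.sax.saxutils import unescape

# Hypothetical sample input mixing a named entity with decimal and hex
# numeric character references.
sample = "AT&amp;T &#48; &#x31;"

# Only the three predefined entities (&amp;, &lt;, &gt;) are handled by default;
# the numeric references are left exactly as they were.
result = unescape(sample)
print(result)
```

The optional second argument of unescape accepts a dict of extra literal replacements, but that is a fixed string mapping, not a general decoder for numeric references.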
On Sun, 10 Aug 2003 10:08:46 -0700, Daniel <dl**************@yahoo.com> wrote:
> I'm working with strings that contain xml escape codes, such as '&#48;'
> and need a way in python to unescape these back to their ascii
> representation, such as '&' but can't seem to find a python method for
> this. I tried xml.sax.saxutils.unescape(s), but while it works with
> '&amp;', it doesn't work with '&#48;' and other numeric codes. Any
> suggestions on how to decode the numeric xml escape codes such as this?
> Thanks.

Maybe just a regex sub function would do it for you? Do you just need the decimal
forms like above or also the hex? If your coded entities are &#0; to &#255; or
&x00; to &xff; this might work. Other entities are converted to '?'.

If you want to do this properly, I think you have to parse the html a little and see
what the encoding is, and convert to unicode, and then do the conversions.

Very little tested!!
====< cvthtmlent.py >======================================
# Python 2 code (print statement, file()).
import re

# Match decimal (&#48;) or hex (&#x30;) numeric character references.
rxo = re.compile(r'&#(x[0-9a-fA-F]+|[0-9]+);')

def ent2chr(m):
    code = m.group(1)
    if code.isdigit():
        code = int(code)           # decimal form
    else:
        code = int(code[1:], 16)   # hex form: drop the leading 'x'
    if code < 256:
        return chr(code)
    else:
        return '?'  #XXX unichr(code).encode('utf-16le') ??

def cvthtmlent(s):
    return rxo.sub(ent2chr, s)

if __name__ == '__main__':
    import sys
    args = sys.argv[1:]
    if args:
        arg = args.pop(0)
        if arg == '-test':
            print cvthtmlent(
                'blah [&#48;] blah [&#246;] blah [&#x31;&#x32;&#x33;] &#x3c9;')
        else:
            # '-' means read from standard input; otherwise treat arg as a file name.
            if arg == '-':
                fi = sys.stdin
            else:
                fi = file(arg)
            for line in fi:
                sys.stdout.write(cvthtmlent(line))
===========================================================
If you run this in IDLE, you can see the umlaut, but not the omega, which becomes a '?'.

Martin can tell you the real scoop ;-)
>>> from cvthtmlent import cvthtmlent as cvt
>>> print cvt('blah [&#48;] blah [&#246;] blah [&#x31;&#x32;&#x33;] &#x3c9;')
blah [0] blah [÷] blah [123] ?

Regards,
Bengt Richter
Jul 18 '05 #2
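On current Python 3, the standard library's html.unescape covers the general case raised above, including code points beyond 255. This is a minimal sketch, assuming Python 3.4 or later and input that is already a decoded str. Note that it follows HTML rules, so it also expands HTML-only named entities such as &nbsp;, which may be more than a strict XML consumer wants:

```python
import html

# Same test string as in the post above: decimal, hex, and a code point > 255.
text = "blah [&#48;] blah [&#246;] blah [&#x31;&#x32;&#x33;] &#x3c9;"

# html.unescape returns a str, so the omega needs no '?' fallback here.
decoded = html.unescape(text)
print(decoded)
```

Whether the umlaut and omega display correctly still depends on the encoding of the terminal or file the result is written to.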
On 11 Aug 2003 00:09:42 GMT, bo**@oz.net (Bengt Richter) wrote:
[...]

> Maybe just a regex sub function would do it for you? Do you just need the decimal
> forms like above or also the hex? If your coded entities are &#0; to &#255; or
> &x00; to &xff; this might work. Other entities are converted to '?'.

That should be &#x00; and &#xff; respectively. I did implement hex entities after all.
Botched re-editing this commentary, however ;-P

Regards,
Bengt Richter
Jul 18 '05 #3
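To make the decimal and hex forms discussed above concrete, here is a hedged Python 3 sketch of the same regex-substitution idea. The names _CHAR_REF, _replace and unescape_numeric are hypothetical and not from the thread. Because Python 3 strings are Unicode, chr() can return any valid code point, so the '?' fallback is not needed; references that cannot be converted are left untouched instead:

```python
import re

# Decimal form: &#NNN;   Hex form: &#xHHH;  (the 'x' is required for hex).
_CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")

def _replace(match):
    """Convert one numeric character reference to its character."""
    hex_digits, dec_digits = match.groups()
    code = int(hex_digits, 16) if hex_digits else int(dec_digits)
    try:
        return chr(code)
    except (ValueError, OverflowError):
        # Code point outside the Unicode range: keep the original text.
        return match.group(0)

def unescape_numeric(s):
    """Replace numeric character references in s; leave everything else alone."""
    return _CHAR_REF.sub(_replace, s)

print(unescape_numeric("&#48; &#x30; &#xff; &#x3c9;"))
```

Named entities such as &amp; are deliberately not handled here. If both are needed, expand the numeric references first and only then call xml.sax.saxutils.unescape on the result, so that double-escaped text like &amp;#48; ends up as the literal &#48; and is not decoded a second time.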
