unescaping xml escape codes

I'm working with strings that contain XML escape codes, such as '&#48;',
and need a way in Python to unescape these back to their ASCII
representation, such as '&', but can't seem to find a Python method for
this. I tried xml.sax.saxutils.unescape(s), but while it works with
'&amp;', it doesn't work with '&#48;' and other numeric codes. Any
suggestions on how to decode numeric XML escape codes like this?
Thanks.
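For readers on a current Python 3, here is a minimal sketch of the problem and one standard-library fix. xml.sax.saxutils.unescape() only handles &amp;, &lt; and &gt; (plus any extra entities passed in its entities dict), so numeric references pass through unchanged; html.unescape() (Python 3.4+) also resolves decimal and hex character references. Note that html.unescape() follows HTML5 rules rather than strict XML ones, so it also expands HTML-only named entities such as &eacute;. The sample string is made up for illustration.

```python
import html
from xml.sax.saxutils import unescape

s = "a &amp; b &#48; &#x31;"  # hypothetical sample input

# saxutils.unescape knows only &amp;, &lt;, &gt; (and any extra entities passed in)
print(unescape(s))       # a & b &#48; &#x31;

# html.unescape also resolves decimal (&#48;) and hex (&#x31;) references
print(html.unescape(s))  # a & b 0 1
```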

Jul 18 '05 #1
On Sun, 10 Aug 2003 10:08:46 -0700, Daniel <dl**************@yahoo.com> wrote:
[...]

Maybe just a regex sub function would do it for you? Do you just need the decimal
forms like above or also the hex? If your coded entities are &#0; to &#255; or
&x00; to &xff; this might work. Other entities are converted to '?'.

If you want to do this properly, I think you have to parse the HTML a little to see
what the encoding is, convert to Unicode, and then do the conversions.
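A minimal Python 3 sketch of that "proper" route, assuming the caller already knows the document's encoding (the encoding parameter below is a hypothetical stand-in for whatever you find by inspecting the document; nothing here detects it): decode the bytes to Unicode text first, then replace each decimal or hex character reference with chr() of its code point. Because Python 3's chr() covers all of Unicode, no '?' fallback is needed except for numbers beyond U+10FFFF, which this sketch leaves untouched. The name decode_charrefs is illustrative.

```python
import re

# XML numeric character reference: &#NNN; (decimal) or &#xHHH; (hex).
# XML only allows a lowercase 'x' for the hex form.
CHARREF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def decode_charrefs(raw, encoding="utf-8"):
    """Decode raw bytes to text, then resolve numeric character references.

    'encoding' is assumed to be known by the caller (e.g. from the XML
    declaration); this sketch does not detect it.
    """
    text = raw.decode(encoding)  # step 1: bytes -> Unicode text

    def repl(m):
        dec, hexdigits = m.groups()
        code = int(dec) if dec is not None else int(hexdigits, 16)
        try:
            return chr(code)  # step 2: code point -> character
        except (ValueError, OverflowError):
            return m.group(0)  # outside Unicode: leave the reference as-is

    return CHARREF.sub(repl, text)


print(decode_charrefs(b"blah [&#48;] [&#x31;&#x32;&#x33;]"))  # blah [0] [123]
```

Named entities such as &amp; are deliberately not touched here; they can be handled separately (for example with saxutils.unescape) after the numeric pass.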

Very little tested!!
====< cvthtmlent.py >======================================
import re

# Numeric character reference: decimal (&#NNN;) or hex (&#xHH;).
rxo = re.compile(r'&#(\d+|x[0-9a-fA-F]+);')

def ent2chr(m):
    code = m.group(1)
    if code.isdigit(): code = int(code)      # decimal form
    else: code = int(code[1:], 16)           # hex form: drop the leading 'x'
    if code < 256: return chr(code)          # Python 2 chr() only covers 0..255
    else: return '?' #XXX unichr(code).encode('utf-16le') ??

def cvthtmlent(s): return rxo.sub(ent2chr, s)

if __name__ == '__main__':
    import sys; args = sys.argv[1:]
    if args:
        arg = args.pop(0)
        if arg == '-test':
            print cvthtmlent(
                'blah [&#48;] blah [&#246;] blah [&#x31;&#x32;&#x33;] &#x3c9;')
        else:
            # '-' reads stdin; anything else is treated as a file name
            if arg == '-': fi = sys.stdin
            else: fi = file(arg)
            for line in fi:
                sys.stdout.write(cvthtmlent(line))
===========================================================
If you run this in IDLE, you can see the umlaut but not the omega, which becomes a '?'.
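The '?' comes from the script's own choice to return '?' for code points of 256 and above. A sketch of what the XXX comment is reaching for, in Python 3 terms: keep the decoded text as Unicode and choose a byte encoding only at output time. An encoding that cannot represent a character (Latin-1 here, picked only as an example of a one-byte encoding) reproduces the '?' via errors="replace", while UTF-8 keeps the omega.

```python
# Text after decoding the references: U+00F6 is the umlaut, U+03C9 the omega.
text = "blah [0] blah [\u00f6] blah [123] \u03c9"

# Latin-1 has the umlaut but not the omega; errors="replace" substitutes b"?".
latin1 = text.encode("latin-1", errors="replace")
print(latin1)  # b'blah [0] blah [\xf6] blah [123] ?'

# UTF-8 can encode every code point, so nothing is lost.
utf8 = text.encode("utf-8")
print(utf8)    # b'blah [0] blah [\xc3\xb6] blah [123] \xcf\x89'
```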

Martin can tell you the real scoop ;-)
>>> from cvthtmlent import cvthtmlent as cvt
>>> print cvt('blah [&#48;] blah [&#246;] blah [&#x31;&#x32;&#x33;] &#x3c9;')
blah [0] blah [ö] blah [123] ?

Regards,
Bengt Richter
Jul 18 '05 #2
On 11 Aug 2003 00:09:42 GMT, bo**@oz.net (Bengt Richter) wrote:
[...]

> Maybe just a regex sub function would do it for you? Do you just need the decimal
> forms like above or also the hex? If your coded entities are &#0; to &#255; or
> &x00; to &xff; this might work. Other entities are converted to '?'.

That should be &#x00; and &#xff; respectively. I did implement hex entities after all.
I just botched re-editing this commentary ;-P
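If the strings come from a well-formed XML document, another option is to let an XML parser resolve the references, since XML defines both &#NNN; and &#xHHH; as character references. A minimal sketch with the standard library's ElementTree, assuming the string is otherwise well-formed XML text (no bare '&' or '<', no child elements); the wrapper element <r> and the name unescape_xml_text are made up for the example. A strict parser also rejects a typo like &x00; as an undefined entity rather than passing it through, and it rejects &#0; outright because XML 1.0 does not allow that character.

```python
import xml.etree.ElementTree as ET


def unescape_xml_text(s):
    """Resolve character references by parsing s inside a dummy root element."""
    return ET.fromstring("<r>" + s + "</r>").text or ""


# Decimal and hex forms of the same code point decode identically.
print(unescape_xml_text("&#255; &#xff; &#48;&#x31;") == "\u00ff \u00ff 01")  # True

try:
    unescape_xml_text("&x00;")  # missing '#': parsed as an unknown named entity
except ET.ParseError as err:
    print("rejected:", err)
```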

Regards,
Bengt Richter
Jul 18 '05 #3
