xmlrpclib and decoding entity references

I'm writing an XMLRPC server, which is receiving a request (from a
non-Python client) that looks like this (formatted for legibility):

<?xml version="1.0"?>
<methodCall>
<methodName>echo</methodName>
<params>
<param>
<value>
<string>Le Martyre de Saint Andr&#xe9; &lt;BR&gt; avec inscription
&apos;Le Dominiquain.&apos; et &apos;Le tableau fait par le dominicain,
d&apos;apr&#xe8;s son dessein &#xe0;... est &#xe0; Rome, &#xe0;
l&apos;&#xe9;glise Saint Andr&#xe9; della Valle&apos; sur le
cadre&lt;BR&gt; craie noire, plume et encre brune, lavis brun
rehauss&#xe9; de blanc sur papier brun&lt;BR&gt; 190 x 228 mm. (7 1/2 x
9 in.)</string>
</value>
</param>
</params>
</methodCall>

But when my "echo" method is invoked, the value of the string is:

Le Martyre de Saint Andr; <BR> avec inscription 'Le Dominiquain.' et
'Le tableau fait par le dominicain, d'apr:s son dessein 2... est 2
Rome, 2 l';glise Saint Andr; della Valle' sur le cadre<BR> craie noire,
plume et encre brune, lavis brun rehauss; de blanc sur papier brun<BR>
190 x 228 mm. (7 1/2 x 9 in.)

Can anyone give me a lead on how to convert the entity references into
something that will make it through to my method call?

Jul 19 '05 #1
yep, I'm using SimpleXMLRPCServer, but something is getting messed up
between the receipt of the XML stream and the delivery to my function.
The "normal" entity references (like &lt; and &amp;) are handled OK,
but the character references are not working. For instance,

"Andr&#xe9;" is received by the server, but it's delivered to the
function as "Andr;"

I've figured out how to parse through the string to find all the
character references and convert them back, but that seems to be
causing a ProtocolError.

Hopefully someone can lend me a clue; I really don't want to have to
switch over to SOAP and end up in WSDL hell.

Jul 19 '05 #2
Here is the solution. Incidentally, the client is Cold Fusion.

import re
import logging
import logging.config
import os
import SimpleXMLRPCServer

logging.config.fileConfig("logging.ini")

###########################################################################
class LoggingXMLRPCRequestHandler(SimpleXMLRPCServer.CGIXMLRPCRequestHandler):
    def __dereference(self, request_text):
        entityRe = re.compile("((?P<er>&#x)(?P<code>..)(?P<semi>;))")
        for m in re.finditer(entityRe, request_text):
            hexref = int(m.group(3), 16)
            charref = chr(hexref)
            request_text = request_text.replace(m.group(1), charref)

        return request_text
    #-------------------------------------------------------------------
    def handle_xmlrpc(self, request_text):
        logger = logging.getLogger()
        #logger.debug("************************************")
        #logger.debug(request_text)
        try:
            #logger.debug("-------------------------------------")
            request_text = self.__dereference(request_text)
            #logger.debug(request_text)
            request_text = request_text.decode("latin-1").encode('utf-8')
            #logger.debug("************************************")
        except Exception, e:
            logger.error(request_text)
            logger.error("had a problem dereferencing")
            logger.error(e)

        SimpleXMLRPCServer.CGIXMLRPCRequestHandler.handle_xmlrpc(self, request_text)
###########################################################################
class Foo:
    def settings(self):
        return os.environ
    def echo(self, something):
        logger = logging.getLogger()
        logger.debug(something)
        return something
    def greeting(self, name):
        return "hello, " + name

# these are used to run as a CGI
handler = LoggingXMLRPCRequestHandler()
handler.register_instance(Foo())
handler.handle_request()

Jul 19 '05 #3
On 3 May 2005 08:07:06 -0700, "Chris Curvey" <cc*****@gmail.com> wrote:
> I'm writing an XMLRPC server, which is receiving a request (from a
> non-Python client) that looks like this (formatted for legibility):
>
> <?xml version="1.0"?>
> <methodCall>
> <methodName>echo</methodName>
> <params>
> <param>
> <value>
> <string>Le Martyre de Saint Andr&#xe9; &lt;BR&gt; avec inscription
> &apos;Le Dominiquain.&apos; et &apos;Le tableau fait par le dominicain,
> d&apos;apr&#xe8;s son dessein &#xe0;... est &#xe0; Rome, &#xe0;
> l&apos;&#xe9;glise Saint Andr&#xe9; della Valle&apos; sur le
> cadre&lt;BR&gt; craie noire, plume et encre brune, lavis brun
> rehauss&#xe9; de blanc sur papier brun&lt;BR&gt; 190 x 228 mm. (7 1/2 x
> 9 in.)</string>
> </value>
> </param>
> </params>
> </methodCall>
>
> But when my "echo" method is invoked, the value of the string is:
>
> Le Martyre de Saint Andr; <BR> avec inscription 'Le Dominiquain.' et
> 'Le tableau fait par le dominicain, d'apr:s son dessein 2... est 2
> Rome, 2 l';glise Saint Andr; della Valle' sur le cadre<BR> craie noire,
> plume et encre brune, lavis brun rehauss; de blanc sur papier brun<BR>
> 190 x 228 mm. (7 1/2 x 9 in.)
>
> Can anyone give me a lead on how to convert the entity references into
> something that will make it through to my method call?

I haven't used XMLRPC but superficially this looks like a quoting and/or encoding
problem. IOW, your "request" is XML, and the <string>...</string> part is also XML
which is part of the whole, not encapsulated in e.g. <![CDATA[...stuff...]]>
(which would tell an XML parser to suspend markup interpretation of ...stuff...).

So IWT you would at least need the <string>...</string> content to be converted to
unicode to preserve all the represented characters. It wouldn't surprise me if the
whole request is routinely converted to unicode, and the "value" you are showing
above is a result of converting from unicode to an encoding that can't represent
everything, and maybe just drops conversion errors. What do you
get if you print repr(value)? (assuming value is passed to your echo method)

If it is a unicode string, you will just have to choose an appropriate value.encode('appropriate')
from available codecs. If it looks like e.g., a utf-8 encoding of unicode, you could try
value.decode('utf-8').encode('appropriate')
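
To make the repr and codec suggestions above concrete, here is a minimal sketch. It assumes Python 3 (where the old unicode type is simply str) and a hypothetical `value` standing in for what the echo method might receive; it is not taken from the original server.

```python
# Hypothetical value, standing in for the string delivered to echo().
value = "Andr\u00e9 <BR>"

# ascii() (like repr() in Python 2) shows non-ASCII characters as escapes,
# so nothing is hidden by the terminal or the logger.
escaped_view = ascii(value)

# Encoding to a codec that cannot represent every character either fails
# or loses data, depending on the error handler chosen.
dropped = value.encode("ascii", "ignore")    # silently drops the e-acute
replaced = value.encode("ascii", "replace")  # substitutes "?"

# Codecs that can represent the character keep it.
latin1 = value.encode("latin-1")
utf8 = value.encode("utf-8")

# Converting UTF-8 bytes to another encoding goes through str.
recoded = utf8.decode("utf-8").encode("latin-1")

print(escaped_view)
print(dropped, replaced, latin1, utf8, recoded)
```

The "ignore" handler is the kind of behaviour that would silently drop characters, which is why inspecting the escaped form first is useful.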

I'm just guessing here. But something is interpreting the basic XML, since
&lt;BR&gt; is being converted to <BR>. Seems not unlikely that the rest are
also being converted, and to unicode. You just wouldn't notice a glitch when
unicode <BR> is converted to any usual western text encoding.
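
As a rough check of that guess, the sketch below feeds a shortened version of the request to the standard XML-RPC unmarshaller. It assumes Python 3, where xmlrpclib is available as xmlrpc.client; the shortened request body is made up for illustration, and the behaviour of the 2005-era parser may well have differed.

```python
import xmlrpc.client

# Shortened, hypothetical request modelled on the one in the question.
request = b"""<?xml version="1.0"?>
<methodCall>
<methodName>echo</methodName>
<params><param><value>
<string>Andr&#xe9; &lt;BR&gt; d&apos;apr&#xe8;s</string>
</value></param></params>
</methodCall>"""

# loads() returns the parameter tuple and the method name.
params, method_name = xmlrpc.client.loads(request)
value = params[0]

print(method_name)
print(ascii(value))
```

Here both the predefined entities (&lt;, &gt;, &apos;) and the numeric character references come back as ordinary characters of a str, so no manual dereferencing is needed with this parser.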

OTOH, if the intent (which I doubt) of the non-python client were to pass through
a block of pre-formatted XML as such (possibly for direct pasting into e.g. web page XHTML?)
then a way to avoid escaping every & and < would be to use CDATA to encapsulate it. That
would have to be fixed on that end.
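
The difference between escaped content and a CDATA section can be sketched as follows, assuming Python 3 and the standard xml.etree.ElementTree parser (the element content is a made-up fragment of the original string):

```python
import xml.etree.ElementTree as ET

# Escaped markup: the parser resolves &lt;, &gt; and &#xe9;.
escaped = ET.fromstring("<string>Andr&#xe9; &lt;BR&gt; craie noire</string>")

# CDATA: markup interpretation is suspended, so "<BR>" needs no escaping,
# but "&#xe9;" is passed through literally instead of being resolved.
cdata = ET.fromstring(
    "<string><![CDATA[Andr&#xe9; <BR> craie noire]]></string>"
)

print(ascii(escaped.text))
print(ascii(cdata.text))
```

So CDATA only helps with the literal `<` and `&` characters; a client using it would have to send the accented characters themselves, in the document's declared encoding.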

Regards,
Bengt Richter
Jul 19 '05 #4
On 4 May 2005 08:17:07 -0700, "Chris Curvey" <cc*****@gmail.com> wrote:
> Here is the solution. Incidentally, the client is Cold Fusion.

I suspect your solution may not be general, though it would seem to
satisfy your use case. It seems to be true for python's latin-1 that
all the first 256 character codes are acceptable and match unicode 1:1,
even though the windows character map for lucida sans unicode font
with latin-1 codes shows undefined-char boxes for codes 0x7f-0x9f.

 >>> sum(chr(i).decode('latin-1') == unichr(i) for i in xrange(256))
 256
 >>> sum(unichr(i).encode('latin-1') == chr(i) for i in xrange(256))
 256

Not sure what to make of that. E.g. should unichr(0x7f).encode('latin-1')
really be legal, or is it just expedient to have latin-1 serve as a kind of
compressed utf_16_le? E.g., there's 256 Trues in these:

 >>> sum(unichr(i).encode('utf_16_le')[0] == chr(i) for i in xrange(256))
 256
 >>> sum(unichr(i).encode('utf_16_le')[1] == '\x00' for i in xrange(256))
 256

Maybe we could have a 'u_as_str' or 'utf_16_le_lsbyte' codec for that, so the above would be spelled

 >>> sum(unichr(i).encode('u_as_str') == chr(i) for i in xrange(256)) # XXX faked, not implemented
 256

Utf-8 only goes half way:

 >>> sum(unichr(i).encode('utf-8') == chr(i) for i in xrange(256))
 128
<aside>
What do you think, Martin? ;-)
Maybe 'ubyte' or 'u256' would be a user-friendlier codec name? Or 'ustr'?
</aside>
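
For readers on Python 3, where chr() returns a str and single bytes are written bytes([i]), the counts above can be reproduced with the following sketch. The variable names are made up for illustration.

```python
# Latin-1 maps byte value i to code point i, in both directions.
latin1_decode = sum(bytes([i]).decode("latin-1") == chr(i) for i in range(256))
latin1_encode = sum(chr(i).encode("latin-1") == bytes([i]) for i in range(256))

# UTF-16-LE stores the same value in the low byte, with a zero high byte.
utf16_le_low = sum(chr(i).encode("utf_16_le") == bytes([i, 0]) for i in range(256))

# UTF-8 is a single identical byte only for the ASCII half.
utf8_single = sum(chr(i).encode("utf-8") == bytes([i]) for i in range(256))

print(latin1_decode, latin1_encode, utf16_le_low, utf8_single)
# prints: 256 256 256 128
```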
> import re
> import logging
> import logging.config
> import os
> import SimpleXMLRPCServer
>
> logging.config.fileConfig("logging.ini")
>
> ###########################################################################
> class LoggingXMLRPCRequestHandler(SimpleXMLRPCServer.CGIXMLRPCRequestHandler):
>     def __dereference(self, request_text):
>         entityRe = re.compile("((?P<er>&#x)(?P<code>..)(?P<semi>;))")

What about entity &#x263a; ? Or the same in decimal: &#9786; :)

>         for m in re.finditer(entityRe, request_text):
>             hexref = int(m.group(3), 16)
>             charref = chr(hexref)

unichr(hexref) would handle >= 256, if you used unicode.

>             request_text = request_text.replace(m.group(1), charref)
>
>         return request_text
>     #-------------------------------------------------------------------
>     def handle_xmlrpc(self, request_text):
>         logger = logging.getLogger()
>         #logger.debug("************************************")
>         #logger.debug(request_text)
                        ^^^^^^^^^^^^
I would suggest repr(request_text) for debugging, unless you
know that your logger is going to do that for you. Otherwise a '%s' format may hide things that you'd like to know.

>         try:
>             #logger.debug("-------------------------------------")
>             request_text = self.__dereference(request_text)
>             #logger.debug(request_text)
>             request_text = request_text.decode("latin-1").encode('utf-8')

AFAIK, XML can be encoded with many encodings other than latin-1, so you are essentially
saying here that you know it's latin-1 somehow. Theoretically, your XML could
start with something like <?xml encoding='UTF-8'?> and .decode("latin-1") is only going to
"work" when the source is plain ascii. I wouldn't be surprised if that's what's happening
up to the point where you __dereference, but str.replace doesn't care that you are potentially
making a utf-8 encoding invalid by just replacing 8-bit characters with what is legal latin-1.
After that, you are decoding your utf-8_clobbered_with_latin-1 as latin-1 anyway, so it "works".
At least I think this is a consistent theory. See if you can get the client to send something
with characters >128 that aren't represented as &#x..; to see if it's actually sending utf-8.

>             #logger.debug("************************************")
>         except Exception, e:
>             logger.error(request_text)

again, suggest repr(request_text)

>             logger.error("had a problem dereferencing")
>             logger.error(e)
>
>         SimpleXMLRPCServer.CGIXMLRPCRequestHandler.handle_xmlrpc(self, request_text)
> ###########################################################################
> class Foo:
>     def settings(self):
>         return os.environ
>     def echo(self, something):
>         logger = logging.getLogger()
>         logger.debug(something)

repr it, unless you know ;-)

>         return something
>     def greeting(self, name):
>         return "hello, " + name
>
> # these are used to run as a CGI
> handler = LoggingXMLRPCRequestHandler()
> handler.register_instance(Foo())
> handler.handle_request()
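
The two concerns raised in the comments above (references longer than two hex digits or written in decimal, and splicing Latin-1 bytes into a body that may really be UTF-8) can be illustrated with a short sketch. It assumes Python 3; the names `CHAR_REF` and `dereference` and the sample byte strings are hypothetical, not part of the posted handler.

```python
import re

# Numeric character references: hexadecimal (&#x263a;) or decimal (&#9786;).
CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def dereference(text):
    """Replace numeric character references in an already-decoded str."""
    def _to_char(match):
        hex_digits, dec_digits = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        return chr(int(dec_digits))
    return CHAR_REF.sub(_to_char, text)


smiley = dereference("Andr&#xe9; &#x263a; &#9786;")

# Hypothetical UTF-8 body containing both a reference and a real e-acute.
raw = "Andr&#xe9; caf\u00e9".encode("utf-8")

# Byte-level replacement with the Latin-1 byte breaks the UTF-8 encoding...
clobbered = raw.replace(b"&#xe9;", b"\xe9")
try:
    clobbered.decode("utf-8")
    still_valid_utf8 = True
except UnicodeDecodeError:
    still_valid_utf8 = False

# ...and decoding the result as Latin-1 mangles the real e-acute.
as_latin1 = clobbered.decode("latin-1")

# Decoding with the real encoding first, then dereferencing, avoids both.
safe = dereference(raw.decode("utf-8"))

print(ascii(smiley), still_valid_utf8, ascii(as_latin1), ascii(safe))
```

One caveat: run on the raw request before XML parsing, a function like this would also turn references such as &#60; into a literal "<" and corrupt the markup, so leaving the references for the XML parser to resolve is usually the safer route.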


Regards,
Bengt Richter
Jul 19 '05 #5
