xmlrpclib and decoding entity references

Chris Curvey

I'm writing an XMLRPC server, which is receiving a request (from a
non-Python client) that looks like this (formatted for legibility):

<?xml version="1.0"?>
<methodCall>
<methodName>echo</methodName>
<params>
<param>
<value>
<string>Le Martyre de Saint Andr&#xE9; avec inscription
'Le Dominiquain.' et 'Le tableau fait par le dominicain,
d'apr&#xE8;s son dessein &#xE0;... est &#xE0; Rome, &#xE0;
l'&#xE9;glise Saint Andr&#xE9; della Valle' sur le
cadre craie noire, plume et encre brune, lavis brun
rehauss&#xE9; de blanc sur papier brun 190 x 228 mm. (7 1/2 x
9 in.)</string>
</value>
</param>
</params>
</methodCall>

But when my "echo" method is invoked, the value of the string is:

Le Martyre de Saint Andr; avec inscription 'Le Dominiquain.' et
'Le tableau fait par le dominicain, d'apr:s son dessein 2... est 2
Rome, 2 l';glise Saint Andr; della Valle' sur le cadre craie noire,
plume et encre brune, lavis brun rehauss; de blanc sur papier brun 
190 x 228 mm. (7 1/2 x 9 in.)

Can anyone give me a lead on how to convert the entity references into
something that will make it through to my method call?

Jul 19 '05 #1

Chris Curvey

yep, I'm using SimpleXMLRPCServer, but something is getting messed up
between the receipt of the XML stream and the delivery to my function.
The "normal" entity references (like &lt; and &amp;) are handled OK,
but the character references are not working. For instance,

"Andr&#xE9;" is received by the server, but it's delivered to the
function as "Andr;"

I've figured out how to parse through the string to find all the
character references and convert them back, but that seems to be
causing a ProtocolError.

Hopefully someone can lend me a clue; I really don't want to have to
switch over to SOAP and end up in WSDL hell.

Jul 19 '05 #2

Chris Curvey

Here is the solution. Incidentally, the client is Cold Fusion.

import re
import logging
import logging.config
import os
import SimpleXMLRPCServer

logging.config.fileConfig("logging.ini")

########################################################################
class LoggingXMLRPCRequestHandler(SimpleXMLRPCServer.CGIXMLRPCRequestHandler):
    def __dereference(self, request_text):
        entityRe = re.compile("((?P<er>&#x)(?P<code>..)(?P<semi>;))")
        for m in re.finditer(entityRe, request_text):
            hexref = int(m.group(3),16)
            charref = chr(hexref)
            request_text = request_text.replace(m.group(1), charref)

        return request_text
    #-------------------------------------------------------------------
    def handle_xmlrpc(self, request_text):
        logger = logging.getLogger()
        #logger.debug("************************************")
        #logger.debug(request_text)
        try:
            #logger.debug("-------------------------------------")
            request_text = self.__dereference(request_text)
            #logger.debug(request_text)
            request_text = request_text.decode("latin-1").encode('utf-8')
            #logger.debug("************************************")
        except Exception, e:
            logger.error(request_text)
            logger.error("had a problem dereferencing")
            logger.error(e)

        SimpleXMLRPCServer.CGIXMLRPCRequestHandler.handle_xmlrpc(self,
                                                                 request_text)
########################################################################
class Foo:
    def settings(self):
        return os.environ
    def echo(self, something):
        logger = logging.getLogger()
        logger.debug(something)
        return something
    def greeting(self, name):
        return "hello, " + name

# these are used to run as a CGI
handler = LoggingXMLRPCRequestHandler()
handler.register_instance(Foo())
handler.handle_request()

Jul 19 '05 #3

Bengt Richter

On 3 May 2005 08:07:06 -0700, "Chris Curvey" <cc*****@gmail.com> wrote:

> [full quote of the original post (#1) snipped]

I haven't used XMLRPC but superficially this looks like a quoting and/or encoding
problem. IOW, your "request" is XML, and the <string>...</string> part is also XML
which is part of the whole, not encapsulated in e.g. <![CDATA[...stuff...]]>
(which would tell an XML parser to suspend markup interpretation of ...stuff...).

So IWT you would at least need the <string>...</string> content to be converted to
unicode to preserve all the represented characters. It wouldn't surprise me if the
whole request is routinely converted to unicode, and the "value" you are showing
above is a result of converting from unicode to an encoding that can't represent
everything, and maybe just drops conversion errors. What do you
get if you print repr(value)? (assuming value is passed to your echo method)

If it is a unicode string, you will just have to choose an appropriate value.encode('appropriate')
from available codecs. If it looks like e.g., a utf-8 encoding of unicode, you could try
value.decode('utf-8').encode('appropriate')

I'm just guessing here. But something is interpreting the basic XML, since
 is being converted to . Seems not unlikely that the rest are
also being converted, and to unicode. You just wouldn't notice a glitch when
unicode is converted to any usual western text encoding.

OTOH, if the intent (which I doubt) of the non-python client were to pass through
a block of pre-formatted XML as such (possibly for direct pasting into e.g. web page XHTML?)
then a way to avoid escaping every & and < would be to use CDATA to encapsulate it. That
would have to be fixed on that end.
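
As a rough check of the guess that the XML layer itself resolves the references, the sketch below feeds a cut-down request to the standard library's XML-RPC unmarshaller. It uses the Python 3 module name xmlrpc.client (xmlrpclib in the Python 2 of this thread), and the shortened request text is an assumption made for illustration, not the client's actual payload.

```python
# Python 3 spelling; in the Python 2 of this thread the module is xmlrpclib.
import xmlrpc.client

# A shortened, hypothetical request in the shape of the one in the original
# post: the accented letters are sent as hexadecimal character references,
# while '&' and the apostrophe use predefined entities.
REQUEST = (
    b'<?xml version="1.0"?>'
    b"<methodCall><methodName>echo</methodName><params><param><value>"
    b"<string>Saint Andr&#xE9; &amp; l&apos;&#xE9;glise</string>"
    b"</value></param></params></methodCall>"
)

# loads() returns the tuple of parameters and the method name.
params, method_name = xmlrpc.client.loads(REQUEST)
value = params[0]

# ascii() (repr() in Python 2) shows exactly which code points arrived.
print(method_name, ascii(value))
```

If the parser is doing its job, the value already contains the real characters, so any loss seen in the method would have to happen before or after this step.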

Regards,
Bengt Richter

Jul 19 '05 #4

Bengt Richter

On 4 May 2005 08:17:07 -0700, "Chris Curvey" <cc*****@gmail.com> wrote:

> Here is the solution. Incidentally, the client is Cold Fusion.

I suspect your solution may not be general, though it would seem to
satisfy your use case. It seems to be true for Python's latin-1 that
all the first 256 character codes are acceptable and match unicode 1:1,
even though the Windows character map for the Lucida Sans Unicode font
with latin-1 codes shows undefined-char boxes for codes 0x7f-0x9f.
 >>> sum(chr(i).decode('latin-1') == unichr(i) for i in xrange(256))
 256
 >>> sum(unichr(i).encode('latin-1') == chr(i) for i in xrange(256))
 256

Not sure what to make of that. E.g. should unichr(0x7f).encode('latin-1')
really be legal, or is it just expedient to have latin-1 serve as a kind of
compressed utf_16_le? E.g., there's 256 Trues in these:

 >>> sum(unichr(i).encode('utf_16_le')[0] == chr(i) for i in xrange(256))
 256
 >>> sum(unichr(i).encode('utf_16_le')[1] == '\x00' for i in xrange(256))
 256

Maybe we could have a 'u_as_str' or 'utf_16_le_lsbyte' codec for that, so
the above would be spelled

 >>> sum(unichr(i).encode('u_as_str') == chr(i) for i in xrange(256)) # XXX faked, not implemented
 256

Utf-8 only goes half way:

 >>> sum(unichr(i).encode('utf-8') == chr(i) for i in xrange(256))
 128

<aside>
What do you think, Martin? ;-)
Maybe 'ubyte' or 'u256' would be a user-friendlier codec name? Or 'ustr'?
</aside>
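
For readers on Python 3, where str is already Unicode and bytes is a separate type, the same counts can be reproduced roughly as follows. This is only a restatement of the experiment above in Python 3 spelling, and the variable names are made up for illustration. It also shows where the 1:1 property stops: a code point of 256 or more has no Latin-1 byte.

```python
# Latin-1 assigns byte value i to code point i for every i in 0..255.
latin1_matches = sum(bytes([i]).decode("latin-1") == chr(i) for i in range(256))

# UTF-16-LE stores those code points as the same low byte plus a zero byte.
utf16_matches = sum(chr(i).encode("utf-16-le") == bytes([i, 0]) for i in range(256))

# UTF-8 agrees only for ASCII; code points 128..255 take two bytes each.
utf8_matches = sum(chr(i).encode("utf-8") == bytes([i]) for i in range(256))

print(latin1_matches, utf16_matches, utf8_matches)

# Beyond 255 the 1:1 mapping ends: U+263A cannot be encoded as Latin-1.
try:
    "\u263a".encode("latin-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False
print(fits_in_latin1)
```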
> import re
> import logging
> import logging.config
> import os
> import SimpleXMLRPCServer
>
> logging.config.fileConfig("logging.ini")
>
> ########################################################################
> class LoggingXMLRPCRequestHandler(SimpleXMLRPCServer.CGIXMLRPCRequestHandler):
>     def __dereference(self, request_text):
>         entityRe = re.compile("((?P<er>&#x)(?P<code>..)(?P<semi>;))")

What about entity &#x263A; ? Or the same in decimal: &#9786; :)

>         for m in re.finditer(entityRe, request_text):
>             hexref = int(m.group(3),16)
>             charref = chr(hexref)

unichr(hexref) would handle >= 256, if you used unicode.

>             request_text = request_text.replace(m.group(1), charref)
>
>         return request_text
>     #-------------------------------------------------------------------
>     def handle_xmlrpc(self, request_text):
>         logger = logging.getLogger()
>         #logger.debug("************************************")
>         #logger.debug(request_text)
                        ^^^^^^^^^^^^
I would suggest repr(request_text) for debugging, unless you
know that your logger is going to do that for you. Otherwise a '%s'
format may hide things that you'd like to know.

>         try:
>             #logger.debug("-------------------------------------")
>             request_text = self.__dereference(request_text)
>             #logger.debug(request_text)
>             request_text = request_text.decode("latin-1").encode('utf-8')

AFAIK, XML can be encoded with many encodings other than latin-1, so you are essentially
saying here that you know it's latin-1 somehow. Theoretically, your XML could
start with something like <?xml encoding='UTF-8'?> and .decode("latin-1") is only going to
"work" when the source is plain ascii. I wouldn't be surprised if that's what's happening
up to the point where you __dereference, but str.replace doesn't care that you are potentially
making a utf-8 encoding invalid by just replacing 8-bit characters with what is legal latin-1.
After that, you are decoding your utf-8_clobbered_with_latin-1 as latin-1 anyway, so it "works".
At least I think this is a consistent theory. See if you can get the client to send something
with characters >128 that aren't represented as &#x..; to see if it's actually sending utf-8.

>             #logger.debug("************************************")
>         except Exception, e:
>             logger.error(request_text)

again, suggest repr(request_text)

>             logger.error("had a problem dereferencing")
>             logger.error(e)
>
>         SimpleXMLRPCServer.CGIXMLRPCRequestHandler.handle_xmlrpc(self,
>                                                                  request_text)
> ########################################################################
> class Foo:
>     def settings(self):
>         return os.environ
>     def echo(self, something):
>         logger = logging.getLogger()
>         logger.debug(something)

repr it, unless you know ;-)

>         return something
>     def greeting(self, name):
>         return "hello, " + name
>
> # these are used to run as a CGI
> handler = LoggingXMLRPCRequestHandler()
> handler.register_instance(Foo())
> handler.handle_request()
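
The objections raised in the comments above (references longer than two hex digits, decimal references, code points of 256 and up) could be addressed with a pattern along these lines. This is a sketch in Python 3 spelling, where chr() covers the whole Unicode range as unichr() does in Python 2. The name dereference and the choice to leave references to '<' and '&' untouched are assumptions for illustration, not part of the posted solution.

```python
import re

# Matches hexadecimal (&#x263A;) and decimal (&#9786;) character references
# of any length, unlike a pattern that accepts exactly two hex digits.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def dereference(text):
    """Replace numeric character references in already-decoded text.

    Hypothetical helper: operates on a Unicode string, not on raw bytes,
    so it does not depend on the encoding of the request.
    """
    def substitute(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            char = chr(code_point)
        except (ValueError, OverflowError):
            return match.group(0)  # not a valid code point: leave as-is
        if char in "<&":
            # Expanding these before the XML parser runs would turn
            # character data into markup, so keep the reference.
            return match.group(0)
        return char

    return CHAR_REF.sub(substitute, text)


print(ascii(dereference("Andr&#xE9; &#x263A; &#9786;")))
```

Because it works on text, the request bytes would still have to be decoded with whatever encoding the client actually uses before calling it, which is the open question in the comments above.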

Regards,
Bengt Richter

Jul 19 '05 #5
