XML_RPC and unicode problems

I am currently passing email messages over XML_RPC as the payload for
a certain function call. On some of these messages, XML_RPC blows up
on the server side and says something to the effect of:

exceptions.UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in
position 1599: unexpected code byte

Using the native Python codec for doing conversions gives me a similar
error ('utf8' codec can't decode byte 0x93 in position 1328:
unexpected code byte). That gives me the feeling that these specific
messages are just funky. (The spots in the file where the codec chokes
look like random characters.)

What I've come to believe is that XML_RPC automatically assumes any
strings it transfers are unicode and thusly tries to do conversions on
these strings. Therefore, is there any way to keep XML_RPC from doing
unicode conversions, or is there some way for me to just pass raw data
over XML_RPC without having to worry about it?
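
For what it's worth, both errors can be reproduced outside XML-RPC using only the byte values from the messages. The sketch below uses Python 3 syntax, and the helper name describe_byte is made up for illustration. It also tries Windows-1252, purely as a guess at what the mail was originally encoded in, since 0x93 and 0xa0 are a curly opening quote and a non-breaking space in that code page.

```python
def describe_byte(raw: bytes) -> dict:
    """Report how a byte string fares under UTF-8 and Windows-1252."""
    result = {}
    try:
        result["utf-8"] = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Same kind of failure the server reports: the byte cannot start
        # a UTF-8 sequence.
        result["utf-8"] = f"error: {exc.reason}"
    # Assumption: the message might really be Windows-1252 text.
    result["cp1252"] = raw.decode("cp1252")
    return result


for raw in (b"\xa0", b"\x93"):
    print(raw, describe_byte(raw))
```

If the cp1252 column shows sensible punctuation for the real messages, the data is probably legacy-encoded text rather than random garbage.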
Jul 18 '05 #1
Thomas wrote:
I am currently passing email messages over XML_RPC as the payload for
a certain function call. On some of these messages, XML_RPC blows up
on the server side and says something to the effect of:

exceptions.UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in
position 1599: unexpected code byte

Using the native Python codec for doing conversions gives me a similar
error ('utf8' codec can't decode byte 0x93 in position 1328:
unexpected code byte). That gives me the feeling that these specific
messages are just funky. (The spots in the file where the codec chokes
look like random characters.)

What I've come to believe is that XML_RPC automatically assumes any
strings it transfers are unicode and thusly tries to do conversions on
these strings. Therefore, is there any way to keep XML_RPC from doing
unicode conversions, or is there some way for me to just pass raw data
over XML_RPC without having to worry about it?

http://www.xmlrpc.com/spec

Have a look at the <base64> data type.
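
A minimal sketch of what that looks like from Python, assuming Python 3, where the xmlrpclib module is named xmlrpc.client. The method name store_message and the sample message are invented for illustration. No server is needed here, since dumps and loads do the marshalling on their own.

```python
import xmlrpc.client

# Raw message bytes, including two that are not valid UTF-8 on their own.
raw_message = b"Subject: test\r\n\r\nbody with \xa0 and \x93\r\n"

# Wrap the bytes in Binary so they are marshalled as <base64>, not <string>.
# "store_message" is a hypothetical method name.
request = xmlrpc.client.dumps(
    (xmlrpc.client.Binary(raw_message),), methodname="store_message"
)

# What a server would see after unmarshalling the request.
params, method = xmlrpc.client.loads(request)
received = params[0].data  # Binary keeps the original bytes in .data
print(method, received == raw_message)
```

With a real ServerProxy the same Binary object would simply be passed as the argument of the remote call.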
Jul 18 '05 #2
Greg Hamilton wrote:
Thomas wrote:
What I've come to believe is that XML_RPC automatically assumes any
strings it transfers are unicode and thusly tries to do conversions on
these strings. Therefore, is there any way to keep XML_RPC from doing
unicode conversions, or is there some way for me to just pass raw data
over XML_RPC without having to worry about it?

http://www.xmlrpc.com/spec

Have a look at the <base64> data type.


Disclaimer: I haven't used XML-RPC yet.

I looked at the above site, and noted this particular text in the
explanatory section below the very lightweight "spec":

"""Q. What characters are allowed in strings? Non-printable characters?
Null characters? Can a "string" be used to hold an arbitrary chunk of
binary data?

A. Any characters are allowed in a string except < and &, which are
encoded as &lt; and &amp;. A string can be used to encode binary
data.
"""

Seems to me that description is inadequate, if one has to revert
to <base64> to pass through a string with an \xa0 in it.

I did a search and found this page from Fredrik Lundh, which seems
to be more clear on the whole thing, clearer even than the updated
spec which simply removed a previous reference to ASCII:
http://effbot.org/zone/xmlrpc-errata.htm

-Peter
Jul 18 '05 #3
phansen wrote:
Seems to me that description is inadequate, if one has to revert
to <base64> to pass through a string with an \xa0 in it.


No. \xa0 just is not a character. In XML, all bytes must denote
characters, and \xa0 does not denote any character when the
encoding is UTF-8.

To transmit binary data, use the base64 element, available through
xmlrpclib.Binary in Python.
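
To illustrate the byte-versus-character distinction, here is a small sketch, assuming Python 3 (where xmlrpclib is xmlrpc.client) and a made-up method name echo. A text string containing the character U+00A0 goes through <string> without trouble, because on the wire it is encoded as two bytes.

```python
import xmlrpc.client

# A text string containing the *character* U+00A0 (no-break space).
text = "price:\u00a0100"

# "echo" is a hypothetical method name.
request = xmlrpc.client.dumps((text,), methodname="echo")
wire = request.encode("utf-8")

# On the wire the character is the two-byte sequence C2 A0, never a bare A0.
print(b"\xc2\xa0" in wire)

# Parsing the UTF-8 bytes gives the original text back.
params, _ = xmlrpc.client.loads(wire)
print(params[0] == text)
```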

Regards,
Martin
Jul 18 '05 #4
Martin v. Löwis wrote:
phansen wrote:
Seems to me that description is inadequate, if one has to revert
to <base64> to pass through a string with an \xa0 in it.


No. \xa0 just is not a character. In XML, all bytes must denote
characters, and \xa0 does not denote any character when the
encoding is UTF-8.

To transmit binary data, use the base64 element, available through
xmlrpclib.Binary in Python.


That's what I said, isn't it? Just checking, because you started
your response with "No", as though I had said something incorrect.

-Peter
Jul 18 '05 #5
phansen wrote:
Seems to me that description is inadequate, if one has to revert
to <base64> to pass through a string with an \xa0 in it.

No. \xa0 just is not a character. In XML, all bytes must denote
characters, and \xa0 does not denote any character when the
encoding is UTF-8.

To transmit binary data, use the base64 element, available through
xmlrpclib.Binary in Python.

That's what I said, isn't it? Just checking, because you started
your response with "No", as though I had said something incorrect.


Ah, I only read the answer of the fragment you quoted, not the
question :-(

Yes, that answer is confusing, as it doesn't really answer the
question (although the answer itself is correct). Binary data and
XML-RPC have a long and confusing history.

Regards,
Martin
Jul 18 '05 #6
Martin v. Löwis wrote:
Binary data and XML-RPC
have a long and confusing history.


Why is that? There's <base64> for data that's expected to be binary[*],
and <string> for everything else that's valid under the chosen encoding.

[*] obviously, aside from truly binary data, also for anything not valid
in the current encoding.
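
That rule of thumb could be sketched as a small helper, assuming Python 3's xmlrpc.client. The function name to_xmlrpc_value and the regular expression are my own illustration, not part of any library. Besides checking the encoding, it also falls back to <base64> when the text contains characters that XML 1.0 does not allow at all.

```python
import re
import xmlrpc.client

# Characters XML 1.0 does not allow in a document: C0 controls other than
# tab, LF and CR, plus U+FFFE and U+FFFF.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def to_xmlrpc_value(data: bytes, encoding: str = "utf-8"):
    """Return str if data is valid text for <string>, else a Binary wrapper."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        # Not valid in the chosen encoding: send as <base64>.
        return xmlrpc.client.Binary(data)
    if _XML_ILLEGAL.search(text):
        # Decodes fine, but contains characters XML cannot carry.
        return xmlrpc.client.Binary(data)
    return text


print(type(to_xmlrpc_value(b"plain text")).__name__)
print(type(to_xmlrpc_value(b"bad byte \xa0")).__name__)
```

One caveat: XML parsers normalise line endings, so text containing CRLF may not come back byte for byte as a <string>. When exact bytes matter, as with stored mail, always using Binary is probably the safer choice.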
Jul 18 '05 #7
Ivan Voras wrote:
Martin v. Löwis wrote:
Binary data and XML-RPC
have a long and confusing history.

Why is that? There's <base64> for data that's expected to be binary[*],
and <string> for everything else that's valid under the chosen encoding.


base64 originally wasn't part of the XML-RPC spec; it was added on
1/21/99. Before, the spec simultaneously claimed that the string
element contains ASCII, that "full XML" is allowed, and that the
string element can carry arbitrary binary data.

These were all mutually contradictory: If you were to put arbitrary
bytes into a string element, it would neither be well-formed XML
(at least not if you choose us-ascii or utf-8 as the encoding), nor
would the strings be pure ASCII.
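
The well-formedness point can be checked directly. The sketch below, assuming Python 3's xmlrpc.client and its expat-based parser, hand-builds a request with a raw 0xA0 byte inside <string> and feeds it to the unmarshaller. The method name store_message is made up.

```python
import xmlrpc.client
from xml.parsers.expat import ExpatError

# A hand-built request with a raw 0xA0 byte inside <string>. No encoding is
# declared, so the parser assumes UTF-8.
# "store_message" is a hypothetical method name.
bad_request = (
    b"<?xml version='1.0'?>"
    b"<methodCall><methodName>store_message</methodName><params>"
    b"<param><value><string>bad byte: \xa0</string></value></param>"
    b"</params></methodCall>"
)

try:
    xmlrpc.client.loads(bad_request)
    rejected = False
except ExpatError as exc:
    # The XML parser refuses the document before XML-RPC ever sees a value.
    rejected = True
    print("parser rejected the document:", exc)
```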

Also, if the string can only carry ASCII, how can it
simultaneously allow for arbitrary XML?

People have asked all these questions, and Dave Winer always
said "read the spec, it says it all", when it really didn't.

I believe that Dave's understanding was the following: With
"ASCII", he didn't really mean "American Standard Code for
Information Interchange". He meant that all bytes in the
document must have ordinals < 128. He was fine with people
putting character references (such as &#220;) into string
elements. He clarified that aspect on 6/30/03, by removing
"ASCII" from the description of string.

Wrt. binary data, I think he meant that you could use
base64, uuencode, hex, whatever, in a string element, and
thus represent arbitrary bytes. Of course, this would not
be very interoperable, so he added base64.

Regards,
Martin
Jul 18 '05 #8
Ivan Voras wrote:
Why is that? There's <base64> for data that's expected to be binary[*],
and <string> for everything else that's valid under the chosen encoding.


"that's valid in XML", that is. no matter what encoding you use, you
can still use character references to insert other characters.
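
A short sketch of that, assuming Python 3's xmlrpc.client and a hypothetical method name echo: the request is declared and encoded as us-ascii, and the one non-ASCII character travels as a numeric character reference.

```python
import xmlrpc.client

text = "\u00dcbung"  # starts with U+00DC, which ASCII cannot represent

# Declare us-ascii in the XML header; "echo" is a hypothetical method name.
request = xmlrpc.client.dumps((text,), methodname="echo", encoding="us-ascii")

# Encode to bytes, turning non-ASCII characters into numeric character
# references such as &#220;.
wire = request.encode("us-ascii", "xmlcharrefreplace")
print(wire)

# The parser expands the reference, so the original text comes back.
params, _ = xmlrpc.client.loads(wire)
print(params[0] == text)
```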

</F>

Jul 18 '05 #9