xml processing and sys.setdefaultencoding

hi,
i wrote a small application which extracts javadoc-like documentation
for xslt stylesheets using python, xslt and pyana.
using non-ascii characters was a problem. so i set the default encoding to
UTF-8 and now everything works (at least it seems so, need to do more
testing though).

it may not be the most elegant solution (according to python in a nutshell)
but it almost seems that when doing xml processing it is mandatory to set
the default encoding. xml processing should work almost exclusively with
unicode strings and this seems the easiest solution.

any comments on this? better ways to work?

thanks
chris
Jul 18 '05 #1
christof hoeke wrote:
> i wrote a small application which extracts javadoc-like documentation
> for xslt stylesheets using python, xslt and pyana.
> using non-ascii characters was a problem.

That's odd. Did your stylesheets contain non-ascii characters? If yes,
did you declare the character encoding at the beginning of the
document, e.g.

<?xml version="1.0" encoding="iso-8859-1"?>

> so i set the [python] default encoding to
> UTF-8 and now everything works (at least it seems so, need to do more
> testing though).

If you don't put an encoding declaration in your XML documents
(including XSLT style/transform sheets), then an XML parser would by
default treat the document content as UTF-(8|16), as the XML standard
mandates.
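
A minimal sketch of that behaviour, assuming Python 3 and the standard library's `xml.etree.ElementTree` (not the pyana setup from the original post; the `<doc>` element is made up for illustration). The same text parses to the same result either as undeclared UTF-8 bytes or as ISO-8859-1 bytes that carry a matching declaration:

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document containing a non-ASCII character.
text = u"<doc>caf\u00e9</doc>"

# No declaration: the parser assumes UTF-8, so UTF-8 bytes parse correctly.
undeclared_root = ET.fromstring(text.encode("utf-8"))

# ISO-8859-1 bytes need a declaration that says so.
declared_bytes = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    + text.encode("iso-8859-1")
)
declared_root = ET.fromstring(declared_bytes)

# Both routes yield the same unicode string.
print(undeclared_root.text == declared_root.text)
```

Note that the parser is handed bytes in both cases, so the declaration (or its absence) is what decides how those bytes are decoded.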

Are you working from XML documents which are stored as strings inside
a python module? In which case, your special characters will actually
be encoded in whatever encoding your python module is stored. So you
might need to put an encoding declaration on your python module:-

http://www.python.org/peps/pep-0263.html
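
As a rough illustration of such a declaration (a sketch only; the `EMBEDDED_XML` name and its content are hypothetical, and the file is assumed to be saved as UTF-8), a module holding an XML string might start like this:

```python
# -*- coding: utf-8 -*-
# The line above is a PEP 263 declaration. It has to sit on the first or
# second line of the module and must match the encoding the editor
# actually uses when saving the file.
import xml.etree.ElementTree as ET

# Hypothetical embedded document with one non-ASCII character.
EMBEDDED_XML = u"<doc>café</doc>"

# Encode explicitly when handing bytes to the parser, instead of relying
# on whatever default encoding the interpreter happens to use.
root = ET.fromstring(EMBEDDED_XML.encode("utf-8"))
```

Keeping the literal as a unicode string and encoding it at the point of use means the module's storage encoding and the encoding given to the parser are stated separately and can both be checked.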

> it may not be the most elegant solution (according to python in a
> nutshell)
> but it almost seems that when doing xml processing it is mandatory to set
> the default encoding. xml processing should work almost exclusively with
> unicode strings and this seems the easiest solution.

It is always recommended to explicitly state the encoding on your XML
documents. If you don't, then the parser assumes UTF-(8|16). If your
documents aren't really UTF-(8|16), then you will get seemingly random
mapping of characters to other characters.
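
What a mismatch looks like depends on the parser and on the direction of the mismatch. The sketch below assumes Python 3 and the expat-based `xml.etree.ElementTree` from the standard library, with a made-up `<doc>` element. UTF-8 bytes wrongly declared as ISO-8859-1 are silently mapped to the wrong characters, while undeclared ISO-8859-1 bytes are rejected by this particular parser; other parsers may behave differently.

```python
import xml.etree.ElementTree as ET

utf8_body = u"<doc>caf\u00e9</doc>".encode("utf-8")
latin1_body = u"<doc>caf\u00e9</doc>".encode("iso-8859-1")

# Case 1: UTF-8 bytes wrongly declared as ISO-8859-1.
# Each byte is mapped to a character on its own, so the two-byte UTF-8
# sequence for U+00E9 comes out as two unrelated characters.
wrong_declaration = b'<?xml version="1.0" encoding="iso-8859-1"?>' + utf8_body
garbled = ET.fromstring(wrong_declaration).text

# Case 2: ISO-8859-1 bytes with no declaration.
# The parser assumes UTF-8 and finds an invalid byte sequence.
try:
    ET.fromstring(latin1_body)
    rejected = False
except ET.ParseError:
    rejected = True
```

Here `garbled` holds `u"caf\u00c3\u00a9"` rather than the intended text, and `rejected` ends up `True`.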

> any comments on this? better ways to work


If you're not dealing strictly with ASCII, then declare your
encodings, in both your python modules and your xml documents. Find
out which character set your text editor uses by default. Find
out how to change which character set is in use.
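
One way to follow this advice without touching the interpreter-wide default is to name the encoding at every point where text enters or leaves the program. The sketch below assumes Python 3, where `sys.setdefaultencoding` is not available at all; the helper names `read_text` and `write_text` and the file names are hypothetical.

```python
import io
import os
import tempfile


def read_text(path, encoding):
    """Read a file and decode it with an explicitly named encoding."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def write_text(path, text, encoding):
    """Encode text with an explicitly named encoding and write it out."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


# Set up a small ISO-8859-1 input file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
source_path = os.path.join(tmp_dir, "in.txt")
target_path = os.path.join(tmp_dir, "out.txt")
with open(source_path, "wb") as handle:
    handle.write(u"caf\u00e9".encode("iso-8859-1"))

# Decode on the way in, work with unicode, encode on the way out.
text = read_text(source_path, "iso-8859-1")
write_text(target_path, text, "utf-8")

# Read the result back as raw bytes to see the re-encoded form.
with open(target_path, "rb") as handle:
    converted = handle.read()
```

Inside the program everything is a unicode string; the only places an encoding name appears are the two boundaries, which makes a wrong assumption easier to spot than a global default would.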

If you create, sell or maintain text editing or processing software,
make it easy for your users to find out what character encodings are
in effect.

HTH,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #2
