xml processing and sys.setdefaultencoding

hi,
i wrote a small application which extracts a javadoc similar documentation
for xslt stylesheets using python, xslt and pyana.
using non-ascii characters was a problem. so i set the default encoding to
UTF-8 and now everything works (at least it seems so, need to do more
testing though).

it may not be the most elegant solution (according to python in a nutshell)
but it almost seems when doing xml processing it is mandatory to set the
default encoding. xml processing should almost only work with unicode
strings and this seems the easiest solution.

any comments on this? better ways to work

thanks
chris
Jul 18 '05 #1
christof hoeke wrote:
> i wrote a small application which extracts a javadoc similar
> documentation for xslt stylesheets using python, xslt and pyana.
> using non-ascii characters was a problem.

That's odd. Did your stylesheets contain non-ascii characters? If yes,
did you declare the character encoding at the beginning of the
document, e.g.

<?xml version="1.0" encoding="iso-8859-1"?>

> so i set the [python] default encoding to
> UTF-8 and now everything works (at least it seems so, need to do more
> testing though).

If you don't put an encoding declaration in your XML documents
(including XSLT style/transform sheets), then an XML parser will by
default treat the document content as UTF-8, or as UTF-16 if it starts
with a byte order mark, as the XML standard mandates.
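
A minimal sketch of that behaviour, assuming a current Python 3 and the
standard library's xml.etree.ElementTree rather than pyana (the sample
documents are made up for illustration):

```python
import xml.etree.ElementTree as ET

# The same Latin-1 bytes for "café", once with a declaration and once without.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'
undeclared = b'<doc>caf\xe9</doc>'

# With the declaration, the parser decodes byte 0xE9 as U+00E9.
text = ET.fromstring(declared).text

# Without it, the parser assumes UTF-8, where a lone 0xE9 is not valid,
# so parsing fails instead of silently guessing.
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(repr(text), undeclared_ok)
```

Other parsers may report the problem differently, but the declared
document should decode the same way everywhere.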

Are you working from XML documents which are stored as strings inside
a python module? If so, your special characters will actually be
encoded in whatever encoding your python module is stored in. So you
might need to put an encoding declaration on your python module:

http://www.python.org/peps/pep-0263.html
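
As a rough illustration of what such a declaration does, here is a
sketch that assumes Python 3 and keeps the "module" as raw bytes in
memory instead of a file on disk (the name `doc` and the sample source
are hypothetical):

```python
import io
import tokenize

# Module source as raw bytes, the way it would sit on disk in Latin-1.
# The first line is the PEP 263 encoding declaration.
source = b"# -*- coding: iso-8859-1 -*-\ndoc = '<doc>caf\xe9</doc>'\n"

# tokenize.detect_encoding reads the declaration the same way the
# interpreter does when it loads a source file.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)

# compile() honours the declaration when handed bytes, so the string
# literal ends up as proper unicode text.
namespace = {}
exec(compile(source, "<latin1_module>", "exec"), namespace)

print(encoding, repr(namespace["doc"]))
```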

> it may not be the most elegant solution (according to python in a
> nutshell)
> but it almost seems when doing xml processing it is mandatory to set the
> default encoding. xml processing should almost only work with unicode
> strings and this seems the easiest solution.

It is always recommended to explicitly state the encoding on your XML
documents. If you don't, then the parser assumes UTF-(8|16). If your
documents aren't really UTF-(8|16), then you will get either parse
errors or seemingly random mapping of characters to other characters.

> any comments on this? better ways to work

If you're not dealing specifically with ASCII, then declare your
encodings, in both your python modules and your xml documents. Find
out what is the default character set used by your text editor. Find
out how to change which character set is in use.
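
One way to act on this without touching the interpreter-wide default
encoding is to decode and encode explicitly at the edges of the
program. A minimal sketch, assuming Python 3, the standard library's
xml.etree.ElementTree, and Latin-1 input chosen purely as an example:

```python
import xml.etree.ElementTree as ET

raw = b"caf\xe9"                  # bytes as they might come from a Latin-1 file
text = raw.decode("iso-8859-1")   # decode explicitly at the input boundary

elem = ET.Element("doc")
elem.text = text                  # inside the tree everything is unicode text

# Encode explicitly at the output boundary. For a non-UTF-8 encoding the
# serializer also writes a matching XML declaration.
data = ET.tostring(elem, encoding="iso-8859-1")

# Parsing the serialized bytes again gives back the same text.
round_trip = ET.fromstring(data).text

print(data)
```

The point of the design is that the encoding is named at every place
bytes enter or leave, so nothing depends on a process-wide setting.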

If you create, sell or maintain text editing or processing software,
make it easy for your users to find out what character encodings are
in effect.

HTH,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #2
