Problems with UTF-8 characters and XSLT

Hi,

I'm using Xalan to do some transforming of XML in Java. My problem is:

I have Unicode characters in my XML (e.g., the German umlauts ä, ö, ü; since
they trouble me, I did not try any other Unicode characters). When
I do an identity transform and output the XML to a file, the word
'Glättegefahr', for example, appears in my file (viewed with the
XMLSpy Eclipse plug-in) as 'Glã³´egefahr' (except that the ? is a box
instead of a ?).

When I output it to System.out, I get: GlÃ¤ttegefahr. (This is also
what I get using XMLSpy directly, except that XMLSpy does not seem to
understand the <xsl:copy-of> tag.)

Here is my Java code for instantiating the transformer:

Transformer trans = TransformerFactory.newInstance().newTransformer();
trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
trans.transform(new DOMSource(source.getDocumentElement()),
new StreamResult(new FileWriter(origin)));
//or: new StreamResult(System.out)

Eclipse shows the XML file as UTF-8 encoded, and each file involved
(XSLT, XML) specifies encoding="UTF-8" in its XML declaration.

Any idea on what else I can try will be most welcome. Thanks in
advance.

Jul 20 '05 #1


ja******@gmx.de wrote:

When I output it to System.out, I get: GlÃ¤ttegefahr. (This is also
what I get using XMLSpy directly, except that XMLSpy does not seem to
understand the <xsl:copy-of> tag.)

System.out is the console, which is usually set to display an 8-bit
code page, so it seems the output is properly UTF-8 encoded: an ä in
UTF-8 takes two bytes, and that Ã¤ is those two bytes.
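
To illustrate the point about the two bytes, here is a minimal sketch that is not part of the original reply. The class and method names are hypothetical, and ISO-8859-1 is only an assumption standing in for whatever 8-bit code page the console uses. It encodes a string as UTF-8 and then decodes the same bytes as ISO-8859-1, which reproduces the Ã¤ seen on the console.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class Utf8BytesDemo {

    // Encode the text as UTF-8, then (deliberately wrongly) decode the
    // bytes as ISO-8859-1, as an 8-bit console would display them.
    static String viewAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Render the UTF-8 bytes of the text as space-separated hex pairs.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00e4 is the letter a-umlaut; the escape keeps this source file ASCII-only.
        System.out.println(utf8Hex("\u00e4"));               // C3 A4
        System.out.println(viewAsLatin1("Gl\u00e4ttegefahr"));
    }
}
```

The second line printed depends on the console's own encoding, so it may not look the same on every system; the hex output does not have that problem.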


Does the resulting file have an XML declaration
<?xml version="1.0" encoding="utf-8"?>
?

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jul 20 '05 #2
Yes, it does. But thanks to your tip I reviewed the API documentation
for the FileWriter class, which I used as the Result for the transform
method, and it showed that FileWriter uses a default charset. That was
the problem.

I solved it by constructing an OutputStreamWriter with the UTF-8
charset instead, and now the file is transformed, or rather copied,
correctly.
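
The fix described above might look roughly like the following sketch. The poster's actual code was not shown, so the class name, method name, and parameters here are assumptions; only the JAXP and java.io calls are standard. The key change is replacing FileWriter with an OutputStreamWriter that is given UTF-8 explicitly, so the bytes on disk match the encoding named in the XML declaration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

// Hypothetical helper class; the names are illustrative only.
class Utf8TransformDemo {

    // Identity-transform the given node into the target file as UTF-8.
    static void writeUtf8(Node source, File target)
            throws TransformerException, IOException {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        // Controls the encoding named in the XML declaration.
        trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // The Writer decides which bytes actually reach the file, so it
        // must use the same charset as the declaration above.
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(target), StandardCharsets.UTF_8)) {
            trans.transform(new DOMSource(source), new StreamResult(out));
        }
    }
}
```

An alternative is to pass the FileOutputStream directly to StreamResult; the serializer then encodes the bytes itself according to OutputKeys.ENCODING, and no Writer charset can get out of sync with the declaration.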

Thank you so much for your help.

Jul 20 '05 #3
