Bytes | Developer Community

Problems with UTF-8 characters and XSLT

Hi,

I'm using Xalan to transform XML in Java. My problem is this:

I have Unicode characters in my XML, e.g. German umlauts (ä, ö, ü); since
those already cause trouble, I have not tried any other Unicode characters.
When I do an identity transform and write the XML to a file, the word
'Glättegefahr', for example, appears in the file (viewed with the XMLSpy
Eclipse plug-in) as 'Gl?egefahr', except that the ? is shown as a box.
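A minimal Java sketch of how such a box can arise. It assumes (the thread does not confirm this) that the file was written in a single-byte charset such as ISO-8859-1, where 'ä' is the lone byte 0xE4, and then opened as UTF-8. In UTF-8, 0xE4 starts a three-byte sequence that the following 't' does not continue, so the decoder substitutes U+FFFD, which many viewers draw as a box. Java's decoder replaces only the bad byte; other viewers may render the damage differently. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeBoxDemo {

    // Encode text with one charset and decode the bytes with another,
    // like a file written in one encoding and opened as a different one.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Assumption: the writer used a single-byte charset (ISO-8859-1 here),
        // so the a-umlaut (U+00E4) became the lone byte 0xE4.
        String viewed = roundTrip("Gl\u00e4ttegefahr",
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // 0xE4 opens a three-byte UTF-8 sequence, but the next byte ('t', 0x74) is not
        // a continuation byte, so the decoder inserts the replacement character U+FFFD.
        System.out.println("Contains U+FFFD: " + viewed.contains("\uFFFD"));
    }
}
```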

When I write it to System.out instead, I get: GlÃ¤ttegefahr. (This is also
what I get using XMLSpy directly, except that XMLSpy does not seem to
understand the <xsl:copy-of> element.)

Here is my Java code for creating and running the transformer:

Transformer trans = TransformerFactory.newInstance().newTransformer();
trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
trans.transform(new DOMSource(source.getDocumentElement()),
new StreamResult(new FileWriter(origin)));
//or: new StreamResult(System.out)

The XML file is shown in Eclipse as UTF-8 encoded, and each file involved
(XSLT, XML) has encoding="UTF-8" specified in its XML declaration.

Any ideas on what else I can try are most welcome. Thanks in
advance.

Jul 20 '05 #1


ja******@gmx.de wrote:

> When I output it to System.out, I get: GlÃ¤ttegefahr.

As System.out is the console, which is usually set to display an 8-bit
oriented code page, it seems the output is properly UTF-8 encoded: an ä in
UTF-8 takes two bytes, and those Ã¤ are the two bytes.
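This can be checked in a few lines of Java: 'ä' (U+00E4) encodes to the two UTF-8 bytes 0xC3 0xA4, and a console that treats every byte as one character of an 8-bit code page shows two characters. The sketch assumes ISO-8859-1 as that code page, which yields Ã¤; a Windows console set to, say, cp850 would show two different characters. Names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class Utf8TwoBytesDemo {

    // Encode a string as UTF-8, then show each byte as one ISO-8859-1 character,
    // roughly what an 8-bit console code page does with UTF-8 output.
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e4".getBytes(StandardCharsets.UTF_8); // U+00E4, a-umlaut
        // prints "2 bytes: C3 A4"
        System.out.printf("%d bytes: %02X %02X%n", utf8.length, utf8[0] & 0xFF, utf8[1] & 0xFF);
        // Each byte becomes its own character: U+00C3 (A-tilde) and U+00A4 (currency sign).
        String shown = misreadAsLatin1("\u00e4");
        System.out.println(shown.equals("\u00c3\u00a4")); // prints "true"
    }
}
```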
> The XML-file is shown in Eclipse as encoded with utf-8 and each file
> involved (xslt, xml) has set the encoding="UTF-8" attribute specified.


Does the resulting file have an XML declaration such as
<?xml version="1.0" encoding="utf-8"?>
at the top?
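One way to check the declaration programmatically is a sketch like the following (class and helper names are hypothetical; it passes the Document itself rather than its document element). It runs the same kind of identity transform into a StringWriter. It also illustrates the trap behind this thread: with a Writer as the target, the transformer produces characters, so OutputKeys.ENCODING ends up in the declaration, while the actual bytes on disk are whatever the Writer produces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DeclarationCheck {

    // Parse a small XML string into a DOM Document (test input only).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Identity transform into a StringWriter: the result is characters, and the
    // ENCODING property only shows up as text inside the XML declaration.
    static String identityTransform(Document doc) throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        trans.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = identityTransform(parse("<w>Gl\u00e4ttegefahr</w>"));
        boolean declaresUtf8 = xml.startsWith("<?xml") && xml.contains("encoding=\"UTF-8\"");
        System.out.println("Declares UTF-8: " + declaresUtf8);
    }
}
```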

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jul 20 '05 #2
Yes, it does. But thanks to your tip I reviewed the API documentation for
the FileWriter class, which I had passed (wrapped in a StreamResult) as the
Result of the transform method, and it showed that FileWriter uses the
platform's default charset. That was the problem.
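The diagnosis can be confirmed by asking a FileWriter which encoding it picked. The constructor used in the question takes no charset argument and falls back to the platform default (an overload accepting a Charset only arrived in Java 11). Since Java 18 that default is UTF-8 unless overridden (JEP 400), so this mismatch mainly bites on older JVMs or non-UTF-8 defaults. A small sketch with illustrative names:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.Charset;

public class FileWriterCharsetCheck {

    // Report the charset a FileWriter uses when none is given.
    static Charset fileWriterCharset(File f) throws IOException {
        try (FileWriter w = new FileWriter(f)) {
            // getEncoding() may return a historical name such as "Cp1252" or "UTF8";
            // Charset.forName accepts these as aliases.
            return Charset.forName(w.getEncoding());
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("probe", ".xml");
        f.deleteOnExit();
        Charset used = fileWriterCharset(f);
        System.out.println("FileWriter encodes with: " + used);
        System.out.println("Platform default:        " + Charset.defaultCharset());
        // If this is false, bytes written through FileWriter contradict the
        // encoding="UTF-8" that the transformer puts into the XML declaration.
        System.out.println("Matches UTF-8 declaration: " + used.name().equals("UTF-8"));
    }
}
```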

I solved it by constructing an OutputStreamWriter with the UTF-8 charset
instead, and now the file is transformed, or rather copied, correctly.
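A minimal sketch of the fix described above, alongside a common alternative: giving StreamResult a byte stream, so that the transformer itself encodes the output according to OutputKeys.ENCODING. Class and method names are illustrative, and error handling is reduced to `throws Exception`.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Utf8IdentityCopy {

    static Transformer utf8Identity() throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        return trans;
    }

    // The fix from the thread: an OutputStreamWriter with an explicit UTF-8
    // charset replaces FileWriter, so the bytes match the declaration.
    static void copyViaUtf8Writer(Document doc, File target) throws Exception {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(target), StandardCharsets.UTF_8)) {
            utf8Identity().transform(new DOMSource(doc), new StreamResult(w));
        }
    }

    // Alternative: pass a byte stream and let the transformer do the encoding
    // according to its ENCODING output property.
    static void copyViaOutputStream(Document doc, File target) throws Exception {
        try (OutputStream out = new FileOutputStream(target)) {
            utf8Identity().transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<w>Gl\u00e4ttegefahr</w>");
        File target = File.createTempFile("copy", ".xml");
        target.deleteOnExit();
        copyViaUtf8Writer(doc, target);
        // Read the raw bytes back as UTF-8 to confirm the umlaut survived.
        String text = new String(Files.readAllBytes(target.toPath()), StandardCharsets.UTF_8);
        System.out.println(text.contains("Gl\u00e4ttegefahr"));
    }
}
```

Passing an OutputStream keeps the encoding decision in one place (the ENCODING output property), so the declaration and the bytes cannot drift apart.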

Thank you so much for your help.

Jul 20 '05 #3
