org.apache.xml.serialize.XMLSerializer problem with UTF-8

I must be missing something.

I am using org.apache.xml.serialize.XMLSerializer to save a DOM, but
non-ASCII characters are not being encoded as UTF-8 in the output file.

I create Text nodes in the DOM by, for example:

Document doc;
JTextArea textPrompt;
Text newTextNode;
Element descElt;
....
newTextNode = doc.createTextNode(textPrompt.getText());
descElt.appendChild(newTextNode);

The code to serialize the DOM is:

private void saveXml(Document document)
{
    // rename the existing layout file
    new File(fileName).renameTo(new File(fileName + "~"));
    // write the document out
    OutputFormat format = new OutputFormat(document);
    format.setIndenting(true);
    format.setLineWidth(0);
    format.setPreserveSpace(true);
    try {
        XMLSerializer serializer;
        serializer = new XMLSerializer (
            new FileWriter(fileName),
            format);
        serializer.asDOMSerializer();
        serializer.serialize(document);
    }
    catch (IOException ioe)
    {
        ....
    }
}

If I enter a character such as é (e with acute accent) into the JTextArea
and look at the XML file using a non-UTF-8-aware editor, I see that the é
has been written as a single byte, not as the 2-byte UTF-8 encoded
sequence. If I subsequently try to read the XML file using Xerces, it blows up
because of the invalid byte sequence.

How do I get a valid serialization of this DOM into XML using UTF-8?
--
Jim Cobban jc*****@magma.ca
34 Palomino Dr.
Kanata, ON, CANADA
K2M 1M1
+1-613-592-9438
Jul 20 '05 #1
Jim Cobban wrote:

I must be missing something.
....
XMLSerializer serializer;
serializer = new XMLSerializer (
    new FileWriter(fileName),
    format);
serializer.asDOMSerializer();
....
If I enter a character such as é (e with acute accent) into the JTextArea
and look at the XML file using a non-UTF-8-aware editor, I see that the é
has been written as a single byte, not as the 2-byte UTF-8 encoded
sequence. If I subsequently try to read the XML file using Xerces, it blows up
because of the invalid byte sequence.

How do I get a valid serialization of this DOM into XML using UTF-8?


As far as I know, it is the Writer that is responsible for the encoding.

From FileWriter API doc:

public class FileWriter
extends OutputStreamWriter

Convenience class for writing character files. The constructors of this
class assume that the default character encoding and the default
byte-buffer size are acceptable. To specify these values yourself,
construct an OutputStreamWriter on a FileOutputStream.
- try that.

Soren

--
Fjern de 4 bogstaver i min mailadresse som er indsat for at hindre s...
Remove the 4 letter word meaning "junk mail" in my mail address.

Jul 20 '05 #2
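
The suggestion in reply #2 can be illustrated with a minimal, JDK-only sketch. It is not code from the thread: the class name WriterEncodingDemo and the helper encode are hypothetical, and a ByteArrayOutputStream stands in for the FileOutputStream so the resulting bytes can be inspected. It shows that the charset passed to OutputStreamWriter, not the XML serializer, decides how é ends up on disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original post.
class WriterEncodingDemo {

    // Writes text through an OutputStreamWriter with an explicit charset
    // and returns the raw bytes that would have reached the file.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e with acute accent

        // A single-byte charset such as ISO-8859-1 yields one byte (0xE9),
        // which is not valid UTF-8 on its own.
        byte[] latin1 = encode(eAcute, StandardCharsets.ISO_8859_1);

        // UTF-8 yields the two-byte sequence 0xC3 0xA9.
        byte[] utf8 = encode(eAcute, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes: " + utf8.length);
    }
}
```

For a real file, the same idea would presumably read new OutputStreamWriter(new FileOutputStream(fileName), StandardCharsets.UTF_8) in place of new FileWriter(fileName), which uses the platform default encoding.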

"Soren Kuula" <do**********@bitplanet.net> wrote in message
news:5K*********************@news000.worldonline.dk...

As far as I know it is the Writer responsible for the encoding.

From FileWriter API doc:

public class FileWriter
extends OutputStreamWriter

Convenience class for writing character files. The constructors of this
class assume that the default character encoding and the default
byte-buffer size are acceptable. To specify these values yourself,
construct an OutputStreamWriter on a FileOutputStream.


Thank you.

The problem was that I copied the code from one of the examples that came
with Xerces. It was that example which constructed the default FileWriter.
Since there is a version of the XMLSerializer constructor which takes an
OutputStream and internally constructs a Writer with the correct "utf-8"
encoding, that is the form of the constructor which I needed to use. I
should have read the documentation in more detail rather than trusting that
the example had been written correctly.
Jul 20 '05 #3
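
The fix described in reply #3 amounts to handing the serializer an OutputStream rather than a Writer, so that the serializer chooses the encoding itself. With Xerces that would presumably look like new XMLSerializer(new FileOutputStream(fileName), format). The sketch below is a stand-in, not the poster's code: it swaps in the JDK's own JAXP Transformer for XMLSerializer so that it runs without Xerces on the classpath. The class name DomUtf8RoundTrip and the element name desc are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; uses JAXP instead of Xerces' XMLSerializer.
class DomUtf8RoundTrip {

    // Serializes a DOM to bytes. Because the result wraps an OutputStream,
    // the serializer (not a default-encoding Writer) performs the encoding.
    static byte[] serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Builds a one-element document holding the text, serializes it,
    // parses the bytes again and returns the text that was read back.
    static String roundTrip(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element desc = doc.createElement("desc");
            desc.appendChild(doc.createTextNode(text));
            doc.appendChild(desc);

            byte[] xml = serialize(doc);
            Document parsed = builder.parse(new ByteArrayInputStream(xml));
            return parsed.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        // The parser should accept the output and return the same text.
        System.out.println(original.equals(roundTrip(original)));
    }
}
```

The design point is the same one the poster reached: once the serializer owns the byte stream, the encoding named in the XML declaration and the encoding of the bytes cannot drift apart.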
