Java DOM processing XML Keeping Carriage Returns

I am writing a Java program to read in an XML file, modify some elements
slightly, and then write it out. The XML file is prepared
in DocBook.

It works fine, except that it is disturbing the carriage returns in places
where they have meaning.

Attached are a sample input file, the sample output file, and a simplified
version of my Java program. My real program examines certain elements' attributes
and adds certain elements to the DOM data structure.

How can I best write such a program without disturbing the carriage returns
between "<programlisting>" and "</programlisting>"?

Here is the input file. I need to preserve the formatting of the material
between "<programlisting>" and "</programlisting>"

<test>
<programlisting>
a
b
c d e f
</programlisting>
</test>

Here is the output file -- observe how the "a b c d e f" are now on
one line.
<?xml version="1.0" encoding="UTF-8"?>
<test>
<programlisting> a b c d e f </programlisting>
</test>

Here is the Java program:

import java.text.*;
import java.io.*;

import javax.xml.parsers.*;
import org.apache.xml.serialize.*;
import org.w3c.dom.*;
import org.xml.sax.*;

public class Test {
    static PrintWriter debug = null;
    static Document document = null;
    static String OutputFileName;

    static Element CreateElement (String ElementName, String Contents){
        Element ToReturn;
        ToReturn = document.createElement(ElementName);
        Text T = document.createTextNode(Contents);
        ToReturn.appendChild(T);
        return ToReturn;
    }

    static Element CreateElement (String ElementName, String Contents, String AttributeName, String AttributeValue){
        Element ToReturn = CreateElement(ElementName,Contents);
        ToReturn.setAttribute(AttributeName,AttributeValue );
        return ToReturn;
    }

    public static void main (String[] args) throws FileNotFoundException {
        try {
            debug = new PrintWriter (new FileWriter("debug.out"));
        }
        catch (Exception d) {System.out.println("cannot open debug file");}
        Text T;
        int j;
        DocumentBuilder parser = null;
        // Here we read in the data from the XML file
        DocumentBuilderFactory Factory = DocumentBuilderFactory.newInstance();
        String xmlFile = args[0];
        File file = new File (xmlFile);
        try {
            parser = Factory.newDocumentBuilder();
        }
        catch (ParserConfigurationException pce) {
            System.out.println ("Parser Configuration Exception " + pce.getMessage());
            System.exit(0);
        }
        try {
            document = parser.parse(file);
        }
        catch (SAXException se) {
            System.out.println ("SAX Exception on parsing document " + se.getMessage());
            System.exit(0);
        }
        catch (IOException ioe) {
            System.out.println ("IO Exception on parsing document " + ioe.getMessage());
            System.exit(0);
        }
        FileWriter out = null;
        XMLSerializer X = null;
        Element root = null;
        OutputFileName=args[0];
        try {
            out = new FileWriter(OutputFileName+".OUT"+".xml");
            out.flush();
            OutputFormat o = new OutputFormat(document);
            o.setIndent(5);
            o.setIndenting(true);
            X = new XMLSerializer(o);
            X.setOutputCharStream(out);
        }
        catch (IOException e0) {
            System.out.println ("problem in setting up to save XML file" + e0.getMessage());
            e0.printStackTrace();
        }

        // use the XML functions to dump the materials
        try {
            X.serialize(document);
            out.flush();
        } catch (IOException e2) {
            System.out.println("error writing file " + e2.getMessage());
            e2.printStackTrace();
        }

        debug.close();
    }
}

Dr. Laurence Leff Western Illinois University, Macomb IL 61455 ||(309) 298-1315
Stipes 447 Assoc. Prof. of Computer Sci. Pager: 309-367-0787 FAX: 309-298-2302
Secretary: eContracts Technical Committee OASIS Legal XML Member Section
Jul 20 '05 #1
/Dr. Laurence Leff/:
It works fine, except that it is disturbing the carriage returns in places
where they have meaning.
<http://www.w3.org/TR/REC-xml/#sec-line-ends>:
... XML processor MUST behave as if it normalized all line breaks in
external parsed entities (including the document entity) on input,
before parsing, by translating both the two-character sequence #xD
#xA and any #xD that is not followed by #xA to a single #xA
character.


So you have to include the carriage returns in the source XML data
using character references - &#xD;

--
Stanimir
Jul 20 '05 #2
/Stanimir Stamenkov/:
So you have to include the carriage returns in the source XML data using
character references - &#xD;


As far as I see:

Document doc;
... // initialize a new empty 'doc'
Element elem = doc.createElement("test");
elem.appendChild(doc.createTextNode(
"A line of text.\r\n\r\nAnother line."));
doc.appendChild(elem);
// serialize the 'doc'

Serialization using the standard JAXP Transformations API (using a
"copy transformer") correctly outputs a character reference (&#13;) in
place of each CR character, so the CRs are read back the next time the
file is parsed.
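
A minimal sketch of that approach, assuming the Transformer implementation bundled with the JDK (other implementations may escape the CR differently). The class name CopyTransformerDemo and its helper methods are illustrative names only. It builds the small document from the snippet above, serializes it with an identity transformer, and parses the result again to check that the CR characters survive.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CopyTransformerDemo {
    // Builds a document whose only text node contains CR LF pairs.
    static Document buildTestDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element elem = doc.createElement("test");
        elem.appendChild(doc.createTextNode("A line of text.\r\n\r\nAnother line."));
        doc.appendChild(elem);
        return doc;
    }

    // Serializes with an identity ("copy") transformer: no stylesheet, no indenting.
    static String serialize(Document doc) throws Exception {
        Transformer copy = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        copy.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parses an XML string back into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = serialize(buildTestDocument());
        System.out.println(xml);
        // Read the serialized form back and make the line-end characters visible.
        String reread = parse(xml).getDocumentElement().getTextContent();
        System.out.println(reread.replace("\r", "[CR]").replace("\n", "[LF]"));
    }
}
```

The same serialize method can be applied to a document read from a file, as long as indenting is not switched on through the transformer's output properties.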

--
Stanimir
Jul 20 '05 #3
Dr. Stamenkov:

Thank you for your quick responses to my question on using Java
to process XML files that contain formatted text such as programs.

I tried your suggestion of using the character reference for carriage
return, &#xD;. (I wrote a Perl script to identify my
programlisting sections and make the replacement.)

It did not help.

Here is the sample input (after including the character references):

<section><para>
abc
def
<programlisting>
&#xD;ghi
&#xD;jkl
&#xD;mno
&#xD;</programlisting>
ghi
jkl
</para></section>

Here is the output of the Java program. Observe that the carriage return
and formatting in the "programlisting" section are being changed.

<?xml version="1.0" encoding="UTF-8"?>
<section>
<para> abc def <programlisting> ghi jkl mno </programlisting>
ghi jkl </para>
</section>

This output was from the same Java program as before.

I also tried the startPreserving() option on the OutputSerializer
and removing the invocations of the methods setIndent and setIndenting.
I also tried removing the carriage returns between the lines and replacing
them with the character reference &#xD;.
These made no difference.

I then tried the other suggestion:

import java.text.*;
import java.io.*;

import javax.xml.parsers.*;
import org.apache.xml.serialize.*;
import org.w3c.dom.*;
import org.xml.sax.*;

public class T{
    static PrintWriter debug = null;
    // Never assigned in this test, so the output file is named "null.OUT.xml".
    static String OutputFileName;

    public static void main (String[] args) throws ParserConfigurationException,FileNotFoundException {
        try {
            debug = new PrintWriter (new FileWriter("debug.out"));
        }
        catch (Exception d) {System.out.println("cannot open debug file");}
        Text T;
        int j;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        Document doc;
        DocumentBuilder db = dbf.newDocumentBuilder();
        doc = db.newDocument();
        Element elem = doc.createElement("test");
        elem.appendChild(doc.createTextNode(
            "A line of text \r\n\r\nAnother line"));

        doc.appendChild(elem);

        FileWriter out = null;
        XMLSerializer X = null;
        Element root = null;
        try {
            out = new FileWriter(OutputFileName+".OUT"+".xml");
            out.flush();
            OutputFormat o = new OutputFormat(doc);
            //o.setIndent(5);
            //o.setIndenting(true);
            X = new XMLSerializer(o);
            X.setOutputCharStream(out);
        }
        catch (IOException e0) {
            System.out.println ("problem in setting up to save XML file" + e0.getMessage());
            e0.printStackTrace();
        }

        // use the XML functions to dump the materials
        try {
            X.serialize(doc);
            out.flush();
        } catch (IOException e2) {
            System.out.println("error writing file " + e2.getMessage());
            e2.printStackTrace();
        }

        debug.close();
    }
}

I got this output:

<?xml version="1.0" encoding="UTF-8"?>
<test>A line of text Another line</test>

Perhaps, there is a version or configuration problem with my parser software.
I am using Xerces-2_4_0.

Thank you for any further assistance that you or anyone else
reading this newsgroup can provide.


Jul 20 '05 #4
/Dr. Laurence Leff/:
I tried your suggestion of using the character reference for carriage
return, &#xD; ...

It did not help.

Here is the sample input (after including the entity references)

<section><para>
abc
def
<programlisting>
&#xD;ghi
&#xD;jkl
&#xD;mno
&#xD;</programlisting>
ghi
jkl
</para></section>
[...]
Perhaps, there is a version or configuration problem with my parser software.
I am using Xerces-2_4_0.


As I mentioned in my previous reply, I used the JAXP
Transformations API (part of the standard Java 1.4 framework) to
serialize the data. I have Xerces2 version 2.6.2, but I haven't used
its 'serialize' package. It could be that these 'Serializer' classes need
additional configuration, or that they just don't behave well in the
version you have. The Xerces version I have provides an implementation
of the DOM Level 3 Load and Save API (which is now part of the
standard Java 5 framework), but I haven't tried that either.
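
For reference, a rough and untested-in-Xerces sketch of what the Load and Save route might look like, assuming a DOM implementation that supports the "LS" 3.0 feature (the one bundled with current JDKs does; getFeature may return null elsewhere). The class name LoadSaveDemo, the INPUT constant and the helper methods are illustrative names. It parses the first sample input from this thread, serializes it without pretty-printing, and reads the listing back.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class LoadSaveDemo {
    // The first sample input from this thread, with LF line breaks.
    static final String INPUT =
            "<test>\n<programlisting>\na\nb\nc d e f\n</programlisting>\n</test>";

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes through DOM Level 3 Load and Save.
    static String serialize(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        // "format-pretty-print" is left at its default (false), so no re-indenting is requested.
        return serializer.writeToString(doc);
    }

    // Returns the text inside the first <programlisting> element.
    static String listingText(Document doc) {
        return doc.getElementsByTagName("programlisting").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = serialize(parse(INPUT));
        System.out.println(xml);
        // Show the listing after a round trip, with line breaks made visible.
        System.out.println(listingText(parse(xml)).replace("\n", "[LF]"));
    }
}
```

Whether CR characters written as &#xD; survive this route is something I have not verified; the sketch only covers ordinary line breaks.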

I've prepared an example for you to try:

http://www.geocities.com/stanio/test...utputTest.java
http://www.geocities.com/stanio/test/input.xml

It reads the "input.xml" file (which is a copy of the sample input
you've given above) and dumps its contents to the console, with CR and
LF characters replaced by "[CR]" and "[LF]" strings
(all on one line). Then it saves the DOM data it read to "output.xml".

In addition, a "test.xml" file is created with DOM data constructed
using the 'Document' factory methods (as in my previous example).

--
Stanimir
Jul 20 '05 #5
