
Java DOM processing XML Keeping Carriage Returns

I am writing a Java program to read in an XML file, modify some elements
slightly, and then write it out. The XML file is written in
DocBook.

It works fine, except that it is disturbing the carriage returns in places
where they have meaning.

Attached are a sample input file, the sample output file, and a simplified
version of my Java program. My real program examines certain elements' attributes
and adds certain elements to the DOM data structure.

What is the best way to write such a program without disturbing the carriage returns
between "<programlisting>" and "</programlisting>"?

Here is the input file. I need to preserve the formatting of the material
between "<programlisting>" and "</programlisting>".

<test>
<programlisting>
a
b
c d e f
</programlisting>
</test>

Here is the output file -- observe how the "a b c d e f" are now on
one line.
<?xml version="1.0" encoding="UTF-8"?>
<test>
<programlisting> a b c d e f </programlisting>
</test>
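One way to narrow down where the line breaks are lost is to parse the sample input and inspect the text node directly. The sketch below is a minimal check, assuming the JDK's built-in JAXP parser rather than a separately installed Xerces (the class and method names are hypothetical). If it reports the newlines intact, the collapsing happens on the output side, not when the file is read.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParsedWhitespaceCheck {
    // Parse an XML string and return the text content of the first
    // <programlisting> element exactly as the parser delivered it.
    static String programlistingText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("programlisting").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String input = "<test>\n<programlisting>\na\nb\nc d e f\n</programlisting>\n</test>";
        // Print the newlines as visible "\n" escapes so they can be counted.
        System.out.println(programlistingText(input).replace("\n", "\\n"));
    }
}
```");
    if (!t.equals("\na\nb\nc d e f\n")) throw new AssertionError(t);
} catch (Exception e) { throw new RuntimeException(e); }
</test>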

Here is the Java program:

import java.text.*;
import java.io.*;

import javax.xml.parsers.*;
import org.apache.xml.serialize.*;
import org.w3c.dom.*;
import org.xml.sax.*;

public class Test {
    static PrintWriter debug = null;
    static Document document = null;
    static String OutputFileName;

    static Element CreateElement(String ElementName, String Contents) {
        Element ToReturn;
        ToReturn = document.createElement(ElementName);
        Text T = document.createTextNode(Contents);
        ToReturn.appendChild(T);
        return ToReturn;
    }
    static Element CreateElement(String ElementName, String Contents, String AttributeName, String AttributeValue) {
        Element ToReturn = CreateElement(ElementName, Contents);
        ToReturn.setAttribute(AttributeName, AttributeValue);
        return ToReturn;
    }
    public static void main(String[] args) throws FileNotFoundException {
        try {
            debug = new PrintWriter(new FileWriter("debug.out"));
        }
        catch (Exception d) {System.out.println("cannot open debug file");}
        Text T;
        int j;
        DocumentBuilder parser = null;
        // Here we read in the data from the XML file
        DocumentBuilderFactory Factory = DocumentBuilderFactory.newInstance();
        String xmlFile = args[0];
        File file = new File(xmlFile);
        try {
            parser = Factory.newDocumentBuilder();
        }
        catch (ParserConfigurationException pce) {
            System.out.println("Parser Configuration Exception " + pce.getMessage());
            System.exit(0);
        }
        try {
            document = parser.parse(file);
        }
        catch (SAXException se) {
            System.out.println("SAX Exception on parsing document " + se.getMessage());
            System.exit(0);
        }
        catch (IOException ioe) {
            System.out.println("IO Exception on parsing document " + ioe.getMessage());
            System.exit(0);
        }
        FileWriter out = null;
        XMLSerializer X = null;
        Element root = null;
        OutputFileName = args[0];
        try {
            out = new FileWriter(OutputFileName + ".OUT" + ".xml");
            out.flush();
            OutputFormat o = new OutputFormat(document);
            o.setIndent(5);
            o.setIndenting(true);
            X = new XMLSerializer(o);
            X.setOutputCharStream(out);
        }
        catch (IOException e0) {
            System.out.println("problem in setting up to save XML file" + e0.getMessage());
            e0.printStackTrace();
        }

        // use the XML functions to dump the materials
        try {
            X.serialize(document);
            out.flush();
        } catch (IOException e2) {System.out.println("error writing file " + e2.getMessage()); e2.printStackTrace();}

        debug.close();

    }
}

Dr. Laurence Leff Western Illinois University, Macomb IL 61455 ||(309) 298-1315
Stipes 447 Assoc. Prof. of Computer Sci. Pager: 309-367-0787 FAX: 309-298-2302
Secretary: eContracts Technical Committee OASIS Legal XML Member Section
Jul 20 '05 #1
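For comparison, here is a minimal sketch of the same read-and-write-back flow with the Xerces XMLSerializer swapped out for the JDK's standard JAXP identity ("copy") transformer, the approach Stanimir recommends in the replies. It is not the original program: the class and method names are hypothetical, the DOM-modification step is only a placeholder comment, and the output name follows the original's args[0] + ".OUT.xml" pattern.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IdentityCopy {
    // Parse XML from any SAX InputSource (file URI, stream or string).
    static Document parse(InputSource in) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    // Serialize with an identity transformer. The INDENT output property is
    // left unset, so no indentation is added and text nodes are not reflowed.
    static void write(Document doc, StreamResult out) throws Exception {
        Transformer copy = TransformerFactory.newInstance().newTransformer();
        copy.transform(new DOMSource(doc), out);
    }

    // String-in, string-out round trip, convenient for quick checks.
    static String roundTrip(String xml) throws Exception {
        Document doc = parse(new InputSource(new StringReader(xml)));
        // ... modify the DOM here (add elements, change attributes) ...
        StringWriter sw = new StringWriter();
        write(doc, new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(new InputSource(new File(args[0]).toURI().toString()));
        write(doc, new StreamResult(new File(args[0] + ".OUT.xml")));
    }
}
```");
    if (!out.contains("<programlisting>")) throw new AssertionError(out);
    String text = IdentityCopy.parse(new org.xml.sax.InputSource(new java.io.StringReader(out)))
        .getElementsByTagName("programlisting").item(0).getTextContent();
    if (!text.equals("\na\nb\nc d e f\n")) throw new AssertionError(text);
} catch (Exception e) { throw new RuntimeException(e); }
</test>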
/Dr. Laurence Leff/:
It works fine, except that it is disturbing the carriage returns in places
where they have meaning.
<http://www.w3.org/TR/REC-xml/#sec-line-ends>:
... XML processor MUST behave as if it normalized all line breaks in
external parsed entities (including the document entity) on input,
before parsing, by translating both the two-character sequence #xD
#xA and any #xD that is not followed by #xA to a single #xA
character.


So you have to include the carriage returns in the source XML data
using character references - &#xD;
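A minimal sketch of what the quoted rule means in practice, using the JDK's built-in DOM parser (the class and method names are hypothetical): a literal CR LF pair in the source reaches the DOM as a single LF, while a &#xD; character reference reaches it as a real CR.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class LineEndDemo {
    // Return the root element's text exactly as the parser reports it.
    static String rootText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // A literal CR LF is normalized to a single LF on input.
        String literal = rootText("<t>a\r\nb</t>");
        // A character reference is not subject to that normalization.
        String escaped = rootText("<t>a&#xD;\nb</t>");
        System.out.println(literal.indexOf('\r'));  // -1: the CR is gone
        System.out.println(escaped.indexOf('\r'));  // 1: the CR survives
    }
}
```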

--
Stanimir
Jul 20 '05 #2
/Stanimir Stamenkov/:
So you have to include the carriage returns in the source XML data using
character references - &#xD;


As far as I can see:

Document doc;
... // initialize a new empty 'doc'
Element elem = doc.createElement("test");
elem.appendChild(doc.createTextNode(
"A line of text.\r\n\r\nAnother line."));
doc.appendChild (elem);
// serialize the 'doc'

Serialization using the standard JAXP Transformations API (using a
"copy transformer") correctly writes the CR characters out as character
references, so they are read back in the next time the document is parsed.
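A self-contained version of the snippet above, for anyone who wants to try it. The elided document setup is filled in with the standard DocumentBuilder.newDocument() call, serialization uses a JAXP identity transformer, and the class name CrRoundTrip is hypothetical. Rather than depending on the exact spelling of the character reference the serializer chooses, the sketch parses its own output again and checks that the CRs come back.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CrRoundTrip {
    // Build the example document and serialize it with an identity transformer.
    static String serialize() throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.newDocument();
        Element elem = doc.createElement("test");
        elem.appendChild(doc.createTextNode("A line of text.\r\n\r\nAnother line."));
        doc.appendChild(elem);

        Transformer copy = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        copy.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parse serialized XML again and return the root element's text.
    static String reparse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = serialize();
        System.out.println(xml);  // the CRs appear as character references
        System.out.println(reparse(xml).equals("A line of text.\r\n\r\nAnother line."));
    }
}
```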

--
Stanimir
Jul 20 '05 #3
Dr. Stamenkov:

Thank you for your quick responses to my question on using Java
to process XML files that contain formatted text such as programs.

I tried your suggestion of using the entity reference for carriage
return, &#xD;. (I wrote a Perl script to identify my
programlisting sections and make the replacement.)

It did not help.

Here is the sample input (after including the entity references)

<section><para>
abc
def
<programlisting>
&#xD;ghi
&#xD;jkl
&#xD;mno
&#xD;</programlisting>
ghi
jkl
</para></section>

Here is the output of the Java program. Observe that the carriage returns
and formatting in the "programlisting" section are being changed.

<?xml version="1.0" encoding="UTF-8"?>
<section>
<para> abc def <programlisting> ghi jkl mno </programlisting>
ghi jkl </para>
</section>

This output was from the same Java program as before.

I also tried the startPreserving() option on the OutputSerializer
and removing the calls to the methods setIndent and setIndenting.
I also tried removing the carriage returns between the lines, replacing
them with the character reference &#xD;.
None of these made any difference.

I then tried the other suggestion:

import java.text.*;
import java.io.*;

import javax.xml.parsers.*;
import org.apache.xml.serialize.*;
import org.w3c.dom.*;
import org.xml.sax.*;

public class T {
    static PrintWriter debug = null;
    static String OutputFileName;

    public static void main(String[] args) throws ParserConfigurationException, FileNotFoundException {
        try {
            debug = new PrintWriter(new FileWriter("debug.out"));
        }
        catch (Exception d) {System.out.println("cannot open debug file");}
        Text T;
        int j;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        Document doc;
        DocumentBuilder db = dbf.newDocumentBuilder();
        doc = db.newDocument();
        Element elem = doc.createElement("test");
        elem.appendChild(doc.createTextNode(
            "A line of text \r\n\r\nAnother line"));

        doc.appendChild(elem);

        FileWriter out = null;
        XMLSerializer X = null;
        Element root = null;
        try {
            out = new FileWriter(OutputFileName + ".OUT" + ".xml");
            out.flush();
            OutputFormat o = new OutputFormat(doc);
            //o.setIndent(5);
            //o.setIndenting(true);
            X = new XMLSerializer(o);
            X.setOutputCharStream(out);
        }
        catch (IOException e0) {
            System.out.println("problem in setting up to save XML file" + e0.getMessage());
            e0.printStackTrace();
        }

        // use the XML functions to dump the materials
        try {
            X.serialize(doc);
            out.flush();
        } catch (IOException e2) {System.out.println("error writing file " + e2.getMessage()); e2.printStackTrace();}

        debug.close();

    }
}

I got this output:

<?xml version="1.0" encoding="UTF-8"?>
<test>A line of text Another line</test>

Perhaps there is a version or configuration problem with my parser software.
I am using Xerces-2_4_0.

Thank you for any further assistance that you or anyone else
reading this newsgroup can provide.


Jul 20 '05 #4
/Dr. Laurence Leff/:
I tried your suggestion of using the entity reference for carriage
return, &#xD; ...

It did not help.

Here is the sample input (after including the entity references)

<section><para>
abc
def
<programlisting>
&#xD;ghi
&#xD;jkl
&#xD;mno
&#xD;</programlisting>
ghi
jkl
</para></section>
[...]
Perhaps there is a version or configuration problem with my parser software.
I am using Xerces-2_4_0.


As I mentioned in my previous reply, I used the JAXP
Transformations API (part of the standard Java 1.4 framework) to
serialize the data. I have Xerces2 version 2.6.2, but I haven't used
its 'serialize' package. It could be that these 'Serializer' classes need
additional configuration, or that they just don't behave well in the
version you have. The Xerces version I have also provides an implementation
of the DOM Level 3 Load and Save API (which is now part of the
standard Java 5 framework), but I haven't tried that either.
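For readers on Java 5 or later, a minimal sketch of the DOM Level 3 Load and Save route mentioned above. It has not been tried against Xerces 2.4.0; it assumes the Document came from the JDK's default DocumentBuilderFactory, whose DOM implementation also implements DOMImplementationLS, and the class name is hypothetical. Pretty-printing is switched off and the XML declaration is omitted so the resulting string can be re-parsed directly.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class LsSaveSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serialize a DOM with the DOM Level 3 Load and Save API.
    static String save(Document doc) {
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        // No pretty-printing: whitespace inside text nodes is left alone.
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.FALSE);
        // Omit the <?xml ...?> declaration so the string re-parses cleanly.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<programlisting>\n&#xD;ghi\n&#xD;jkl\n</programlisting>");
        System.out.println(save(doc));
    }
}
```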

I've prepared an example for you to try:

http://www.geocities.com/stanio/test...utputTest.java
http://www.geocities.com/stanio/test/input.xml

It reads the "input.xml" file (which is a copy of the sample input
you've given above) and dumps its contents to the console, with CR and
LF characters replaced by "[CR]" and "[LF]" strings
(all on one line). Then it saves the DOM data it read to "output.xml".

In addition, a "test.xml" file is created with DOM data constructed
using the 'Document' factory methods (as in my previous example).
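The test program itself is not reproduced in the thread (the link above is truncated), but the console dump it describes is easy to approximate. A hypothetical sketch, with illustrative class and method names not taken from that file, that prints an element's text with CR and LF made visible:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LineBreakDump {
    // Replace CR and LF with visible markers so everything prints on one line.
    static String visualize(String text) {
        return text.replace("\r", "[CR]").replace("\n", "[LF]");
    }

    // Parse XML from a string and return the root element's text, visualized.
    static String dump(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return visualize(doc.getDocumentElement().getTextContent());
    }

    public static void main(String[] args) throws Exception {
        // prints: [LF][CR]ghi[LF][CR]jkl[LF]
        System.out.println(dump("<programlisting>\n&#xD;ghi\n&#xD;jkl\n</programlisting>"));
    }
}
```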

--
Stanimir
Jul 20 '05 #5
