see whitespace in java DOM

I've had some luck using the string values "\t", "\n" and "\r" to insert tab,
newline and carriage-return text nodes into a document, but I can't *read*
these nodes back, at least not by inspecting the nodeValue. Am I missing
something?
/**
 * NodeFilter supposed to remove ignorable whitespace
 */
private class WhiteSpaceFilter implements NodeFilter {

    public short acceptNode ( Node node ) {

        // HELLO?
        String value = node.getTextContent ();
        boolean ok = value.equals ( "\n" ) || value.equals ( "\t" );
        return ok ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
    }
}

/**
 * Strip whitespace
 * @param element DOMElement
 */
private void strip ( Element element ) {

    List<Node> list = new ArrayList<Node> ();
    NodeFilter filter = new WhiteSpaceFilter ();
    Document document = element.getOwnerDocument();
    DocumentTraversal traversable = (DocumentTraversal) document;
    TreeWalker walker = traversable.createTreeWalker (
        element, NodeFilter.SHOW_TEXT, filter, true );

    while ( walker.nextNode() != null )
        list.add ( walker.getCurrentNode ());
    for ( Node node : list )
        node.getParentNode().removeChild ( node );
}
--
Wired Earp
Wunderbyte
Jul 20 '05 #1
I wrote:
Am I missing something?


For some reason, even a text node holding a single "\n" can only be
identified with a regular expression. To make things worse, whitespace inside
the text must be stripped out first, so that it does not fool the filter.

private class WhiteSpaceFilter implements NodeFilter {

    // filter parsed data
    public short acceptNode ( Node node ) {
        node = sanitize ( node ); // note: modifies the node as a side effect
        String data = node.getTextContent();
        // the empty pattern matches only an empty string, i.e. a node
        // that held nothing but \t, \n, \r or \f before sanitizing
        boolean ok = Pattern.matches ( "", data );
        return ok ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
    }

    // parse and modify data
    private Node sanitize ( Node node ) {
        Text text = ( Text ) node;
        String data = text.getData ();
        text.setData ( data.replaceAll ( "[\t\n\r\f]+", "" ));
        return node; //TODO: delete multiple space characters
    }
}
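
A small sketch that may help when checking what the filter actually sees. It is not taken from the posts above: the class name WhitespaceNodeDemo, its helper methods and the sample XML are made up for illustration, and it assumes the document comes from the JDK's default DocumentBuilder. In a parsed, indented document the whitespace text nodes usually hold the newline *plus* the indentation, so an exact comparison with "\n" fails for most of them, while a blank check succeeds.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original posts.
final class WhitespaceNodeDemo {

    // Parse an XML string; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Collect the data of every text node that is a direct child of the root element.
    static List<String> childTextData(Document document) {
        List<String> result = new ArrayList<>();
        NodeList children = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                result.add(child.getNodeValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input: two child elements, each indented by two spaces.
        Document document = parse("<root>\n  <a/>\n  <b/>\n</root>");
        for (String data : childTextData(document)) {
            boolean exact = data.equals("\n") || data.equals("\t");
            boolean blank = data.trim().isEmpty();
            System.out.println(
                "length=" + data.length() + " exact=" + exact + " blank=" + blank);
        }
    }
}
```

Printing the length next to both checks shows at a glance which nodes an equals() test would miss. Whether this explains the behaviour described above depends on how the document was built, so treat it as a diagnostic aid only.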

--
Wired Earp
Wunderbyte
Jul 20 '05 #2
I wrote:
For some reason, even a text node holding a single "\n" can only be
identified with a regular expression. To make things worse, whitespace inside
the text must be stripped out first, so that it does not fool the filter.


In that case, it would probably be simpler to just:

private void strip ( Document document ) {

    DocumentTraversal traversable = ( DocumentTraversal ) document;
    NodeIterator iterator = traversable.createNodeIterator (
        document, NodeFilter.SHOW_TEXT, null, false );

    Node node;
    while (( node = iterator.nextNode ()) != null ) {
        Text text = ( Text ) node;
        String data = text.getData ();
        text.setData ( data.replaceAll ( "[\t\n\r\f]+", "" ));
        // TODO: delete multiple spaces
    }
    document.normalizeDocument ();
}
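
For anyone who wants to try this outside a larger project, here is a self-contained variant as a rough sketch. It differs from the method above in two ways that are my own choices, not something established in this thread: it *removes* text nodes that are entirely whitespace instead of rewriting their data, and it uses String.trim().isEmpty() in place of the regular expression, so text that has real content is left untouched. The class name WhitespaceStripper, its methods and the sample XML are hypothetical, and the cast to DocumentTraversal assumes a DOM implementation that supports the Traversal module, as the JDK's built-in one does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original posts.
final class WhitespaceStripper {

    // Parse an XML string; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Remove every text node that is empty after trim(); returns how many were removed.
    static int stripWhitespaceOnlyText(Document document) {
        DocumentTraversal traversal = (DocumentTraversal) document;
        NodeIterator iterator = traversal.createNodeIterator(
                document, NodeFilter.SHOW_TEXT, null, false);

        // Collect first, remove afterwards, so the tree is not changed mid-iteration.
        List<Node> blanks = new ArrayList<>();
        Node node;
        while ((node = iterator.nextNode()) != null) {
            if (node.getNodeValue().trim().isEmpty()) {
                blanks.add(node);
            }
        }
        for (Node blank : blanks) {
            blank.getParentNode().removeChild(blank);
        }
        return blanks.size();
    }

    public static void main(String[] args) {
        Document document = parse("<root>\n  <a> x </a>\n  <b/>\n</root>");
        int removed = stripWhitespaceOnlyText(document);
        System.out.println("removed " + removed + " whitespace-only text nodes");
    }
}
```

Note that trim() treats every character up to U+0020 as whitespace, which covers space, tab, newline, carriage return and form feed. Collecting the nodes before removing them mirrors the list-based approach from the first post.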

--
Wired Earp
Wunderbyte
Jul 20 '05 #3
