in message <iJ******************************@comcast.com>, Joe Kesselman
('k*************@comcast.net') wrote:
Simon Brooke wrote:
>The DOM API has included public Node importNode(Node ,boolean) as a
method of the Document interface for a long time. Does anything actually
implement it?
Certainly should work; I wrote Xerces' first implementation of that
function, and in fact was one of those who lobbied the DOM WG to include
it in the standard. If the node being imported properly implements the
DOM APIs, and the implementation being imported into doesn't have some
reason for blocking this (eg, that it's specifically a read-only DOM,
such as the DOM view of Xalan's internal data model), the function
should work. It isn't rocket science, after all; it's just a tree-walker
feeding a tree-builder.
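As a rough illustration of the "tree-walker feeding a tree-builder" description, here is a minimal sketch. It is not Xerces' implementation: the class name NaiveImport and its helper methods are hypothetical, it assumes the JDK's bundled JAXP parser, and it handles only element and text nodes.

```java
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical illustration only; a real importNode covers many more node types.
final class NaiveImport
{
    /** Walk src and build an equivalent subtree owned by target. */
    static Node copyInto( Document target, Node src )
    {
        switch ( src.getNodeType( ) )
        {
            case Node.TEXT_NODE:
                return target.createTextNode( src.getNodeValue( ) );

            case Node.ELEMENT_NODE:
                Element copy = target.createElement( src.getNodeName( ) );
                NamedNodeMap attrs = src.getAttributes( );

                for ( int i = 0; i < attrs.getLength( ); i++ )
                {
                    Attr a = (Attr) attrs.item( i );
                    copy.setAttribute( a.getName( ), a.getValue( ) );
                }

                // the "walker": recurse over children, feeding the "builder"
                for ( Node c = src.getFirstChild( ); c != null;
                        c = c.getNextSibling( ) )
                {
                    copy.appendChild( copyInto( target, c ) );
                }

                return copy;

            default:
                throw new IllegalArgumentException(
                    "unsupported node type " + src.getNodeType( ) );
        }
    }

    /** Parse a string with the default JAXP parser (an assumption). */
    static Document parse( String xml )
    {
        try
        {
            return DocumentBuilderFactory.newInstance( ).newDocumentBuilder( )
                .parse( new InputSource( new StringReader( xml ) ) );
        }
        catch ( Exception e )
        {
            throw new IllegalStateException( "could not parse: " + xml, e );
        }
    }

    public static void main( String[] args )
    {
        Document src = parse( "<div class=\"Intro\">Here be <b>dragons</b>!</div>" );
        Document target = parse( "<page/>" );
        Node copy = copyInto( target, src.getDocumentElement( ) );

        target.getDocumentElement( ).appendChild( copy );
        System.out.println( copy.getTextContent( ) );
    }
}
```

Note that the walk starts from an element, not from the Document node itself; the sketch has no case for Document nodes at all.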
I have to believe the problem resides in something you haven't told us.
OK, then I have to believe that, too. Furthermore, this is another of the
bits of my code that have been around for a long time (since 2003 in this
case), and I'm sure it used to work (but it may only ever have worked with
Crimson). I have had occasions in the past where I have inadvertently
depended on bugs in a library, and when that library has been fixed all my
code broke.
If this class fails, it returns a text node with a 'flat' representation of
the embedded markup. Looking at the production server logs I see that it
has been intermittently failing in this way for some time, but that the
failure simply has not been noticed. The failure on the production servers
is different from the failure on the development server; I'll detail that
difference below. The production servers use Crimson to parse, but Xerces
to construct documents - I can't remember why, but it was probably just an
oversight.
The class in question is:
//***********************************************************************\
//                                                                      *
//  MaybeParseGenerator.java                                            *
//                                                                      *
//  Author: Simon Brooke                                                *
//  Created: 17th January 2003                                          *
//  $Revision: 1.7.4.3 $; $Date: 2006/09/04 13:45:54 $                 *
//                                                                      *
//***********************************************************************/
package uk.co.weft.domutil;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;

import uk.co.weft.htform.ResourceConsumerImpl;

/*
* $Log: MaybeParseGenerator.java,v $
* Revision 1.7.4.3 2006/09/04 13:45:54 simon
* Added more debugging output. Have an intermittent bug in PRES which may
* originate here.
*
* Revision 1.7.4.2 2005/12/30 16:54:00 simon
* EkitWidget now working remarkably well. Still some tidying up to do.
*
* Revision 1.7.4.1 2005/12/23 10:48:33 simon
* Brute force tidy up after CVS server crash: this time it should work.
*
* Revision 1.7 2005/02/05 17:40:17 simon
* Improved diagnostics on failure
*
* Revision 1.6 2004/07/14 12:52:34 simon
* Final commit for 1.10.0
*
* Revision 1.5 2004/06/17 15:10:38 simon
* Extends ResourceConsumerImpl to gain access to grs, etc
*
* Revision 1.4 2003/10/30 12:40:21 simon
* Added debug flag in domutil classes
*
* Revision 1.3 2003/08/20 09:38:35 simon
* Code cleanup with eclipse; mostly removal of exccessive includes
*
* Revision 1.2 2003/07/09 09:32:07 simon
* Initial work on HTML generation of widgets.
*
* Revision 1.1 2003/02/06 11:22:26 simon
* New superclass for node generators which may want to parse XML text.
*/

/**
 * Abstract superclass for TextNodeGenerator and ElementGenerator, which may
 * want to parse their content. Parsing is potentially expensive, so if
 * you're confident the value won't contain XML markup it may be worth
 * setting allowEmbeddedMarkup( false ).
 *
 * @author Simon Brooke
 * @version $Revision: 1.7.4.3 $ This revision: $Author: simon $
 */
public abstract class MaybeParseGenerator extends ResourceConsumerImpl
{
    //~ Instance fields -----------------------------------------------------

    /**
     * whether or not I'm in debug mode; if I am I may print debugging
     * messages to System.err
     */
    protected boolean debug = false;

    /** By default we allow embedded markup in children */
    protected boolean embeddedMarkup = true;

    //~ Constructors --------------------------------------------------------

    /**
     * Creates a new MaybeParseGenerator object.
     */
    public MaybeParseGenerator( )
    {
        // ...nothing...
    }

    //~ Methods -------------------------------------------------------------

    /**
     * whether or not to set debugging mode. If true, the generator _may_
     * write debugging messages to System.err
     *
     * @param debug whether or not to set debugging mode
     *
     * @since Jacquard 1.10
     */
    public void setDebug( boolean debug )
    {
        this.debug = debug;
    }

    /**
     * Do we allow (and parse for) embedded markup within the value of this
     * node? default is we do.
     *
     * @param allow if true, then allow embedded markup within my value
     */
    public void allowEmbeddedMarkup( boolean allow )
    {
        embeddedMarkup = allow;
    }

    /**
     * Construct a node representing this value. It's perfectly possible (and
     * possibly legitimate) that the value of a child should contain embedded
     * markup. If so, try to parse a node out of it.
     *
     * @param doc the document in which the node is to be created
     * @param unparsed the string, possibly with embedded markup, to parse
     *
     * @exception GenerationException if parsing fails
     */
    protected Node maybeParse( Document doc, String unparsed )
        throws GenerationException
    {
        Node val = doc.createTextNode( unparsed ); // safe default

        if ( debug )
        {
            System.err.println( "MaybeParseGenerator.maybeParse: parsing [" +
                unparsed + "]" );
        }

        if ( unparsed != null ) // defensive
        {
            if ( embeddedMarkup && (
                    // if we allow embedded markup
                    unparsed.indexOf( "<" ) > -1 ) ) // it looks like markup
            {
                if ( !unparsed.trim( ).startsWith( "<" ) )
                {
                    // nasty: if it contains markup, but
                    // isn't contained in markup, the
                    // parser will barf.
                    unparsed = "<parsed>" + unparsed + "</parsed>";
                }

                try
                {
                    DocumentBuilder parser = DOMStub.getParser( );

                    if ( parser == null )
                    {
                        System.err.println( "Could not initialise XML parser" );
                    }

                    InputSource i =
                        new InputSource( new StringReader( unparsed ) );

                    // i.setCharacterStream( new StringReader( unparsed ) );
                    Document parsed = parser.parse( i );

                    if ( debug )
                    {
                        System.err.println( "Parsed document: " +
                            parsed.toString( ) );

                        if ( parsed != null )
                        {
                            Node root = parsed.getDocumentElement( );

                            if ( root != null )
                            {
                                System.err.println( "Root node: (" +
                                    root.getClass( ).getName( ) + "): " +
                                    root.toString( ) );
                            }
                        }
                    }

                    val = doc.importNode( parsed, true );

                    if ( debug )
                    {
                        System.err.println(
                            "MaybeParseGenerator.maybeParse: parse successful" );
                        new Printer( ).print( val, System.err );
                    }
                }
                catch ( Exception e )
                {
                    System.err.println(
                        "MaybeParseGenerator.maybeParse(): Could not parse '" +
                        unparsed + "'as XML" );
                    e.printStackTrace( System.err );
                }
            }
        }

        return val;
    }
}
/* [end of file] */
What I'm getting in the error stream on the development server is (with
parser unconfigured, i.e. using Tomcat's default, which is Xerces; see
below for Crimson):
ElementGenerator.generate: attempting to parse <div class="Intro">
Here be dragons!
</div>
MaybeParseGenerator.maybeParse: parsing [<div class="Intro">
Here be dragons!
</div>]
Parsed document: [#document: null]
Root node: (org.apache.xerces.dom.DeferredElementImpl): [div: null]
MaybeParseGenerator.maybeParse(): Could not parse '<div class="Intro">
Here be dragons!
</div>'as XML
org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not
support the requested type of object or operation.
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:183)
(with parser configured as org.apache.crimson.tree.DOMImplementationImpl):
ElementGenerator.generate: attempting to parse <div class="Intro">
Here be dragons!
</div>
MaybeParseGenerator.maybeParse: parsing [<div class="Intro">
Here be dragons!
</div>]
Parsed document: org.apache.crimson.tree.XmlDocument@e9a0e9a
Root node: <div class="Intro">
Here be dragons!
</div>
MaybeParseGenerator.maybeParse(): Could not parse '<div class="Intro">
Here be dragons!
</div>'as XML
org.w3c.dom.DOMException: NOT_SUPPORTED_ERR: The implementation does not
support the requested type of object or operation.
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:173)
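One thing that may be worth checking against these two traces: the DOM Level 2 Core description of importNode lists Document nodes among the node types that cannot be imported, and maybeParse passes the whole parsed Document rather than its root element. The following is only a minimal sketch of that difference, assuming the JDK's bundled JAXP parser rather than the Crimson/Xerces mix described here; the class name ImportDemo and its helpers are hypothetical.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demonstration class; not part of the code under discussion.
final class ImportDemo
{
    /** Parse a string with the default JAXP parser (an assumption). */
    static Document parse( String xml )
    {
        try
        {
            return DocumentBuilderFactory.newInstance( ).newDocumentBuilder( )
                .parse( new InputSource( new StringReader( xml ) ) );
        }
        catch ( Exception e )
        {
            throw new IllegalStateException( "could not parse: " + xml, e );
        }
    }

    /** Returns the DOMException code raised by importNode, or 0 if none. */
    static short importCode( Document target, Node source )
    {
        try
        {
            target.importNode( source, true );

            return 0;
        }
        catch ( DOMException e )
        {
            return e.code;
        }
    }

    public static void main( String[] args )
    {
        Document parsed = parse( "<div class=\"Intro\">Here be dragons!</div>" );
        Document doc = parse( "<page/>" );

        // importing the Document node itself, as maybeParse does
        System.out.println( "whole document: " + importCode( doc, parsed ) );

        // importing only the root element
        System.out.println( "root element:   " +
            importCode( doc, parsed.getDocumentElement( ) ) );
    }
}
```

If the first call reports DOMException.NOT_SUPPORTED_ERR and the second reports no error, then importing parsed.getDocumentElement( ) instead of parsed would be the obvious thing to try. This would not by itself account for the NullPointerException in the production logs.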
What's showing up in the production server logs is:
(Firstly, evidence that it sometimes does work):
ElementGenerator.generate: attempting to parse <div
class="Introduction"><p>Copies of documentation issued to licensees is
available in this section.</p></div>
ElementGenerator.generate: attempting to parse Cockle Bags - further
information
(Secondly, evidence that it sometimes doesn't):
ElementGenerator.generate: attempting to parse <div class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>
MayberParseGenerator.maybeParse(): Could not parse '<div
class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>'as XML
java.lang.NullPointerException
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at org.apache.xerces.dom.CoreDocumentImpl.importNode(Unknown Source)
at uk.co.weft.domutil.MaybeParseGenerator.maybeParse(MaybeParseGenerator.java:163)
I've checked the libraries and the two instances above use the same
versions of the same libraries with the same configuration, so why
<div class="Introduction"><p>Copies of documentation issued to licensees is
available in this section.</p></div>
parses successfully and
<div class="Introduction">
Ayrshire and Dumfrieshire Cyclists Association is a regional
association
of cycling clubs within the structure of Scottish Cycling.
</div>
fails to parse is frankly baffling me.
--
si***@jasmine.org.uk (Simon Brooke)
http://www.jasmine.org.uk/~simon/
;; Let's have a moment of silence for all those Americans who are stuck
;; in traffic on their way to the gym to ride the stationary bicycle.
;; Rep. Earl Blumenauer (Dem, OR)