#1
March 8th, 2006, 03:35 AM
Xavier
Parsing HTML file with xerces2-j

Hi,

I've just downloaded Xerces2 Java and I'd like to parse an HTML file using
the HTMLDOMImplementation found in the org.apache.html.dom package.

First I tried:

DOMImplementationRegistry registry =
DOMImplementationRegistry.newInstance();

DOMImplementation domImpl =
(DOMImplementation)registry.getDOMImplementation("HTML");

but it doesn't find any DOM implementation for HTML.
Then I tried:

HTMLDOMImplementation domImpl =
HTMLDOMImplementationImpl.getHTMLDOMImplementation();

DOMImplementationLS domImplLS =
(DOMImplementationLS)domImpl.getFeature("LS", "3.0");

LSParser parser =
domImplLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

Document document = parser.parseURI("C:\\test.html");

but I don't know how to get an instance of HTMLDocument so that I can use
the HTML DOM interfaces.

Thanks
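
For reference, the registry lookup and the Load and Save parsing steps described above can be sketched with the DOM classes that ship in the JDK alone. This is a minimal sketch, not the Xerces HTML DOM: the class name LsParseSketch and the helpers lookup and parse are hypothetical, the markup is parsed from a string rather than from C:\test.html, and it assumes the input is well-formed XML. With the JDK's built-in implementation, the "HTML" lookup is expected to return null, which matches the behaviour reported in the post.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class LsParseSketch {

    /** Looks up a DOM implementation by feature string; null means none is registered. */
    static DOMImplementation lookup(String features) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        return registry.getDOMImplementation(features);
    }

    /** Parses well-formed markup held in a string with a synchronous LSParser. */
    static Document parse(String markup) throws Exception {
        DOMImplementation impl = lookup("LS");
        DOMImplementationLS ls = (DOMImplementationLS) impl.getFeature("LS", "3.0");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(markup);
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        // Report which feature strings the registry can satisfy.
        System.out.println("LS available: " + (lookup("LS") != null));
        System.out.println("HTML available: " + (lookup("HTML") != null));

        Document doc = parse("<html><head><title>Test</title></head><body/></html>");
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```

The returned Document here is a generic DOM document, so it only offers the generic interfaces (getElementsByTagName and so on), not HTMLDocument.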
#2
March 8th, 2006, 06:05 AM
Joe Kesselman
Re: Parsing HTML file with xerces2-j

If you want to parse HTML, you want the NekoHTML parser rather than
normal Xerces. (HTML is not an XML language, though XHTML is.)
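
To illustrate the distinction, here is a small sketch using only the JDK's JAXP parser (not NekoHTML). The class name WellFormedCheck and the helper parsesAsXml are hypothetical. It shows an XML parser accepting XHTML-style markup with a self-closed br element and rejecting ordinary HTML in which br and p are left unclosed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false on a parse error. */
    static boolean parsesAsXml(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtmlStyle = "<html><body><p>one<br/>two</p></body></html>";
        String htmlStyle = "<html><body><p>one<br>two</body></html>";
        System.out.println(parsesAsXml(xhtmlStyle)); // prints "true"
        System.out.println(parsesAsXml(htmlStyle));  // prints "false"
    }
}
```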
#3
March 8th, 2006, 08:55 AM
Xavier
Re: Parsing HTML file with xerces2-j

In fact, I should start at the beginning:

Because the W3C DOM interface is too generic (createElement(),
getAttribute(), ...), I want to create my own DOM implementation (like
those for HTML 4.01/XHTML 1.0, MathML or SVG) from my own schema.

I'll use XercesJ, but I don't know how to go about it.

The XercesJ 2.8 distribution includes DOM HTML and DOM WML implementations.

I tried the 'HTMLDOMImplementation' to parse an HTML file and use the HTML
DOM interfaces.

I hoped to use it as a model for my own DOMImplementation.

But I'm not able to use the HTML DOM implementation.

Could somebody help me?

Thanks
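
As a rough illustration of what a typed interface over the generic DOM can look like, here is a minimal sketch that wraps a generic Element in a typed view. This is a wrapper technique, not a full DOMImplementation of the kind asked about, and every name in it (TypedViewSketch, Anchor, createAnchor) is hypothetical. It uses only the JDK's DOM classes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TypedViewSketch {

    /** Hypothetical typed view over a generic "a" element. */
    static final class Anchor {
        private final Element element;

        Anchor(Element element) {
            if (!"a".equals(element.getTagName())) {
                throw new IllegalArgumentException("expected an <a> element");
            }
            this.element = element;
        }

        // Typed accessors replace getAttribute("href") / setAttribute("href", ...).
        String getHref() { return element.getAttribute("href"); }
        void setHref(String href) { element.setAttribute("href", href); }

        /** Gives access to the underlying generic DOM node. */
        Element unwrap() { return element; }
    }

    /** Typed factory method standing in for createElement("a"). */
    static Anchor createAnchor(Document doc, String href) {
        Anchor anchor = new Anchor(doc.createElement("a"));
        anchor.setHref(href);
        return anchor;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Anchor anchor = createAnchor(doc, "http://example.org/");
        doc.appendChild(anchor.unwrap());
        System.out.println(anchor.getHref()); // prints "http://example.org/"
    }
}
```

A wrapper like this keeps the parser and the underlying DOM unchanged, at the cost of the typed objects not being the nodes themselves.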
 
