No parsing-result of HTML into XHTML

Hi,

I have a big problem parsing HTML into XHTML using CyberNeko to validate the HTML.

First I tried working with an HTML file. This solution works fine:

    String aHTMLFile = "file:\\C:/work/Eclipse3.1.1/html-file.html";
    org.xml.sax.InputSource pSource = new InputSource(aHTMLFile);

    org.cyberneko.html.HTMLConfiguration htmlConfig = new HTMLConfiguration();
    org.apache.xerces.parsers.DOMParser parser = new DOMParser(htmlConfig);

    // configure the DOMParser
    parser.setProperty(DruckKonstanten.CYBERNEKO_PROP_ELEMS, "lower");
    parser.setProperty(DruckKonstanten.CYBERNEKO_PROP_ATTRIB, "lower");

    // parse and validate the HTML into XHTML
    parser.parse(pSource);

    // convert the XHTML document into a JDOM document
    org.jdom.input.DOMBuilder builder = new org.jdom.input.DOMBuilder();
    org.jdom.Document document = builder.build(parser.getDocument());

    ..


But when I pass a string of HTML through a StringReader instead of using a file, the parser ignores the HTML. No error occurs, but the resulting XHTML is missing. It seems the parser swallows it:
    ..

    String aHTMLStr = "<p>Example</p>";
    org.xml.sax.InputSource pSource = new InputSource(new StringReader(aHTMLStr));

    org.cyberneko.html.HTMLConfiguration htmlConfig = new HTMLConfiguration();
    org.apache.xerces.parsers.DOMParser parser = new DOMParser(htmlConfig);

    // configure the DOMParser
    parser.setProperty(DruckKonstanten.CYBERNEKO_PROP_ELEMS, "lower");
    parser.setProperty(DruckKonstanten.CYBERNEKO_PROP_ATTRIB, "lower");

    // parse and validate the HTML into XHTML
    parser.parse(pSource);

    // convert the XHTML document into a JDOM document
    org.jdom.input.DOMBuilder builder = new org.jdom.input.DOMBuilder();
    org.jdom.Document document = builder.build(parser.getDocument());

    ..
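
One way to narrow this down might be to serialize the W3C DOM directly, before it reaches JDOM, to see whether the document is really empty at that point. The following is only a minimal sketch using JDK classes: the class name DomDump and its two methods are hypothetical helper names, not part of NekoHTML, Xerces, or JDOM, and the JDK parser is used here as a stand-in for the NekoHTML-configured DOMParser (it accepts only well-formed markup). It does not identify the cause of the problem; it only shows the StringReader input pattern and a way to inspect the resulting DOM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical diagnostic helper; not part of NekoHTML, Xerces, or JDOM.
public class DomDump {

    // Serializes a W3C DOM document to a String so its content can be inspected.
    static String dump(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parses markup held in a String via a StringReader-backed InputSource.
    // Assumption: the JDK's built-in parser stands in for the NekoHTML parser,
    // so the input must be well-formed XML.
    static Document parseString(String markup) throws Exception {
        InputSource source = new InputSource(new StringReader(markup));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseString("<p>Example</p>");
        // Print the serialized DOM to check what the parser actually produced.
        System.out.println(dump(doc));
    }
}
```

In the failing snippet, a call such as dump(parser.getDocument()) placed right after parser.parse(pSource) would show whether the parser produced any elements at all, or whether the content only goes missing in the later JDOM conversion.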

Can anybody give me a hint on how to solve this problem?
I have run out of ideas and of places to look. :-((((

Intel/Windows NT/Java 5

Thanks for your help in advance.
Sep 8 '06 #1