HTML DOM representation

Can anyone help me?

I have an HTML document and need Java code that builds the DOM tree representation of that document. Once I have the DOM tree, I can proceed with the next step: finding the path of each node.

Please help.
Oct 5 '11 #1
Frinavale
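Have you considered using the following classes to build and manipulate a DOM in Java?
  • DocumentBuilderFactory
  • DocumentBuilder
  • Document

DocumentBuilderFactory and DocumentBuilder are in the javax.xml.parsers package; Document is in org.w3c.dom. Note that these classes parse XML, so the HTML has to be well-formed (XHTML) for them to accept it.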
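As a rough sketch of how those classes fit together, the following parses a small, well-formed document and prints its element tree with one level of indentation per depth. The class name DomTreeDump, the helper methods, and the sample markup are made up for illustration; only the javax.xml.parsers and org.w3c.dom calls come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical example class, not part of any library.
class DomTreeDump {
    /** Parses well-formed (X)HTML text into a W3C DOM Document. */
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Appends one line per element, indented two spaces per tree level. */
    static void dump(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
            depth++; // children of an element sit one level deeper
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            dump(children.item(i), depth, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample input assumed for the sketch; real pages must be well-formed to parse.
        String xhtml = "<html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>";
        StringBuilder out = new StringBuilder();
        dump(parse(xhtml), 0, out);
        System.out.print(out);
    }
}
```

Text nodes are skipped here so that only the element structure is shown; handling them would mean adding a branch for Node.TEXT_NODE.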

-Frinny
Oct 14 '11 #2
Thank you for your reply.

Actually, I want to do it using the htmlparser 1.6 library in Java.

I have an HTML document, and I got the DOM tree representation of it using the following code. But I need the path of each node in the DOM tree.

The code is as follows:
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.nodes.RemarkNode;
import org.htmlparser.nodes.TagNode;
import org.htmlparser.nodes.TextNode;
import org.htmlparser.util.EncodingChangeException;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class SimpleParser2 {
    // Accumulates tag names and text in document order.
    static final StringBuilder str = new StringBuilder();

    public static void main(String[] args) {
        // args[0] is the URL or file to parse; args[1] is an optional tag name to filter on.
        if (args.length < 1 || args[0].equals("-help")) {
            System.out.println("HTML Parser v" + Parser.getVersion() + "\n");
            return;
        }

        Parser parser = new Parser();
        NodeFilter filter = null;
        try {
            if (1 < args.length) {
                filter = new TagNameFilter(args[1]);
            } else {
                parser.setFeedback(Parser.STDOUT);
                Parser.getConnectionManager().setMonitor(parser);
            }
            parser.setResource(args[0]);
            processAll(parser.parse(filter));
        } catch (EncodingChangeException ece) {
            // The page switched encoding part-way through: reset and parse again.
            try {
                str.setLength(0);
                parser.reset();
                processAll(parser.parse(filter));
            } catch (ParserException e) {
                e.printStackTrace();
            }
        } catch (ParserException e) {
            e.printStackTrace();
        }

        // Print once, after the whole tree has been walked.
        System.out.println("\nTree Nodes::\n" + str);
    }

    // Walks every top-level node returned by the parser.
    static void processAll(NodeList list) throws ParserException {
        for (NodeIterator i = list.elements(); i.hasMoreNodes(); ) {
            processMyNodes(i.nextNode());
        }
    }

    // Appends text and tag names, recursing into the children of each tag.
    static void processMyNodes(Node node) throws ParserException {
        if (node instanceof TextNode) {
            str.append(((TextNode) node).getText());
        } else if (node instanceof RemarkNode) {
            // Comments are skipped.
        } else if (node instanceof TagNode) {
            TagNode tag = (TagNode) node;
            str.append(tag.getTagName());
            NodeList children = tag.getChildren();
            if (children != null) {
                for (NodeIterator i = children.elements(); i.hasMoreNodes(); ) {
                    processMyNodes(i.nextNode());
                }
            }
        }
    }
}

Can you help me get the path of each node as a String?
Oct 16 '11 #3
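
One possible way to get a path string per node is to walk the parent links from a node up to the root and prepend each tag name, with a 1-based index among same-named siblings so that repeated tags stay distinguishable. The sketch below uses a made-up Tag class as a stand-in for a parsed node so that it runs without any library; NodePaths, Tag, pathOf and collect are illustrative names only. With htmlparser, the same idea would presumably map onto Node.getParent(), Node.getChildren() and TagNode.getTagName(), but that mapping is an assumption and is not tested here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of htmlparser.
class NodePaths {
    /** Stand-in for a parsed tag node: a name, a parent link, and children. */
    static final class Tag {
        final String name;
        Tag parent;
        final List<Tag> children = new ArrayList<>();

        Tag(String name) {
            this.name = name;
        }

        /** Attaches a child and returns this node so calls can be chained. */
        Tag add(Tag child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    /** 1-based position of the node among siblings that share its name. */
    static int indexAmongSameName(Tag node) {
        if (node.parent == null) {
            return 1;
        }
        int index = 0;
        for (Tag sibling : node.parent.children) {
            if (sibling.name.equals(node.name)) {
                index++;
            }
            if (sibling == node) {
                break;
            }
        }
        return index;
    }

    /** Builds the path by walking parent links from the node up to the root. */
    static String pathOf(Tag node) {
        StringBuilder path = new StringBuilder();
        for (Tag n = node; n != null; n = n.parent) {
            path.insert(0, "/" + n.name + "[" + indexAmongSameName(n) + "]");
        }
        return path.toString();
    }

    /** Collects the path of every node in document order. */
    static void collect(Tag node, List<String> out) {
        out.add(pathOf(node));
        for (Tag child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        Tag html = new Tag("html")
                .add(new Tag("body").add(new Tag("p")).add(new Tag("p")));
        List<String> paths = new ArrayList<>();
        collect(html, paths);
        for (String p : paths) {
            System.out.println(p);
        }
    }
}
```

Building each path from the parent links keeps the traversal code unchanged; passing the parent's path down during the recursion would avoid the repeated upward walks on deep trees.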
