Bytes | Software Development & Data Engineering Community

DOM and HTML

Hi All,

I am looking for a Python library that can help me get a DOM
tree from HTML. Is there any way to access the HTML DOM, just like
accessing it using JavaScript?

Any kind of help is appreciated.

Thanks.
R

Apr 2 '06 #1
I do not know much about the HTML DOM, but if you just mean treating
the HTML like XML and building it into a DOM tree, and (very important)
the HTML file is not 10,000 lines or longer, then try the
xml.dom.minidom module. It has a basic (and great) lightweight DOM
implementation.
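
A minimal sketch of the minidom approach suggested above, assuming the markup is well-formed XHTML (the sample string and variable names are made up for illustration). minidom uses an XML parser, so ordinary tag-soup HTML with unclosed tags is rejected rather than repaired:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical, well-formed sample document.
XHTML = "<html><body><h1 id='top'>Hello</h1><p>One</p><p>Two</p></body></html>"

doc = minidom.parseString(XHTML)

# DOM calls that mirror what JavaScript offers on `document`.
h1 = doc.getElementsByTagName("h1")[0]
heading = h1.firstChild.data          # text of the first child text node
heading_id = h1.getAttribute("id")
paragraphs = [p.firstChild.data for p in doc.getElementsByTagName("p")]

# Non-well-formed HTML (an unclosed <br>) makes the XML parser raise.
try:
    minidom.parseString("<html><body><p>unclosed<br></p></body></html>")
    tolerant = True
except ExpatError:
    tolerant = False
```

Because of that strictness, this route only fits pages known to be valid XHTML.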

Apr 2 '06 #2
"Sullivan WxPyQtKinter" wrote:
> try the xml.dom.minidom module. It has a basic (and great)
> lightweight DOM implementation.


that's a rather unusual way to use words like "great" and "light-weight"...

</F>

Apr 2 '06 #3
Ant
I've used Beautiful Soup, and it is a very Pythonic way of accessing
the data in the HTML. It is actually very similar to the way you access
the DOM with JS: for example, soup.html.body.h1 will give you the first
h1 tag.

There are also various other ways of searching the HTML in an
XPath-like style (if XPath used dictionaries and lists...).

http://www.crummy.com/software/BeautifulSoup/
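
A short sketch of the attribute-style access described above. It assumes the current Beautiful Soup 4 package (imported as `bs4`, installed separately), which postdates this thread; the sample HTML and variable names are invented for illustration:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical sample page.
HTML = """<html><head><title>Demo</title></head>
<body><h1>First heading</h1>
<p>See <a href="/a">A</a> and <a href="/b">B</a>.</p>
<h1>Second heading</h1></body></html>"""

# "html.parser" selects Python's built-in parser as the backend.
soup = BeautifulSoup(HTML, "html.parser")

# Dotted access walks down to the first matching tag at each step.
first_h1 = soup.html.body.h1
heading_text = first_h1.string

# Searching returns list-like results; attributes read like dict keys.
links = [a["href"] for a in soup.find_all("a")]
all_h1 = [h.get_text() for h in soup.find_all("h1")]
```

Here `soup.html.body.h1` returns only the first h1, while `find_all` collects every match.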

Apr 2 '06 #4
ro*****************@gmail.com wrote:
> Hi All,
>
> I am looking for a Python library that can help me get a DOM
> tree from HTML. Is there any way to access the HTML DOM, just like
> accessing it using JavaScript?
>
> Any kind of help is appreciated.
>
> Thanks.
> R

Since the browser can't execute anything except Javascript, you
can't get to/manipulate the DOM with anything but Javascript code.
There have been attempts at getting a browser that can execute
Python code, but I don't think they ever really got anywhere.

-Larry
Apr 2 '06 #5
Larry Bates wrote:
> ro*****************@gmail.com wrote:
>
>> I am looking for a Python library that can help me get a DOM
>> tree from HTML. Is there any way to access the HTML DOM, just like
>> accessing it using JavaScript?
>
> [...]
> Since the browser can't execute anything except Javascript, you

Who said anything about the browser? Accessing a DOM "just like [...]
javascript" can mean a number of things: using an API like the one
JavaScript uses, for example, as well as actually accessing a DOM
associated with a page in a browser.

> can't get to/manipulate the DOM with anything but Javascript code.
> There have been attempts at getting a browser that can execute
> Python code, but I don't think they ever really got anywhere.


Actually, this isn't strictly true either. Disregarding, perhaps
unfairly, recent work on PyXPCOM to integrate Python more tightly with
Mozilla, there are various packages which do access browser DOMs: if
the questioner uses a KDE desktop and isn't averse to installing some
packages, there's qtxmldom [1] which can access the DOM in Konqueror in
association with the kpartplugins distribution [2]; otherwise, I
believe there's a Python package for accessing Internet Explorer's DOM.

And outside browsers, one can still use various packages already
mentioned, in addition to libxml2dom [3] which provides support via
libxml2 for reading HTML and XML, producing a DOM which resembles the
standardised DOM typically available to JavaScript. It shouldn't be
forgotten that PyXML also supports HTML parsing [4], either.
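
To illustrate the idea of reading real-world HTML into a standard-looking DOM outside a browser, here is a rough sketch using only the standard library. It is not libxml2dom or PyXML: the `DomBuilder` class, the `VOID` set, and `parse_html` are hypothetical names, and the sketch assumes a single root element and does not apply the full HTML error-recovery rules:

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Elements that never have an end tag (partial list, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}

class DomBuilder(HTMLParser):
    """Feed tolerant HTML parsing events into a minidom Document."""

    def __init__(self):
        super().__init__()
        self.doc = minidom.Document()
        self.stack = [self.doc]  # open nodes, innermost last

    def handle_starttag(self, tag, attrs):
        el = self.doc.createElement(tag)
        for name, value in attrs:
            el.setAttribute(name, value or "")  # valueless attrs get ""
        self.stack[-1].appendChild(el)
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; stray end
        # tags with no match are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # A Document node cannot hold text, so skip top-level text.
        if len(self.stack) > 1:
            self.stack[-1].appendChild(self.doc.createTextNode(data))

def parse_html(text):
    builder = DomBuilder()
    builder.feed(text)
    builder.close()
    return builder.doc

# Unclosed <br> and <img>, and a missing </html>, are all accepted.
doc = parse_html('<html><body><h1>Title</h1>'
                 '<p>line one<br>line two</p><img src="x.png"></body>')
title = doc.getElementsByTagName("h1")[0].firstChild.data
```

The resulting tree answers the usual DOM calls (`getElementsByTagName`, `getAttribute`, `childNodes`), which is the sense in which it resembles the DOM available to JavaScript.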

Paul

[1] http://www.boddie.org.uk/python/qtxmldom.html
[2] http://www.boddie.org.uk/python/kpartplugins.html
[3] http://www.boddie.org.uk/python/libxml2dom.html
[4] http://www.boddie.org.uk/python/HTML.html

Apr 2 '06 #6
