HTMLDocument and Xpath

Hi, I want to use XPath to scrape info from a website using PyXML, but I
keep getting no results.

For example, in the following I want to return the text "element1", but I
can't get XPath to return anything at all. What's wrong with this
code?

--------------------
from xml.dom.ext.reader import HtmlLib
from xml.xpath import Evaluate

reader = HtmlLib.Reader()
doc_node = reader.fromString("""
<html>
<head>
<title>Python Programming Language</title>
</head>
<body>
<table><tr><td>element1</td></tr></table>
</body>
</html>
""")

test = Evaluate('td', doc_node.documentElement)
print "test =", test
------------

All I get is an empty list for output.

Thx in advance

Shawn

Feb 3 '06 #1
[sw*****@acs.on.ca]
What's wrong with this code?
test = Evaluate('td', doc_node.documentElement)

Your XPath expression is wrong. A bare 'td' is a relative path, so it
matches only td elements that are direct children of the context node
(here the html element), and there are none.


Try one of the following alternatives, all of which should work.

test = Evaluate('//td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td[1]', doc_node.documentElement)
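
The difference between a bare 'td' and a descendant search can be sketched without PyXML, which no longer installs on current Python versions. The snippet below is only an illustration under that assumption: it swaps in the standard library's xml.etree.ElementTree, whose findall() accepts a limited XPath subset and does not take absolute paths on an element, so the explicit path is written relative to the html root. The names PAGE, direct_children, anywhere and explicit are made up for the example.

```python
import xml.etree.ElementTree as ET

# Same markup as in the question; it happens to be well-formed XML.
PAGE = """
<html>
<head>
<title>Python Programming Language</title>
</head>
<body>
<table><tr><td>element1</td></tr></table>
</body>
</html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Relative path: only <td> elements that are direct children of <html>.
direct_children = root.findall("td")

# Descendant search: <td> elements at any depth below <html>.
anywhere = root.findall(".//td")

# Explicit path, written relative to <html>.
explicit = root.findall("body/table/tr/td")

print("direct children:", direct_children)
print("anywhere:", [td.text for td in anywhere])
print("explicit:", [td.text for td in explicit])
```

Here the first query comes back empty because html has no td children, while the other two reach the cell and expose its text through the .text attribute.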

HTH,

Alan.

Feb 3 '06 #2


I tried all of those and in every case, test returns "[]". Does
Evaluate only work with XML documents?

Shawn

Feb 3 '06 #3
Got the answer. It's less a bug in XPath than a case mismatch: I think the
HTML parser converts all the tag names (but not the attributes) to uppercase,
and XPath name tests are case-sensitive. XPath definitely does not like the
lowercase strings, but these work fine:

test = Evaluate('//TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD[1]', doc_node.documentElement)
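
The case mismatch can be reproduced in isolation. This is a rough sketch, not PyXML code: it uses the standard library's xml.etree.ElementTree as a stand-in, and the UPPERCASED string only imitates a tree whose element names were uppercased by an HTML reader. lowercase_tags() is a hypothetical helper written for this example, showing one possible workaround (normalizing the names once so lowercase paths match).

```python
import xml.etree.ElementTree as ET

# Stand-in for the tree an uppercasing HTML reader would build (assumption).
UPPERCASED = "<HTML><BODY><TABLE><TR><TD>element1</TD></TR></TABLE></BODY></HTML>"


def lowercase_tags(root):
    """Hypothetical helper: rename every element to lowercase, in place."""
    for elem in root.iter():  # iter() yields root and all descendants
        elem.tag = elem.tag.lower()
    return root


root = ET.fromstring(UPPERCASED)

# Name tests are case-sensitive, so the lowercase query finds nothing.
lower_hits = root.findall(".//td")
upper_hits = root.findall(".//TD")

# After normalizing the names, the lowercase query matches.
lowercase_tags(root)
normalized_hits = root.findall(".//td")

print("lowercase query before:", lower_hits)
print("uppercase query before:", [e.text for e in upper_hits])
print("lowercase query after:", [e.text for e in normalized_hits])
```

Querying in uppercase, as in the lines above, avoids touching the tree at all; normalizing is only worth it when the same lowercase expressions have to work against trees from different parsers.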

Shawn

Feb 7 '06 #4
