Hi, I want to use xpath to scrape info from a website using pyXML but I
keep getting no results.
For example, in the following, I want to return the text "element1", but I
can't get xpath to return anything at all. What's wrong with this
code?
--------------------
from xml.dom.ext.reader import HtmlLib
from xml.xpath import Evaluate
reader = HtmlLib.Reader()
doc_node = reader.fromString("""
<html>
<head>
<title>Python Programming Language</title>
</head>
<body>
<table><tr><td>element1</td></tr></table>
</body>
</html>
""")
test = Evaluate('td', doc_node.documentElement)
print "test =", test
------------
All I get is an empty list for output.
Thx in advance
Shawn
[sw*****@acs.on.ca] wrote:
Hi, I want to use xpath to scrape info from a website using pyXML but I keep getting no results.
For example, in the following, I want to return the text "element1", but I can't get xpath to return anything at all. What's wrong with this code?
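Your xpath expression is wrong. The relative path 'td' only matches td
elements that are direct children of the context node, which here is the
html element, and it has none.
test = Evaluate('td', doc_node.documentElement)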
Try one of the following alternatives, all of which should work.
test = Evaluate('//td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td[1]', doc_node.documentElement)
HTH,
Alan.
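For anyone following along without PyXML installed, the difference between these path forms can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in parser, not PyXML's Evaluate: ElementTree supports only a subset of XPath and does not accept absolute paths such as '/html/...' on an element, so the paths below are written relative to the root html element. The name PAGE is only for illustration.

```python
import xml.etree.ElementTree as ET

# The same page as in the question (it happens to be well-formed XML).
PAGE = """
<html>
<head>
<title>Python Programming Language</title>
</head>
<body>
<table><tr><td>element1</td></tr></table>
</body>
</html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# 'td' looks only at direct children of <html>, so it finds nothing.
direct_children = root.findall("td")

# './/td' searches all descendants of the context node.
anywhere = root.findall(".//td")

# A full step-by-step path, relative to <html>.
full_path = root.findall("body/table/tr/td")

# The same path with a positional predicate for the first cell.
first_cell = root.findall("body/table/tr/td[1]")

print("td                  ->", [e.text for e in direct_children])
print(".//td               ->", [e.text for e in anywhere])
print("body/table/tr/td    ->", [e.text for e in full_path])
print("body/table/tr/td[1] ->", [e.text for e in first_cell])
```

Only the bare 'td' comes back empty here, because the element names in this document really are lowercase.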
Alan Kennedy wrote:
Try one of the following alternatives, all of which should work.
test = Evaluate('//td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td[1]', doc_node.documentElement)
I tried all of those and in every case, test returns "[]". Does
Evaluate only work with XML documents?
Shawn
Got the answer - it looks like a bug in xpath. I think the HTML parser
converts all the tag names (but not the attributes) to uppercase, and
xpath name tests are case-sensitive. Xpath definitely does not like my
first string ('td'), but these work fine:
test = Evaluate('//TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD[1]', doc_node.documentElement)
Shawn
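A minimal sketch of the case-sensitivity point, again using the standard library's xml.etree.ElementTree rather than PyXML. That substitution is an assumption made for illustration: it does not reproduce what HtmlLib does to tag names, so the uppercase document (UPPER) is simply written by hand to mimic a parser that uppercases them.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a tree whose element names were uppercased.
UPPER = "<HTML><BODY><TABLE><TR><TD>element1</TD></TR></TABLE></BODY></HTML>"

root = ET.fromstring(UPPER)

# Name tests are case-sensitive: lowercase 'td' does not match <TD>.
lower_hits = root.findall(".//td")
upper_hits = root.findall(".//TD")

print("lowercase path:", [e.text for e in lower_hits])
print("uppercase path:", [e.text for e in upper_hits])

# If the case is unknown, walk the tree and compare lowercased tag names.
any_case_hits = [e for e in root.iter() if e.tag.lower() == "td"]
print("case-insensitive walk:", [e.text for e in any_case_hits])
```

If the case the parser produces is not known in advance, comparing lowercased tag names as in the last step sidesteps the problem, at the cost of not using a path expression.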