Thanks for the reply. I came to the same conclusion a few minutes before I saw
your email.
Another question:

  tr = d.xpath(foo)

gets me a list of nodes. Is there a way for me to then iterate through the
node tr[x] to see if a child node exists?

"d" is a document object, while "tr[x]" would be a node object? Or would I
convert "tr[x]" to a string and then feed that into
libxml2dom.parseString()?

Thanks
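You probably don't need to turn tr[x] back into a string and re-parse it: each item in the list is already a node whose children you can walk. Below is a minimal sketch of that idea. It uses the standard library's xml.dom.minidom as a stand-in, on the assumption that libxml2dom nodes expose the same DOM-style attributes (childNodes, nodeType, nodeName). The sample markup and the helper names child_elements and has_child are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample markup standing in for the scraped page.
SAMPLE = """<table>
  <tr><td>a</td><td>b</td></tr>
  <tr><td>c</td></tr>
  <tr></tr>
</table>"""


def child_elements(node, name=None):
    """Return node's direct element children, optionally only those named `name`."""
    return [
        child
        for child in node.childNodes
        if child.nodeType == child.ELEMENT_NODE
        and (name is None or child.nodeName == name)
    ]


def has_child(node, name):
    """True if node has at least one direct child element named `name`."""
    return len(child_elements(node, name)) > 0


doc = minidom.parseString(SAMPLE)
# Stands in for tr = d.xpath(foo): a list of element nodes.
rows = doc.getElementsByTagName("tr")

for i, row in enumerate(rows):
    # Inspect each node in place; no need to serialize and re-parse it.
    print(i, has_child(row, "td"), len(child_elements(row, "td")))
# prints:
# 0 True 2
# 1 True 1
# 2 False 0
```

If libxml2dom nodes also accept a relative XPath expression (an assumption worth checking against its documentation), something like tr[x].xpath("td") would be a shorter way to get the same list of td children.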
-----Original Message-----
From: py*****************************************@python.org
[mailto:py*****************************************@python.org]On Behalf
Of Paul Boddie
Sent: Friday, June 13, 2008 12:49 PM
To: py*********@python.org
Subject: Re: python screen scraping/parsing
On 13 Jun, 20:10, "bruce" <bedoug...@earthlink.net> wrote:
> [...]
> url = "http://www.pricegrabber.com/rating_summary.php/page=1"
> tr = "/html/body/div[@id='pgSiteContainer']/div[@id='pgPageContent']/table[2]/tbody/tr[4]"
> [...]
> tr_ = d.xpath(tr)
>
> My issue appears to be related to the last "tbody", or tbody/tr[4]...

Yes, I can confirm this.

> If I leave off the tbody, I can display data, as tr_ is a list with
> data... With the "tbody", it appears that the tr_ list is not defined, or
> it has no data... However, I can use the DOM tool in Firefox to observe
> that the "tbody" is there...

Yes, but the DOM tool in Firefox probably inserts virtual nodes for
its own purposes. Remember that it has to do a lot of other stuff, like
implementing CSS rendering and DOM event models.
You can confirm that there really is no tbody by printing the result
of this:

  print(d.xpath("/html/body/div[@id='pgSiteContainer']/"
                "div[@id='pgPageContent']/table[2]")[0].toString())
This should fetch the second table in a single-element list and then
obviously give you the only element of that list. You'll see that the
raw HTML doesn't have any tbody tags at all.
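To see why the tbody step makes the list come back empty, here is a small sketch using the standard library's xml.etree.ElementTree, whose limited XPath subset covers these paths. The markup is a made-up, simplified stand-in for the real page that, like the raw HTML described above, has no tbody. The variable names are hypothetical. The same path strings, prefixed with /html, are plain XPath and should carry over to d.xpath(), though that is untested here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page: the second table has no
# <tbody>, even though a browser's DOM view would show one.
RAW = """<html><body><div id="pgSiteContainer"><div id="pgPageContent">
<table><tr><td>first table</td></tr></table>
<table>
  <tr><td>row 1</td></tr>
  <tr><td>row 2</td></tr>
  <tr><td>row 3</td></tr>
  <tr><td>row 4</td></tr>
</table>
</div></div></body></html>"""

root = ET.fromstring(RAW)  # root is the <html> element
base = "./body/div[@id='pgSiteContainer']/div[@id='pgPageContent']/table[2]"

with_tbody = root.findall(base + "/tbody/tr[4]")   # path copied from Firefox
without_tbody = root.findall(base + "/tr[4]")      # path matching the raw HTML
either = root.findall(base + "//tr[4]")            # matches with or without tbody

print(len(with_tbody), len(without_tbody), len(either))  # prints: 0 1 1

# Same check as printing toString(): the serialized table has no tbody.
table = root.find(base)
print("<tbody" in ET.tostring(table, encoding="unicode"))  # prints: False
```

The "//" form works whether or not a parser inserts a tbody, at the cost of also matching rows of any table nested inside the selected one, so the explicit "/tr[4]" path is the stricter choice when you know the raw markup.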
Paul
--
http://mail.python.org/mailman/listinfo/python-list