
python/xpath question...

for guys with python/xpath expertise..

i'm playing with xpath.. and i'm trying to solve an issue...

i have the following kind of situation where i'm trying to get certain data.

i have a bunch of tr/td...

i can create an xpath that gets me all of the tr.. i only want to get the
sibling tr up until i hit a 'tr' that has a 'th'. anybody have an idea as to
how this query might be created?
the idea would be to start at the "Summer B", to skip the 1st "tr", to get
the next "tr"s until you get to the next "Summer" section...

sample data.....

<tr><Th colspan=14 class="soc_comment">Summer B </th></tr>
<!-- START RA.CTLIB(SOCPHDR1) -->
<tr>
<td nowrap valign="bottom" class="colhelp">
<a href="#">Course<span>
<b>Course</b>
<br>Course number and suffix, if applicable.
<br>C = combined lecture and lab course
<br>L = laboratory course
</span></a></td>
</tr>
<!-- END RA.CTLIB(SOCPHDR1) -->
<tr>
<td valign="top" nowrap><a href="javascript:crsdescunderpop('AST1002');">AST
1002</a></td>
</tr>
<tr>
<td valign="top" nowrap><a
href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
</tr>
<tr>
<td valign="top" nowrap><a
href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
</tr>
<tr>
<td valign="top" nowrap><a
href="javascript:crsdescunderpop('AST1022L');">AST 1022L</a></td>
</tr>
<tr><Th colspan=14 class="soc_comment">Summer C </th></tr>
<!-- START RA.CTLIB(SOCPHDR1) -->
<tr>
<td nowrap valign="bottom" class="colhelp">
<a href="#">Course<span>
..
..
..

thanks...

-bruce
Jul 6 '06 #1


bruce wrote:
> for guys with python/xpath expertise..
>
> i'm playing with xpath.. and i'm trying to solve an issue...
>
> i have the following kind of situation where i'm trying to get certain data.
>
> i have a bunch of tr/td...
>
> i can create an xpath that gets me all of the tr.. i only want to get the
> sibling tr up until i hit a 'tr' that has a 'th'. anybody have an idea as to
> how this query might be created?

I'm not quite sure how this is supposed to be related to Python, but if you're
trying to find a sibling, what about using the "sibling" axis in XPath?

Stefan
Jul 6 '06 #2

(Damn gmane's authorizer, I think I lost four postings because the
auth messages went to my work email address (and I thought the
authorization was supposed to be one-time only per group anyway??). I
deleted them as spam since I hadn't posted from there for days :-(
Grrr. At least I could reconstruct this one...)

"bruce" <be*******@earthlink.net> writes:
> for guys with python/xpath expertise..
>
> i'm playing with xpath.. and i'm trying to solve an issue...
>
> i have the following kind of situation where i'm trying to get certain data.
>
> i have a bunch of tr/td...
>
> i can create an xpath that gets me all of the tr.. i only want to get the
> sibling tr up until i hit a 'tr' that has a 'th'. anybody have an idea as to
> how this query might be created?
[...]

((//tr/th)[2]/../following-sibling::tr/td/..)[count(.|((//tr/th)[3]/../preceding-sibling::*))=count((//tr/th)[3]/../preceding-sibling::*)]

which makes use of the following idiom for writing an intersection:

$set1[count(.|$set2)=count($set2)]

and gets the second group in the sequence you describe. IMHO, this
illustrates what happens when XPath is pushed too far ;-) I don't see
an easier way, but perhaps I missed one.
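
To see why the count() idiom amounts to an intersection, here is a rough sketch of the same test written with plain Python sets. The function name intersect and the single-letter row labels are made up for illustration; they are not part of XPath or lxml. A node is already in $set2 exactly when adding it to $set2 does not make the set any bigger.

```python
def intersect(set1, set2):
    """Keep the members of set1 that are also in set2, mimicking the XPath idiom."""
    set2 = set(set2)
    # x is already a member of set2 exactly when the union {x} | set2 has the
    # same size as set2, which is what count(.|$set2) = count($set2) checks.
    return [x for x in set1 if len({x} | set2) == len(set2)]


# Hypothetical labels: data rows that follow the second header row ...
after_second_header = ["E", "F", "H", "I"]
# ... and every row that precedes the third header row.
before_third_header = ["A", "B", "C", "D", "E", "F"]

print(intersect(after_second_header, before_third_header))
```

Only the rows that sit between the two headers survive the filter, which is the job the long expression above is doing.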

Example code:

(Note that the expression used here doesn't get any trailing group of
tr elements if there's no terminating tr/th -- that fits your
specification, but may not be what you really wanted. To fix that,
meditate on the above expression for an hour or two <0.8 wink>.)

#---------------------------------------------------------
import io
import pprint

from lxml import etree


def xpath(path, source):
    """Evaluate an XPath expression against source and pretty-print the matches."""
    tree = etree.parse(io.StringIO(source))
    r = tree.xpath(path)
    #return "\n".join(etree.tostring(el, encoding="unicode") for el in r)
    return pprint.pformat([etree.tostring(el, encoding="unicode") for el in r])


simple = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""

# Header i+1 opens each group and header i+2 closes it; the last group
# has no closing header, so the expression returns nothing for it.
for i in range(3):
    expr = '((//tr/th)[%s]/../following-sibling::tr/td/..)[count(.|((//tr/th)[%s]/../preceding-sibling::*))=count((//tr/th)[%s]/../preceding-sibling::*)]' % (i+1, i+2, i+2)
    print("---------------------")
    print(xpath(expr, simple))
#---------------------------------------------------------
john[0]$ tst.py
---------------------
['<tr><td>B</td></tr>\n', '<tr><td>C</td></tr>\n']
---------------------
['<tr><td>E</td></tr>\n', '<tr><td>F</td></tr>\n']
---------------------
[]
Knowing what you're doing, though, you'd probably be better off with
BeautifulSoup than XPath. Also note that mechanize (which I know
you're using) only supports BeautifulSoup 2 at present. You can't use
BeautifulSoup 3 yet (I hope to fix that 'RSN').
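
For comparison, here is a minimal sketch of doing the grouping in ordinary Python code instead of XPath. It is only an illustration under a few assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for BeautifulSoup, the helper name group_rows is made up, and the input is a small well-formed sample. The real page (unquoted attributes, bare <br> tags) would need a forgiving HTML parser first. It also keeps every td row, so the column-help row that follows each header on the real page would have to be dropped separately.

```python
import xml.etree.ElementTree as ET


def group_rows(source):
    """Return a list of (header_text, [data_row_elements]) pairs, one per th row."""
    root = ET.fromstring(source)
    groups = []
    for tr in root.iter("tr"):  # rows in document order
        th = tr.find("th")
        if th is not None:
            # A header row starts a new group.
            groups.append(((th.text or "").strip(), []))
        elif groups and tr.find("td") is not None:
            # A data row belongs to the most recent header;
            # rows that appear before any header are skipped.
            groups[-1][1].append(tr)
    return groups


sample = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""

for header, rows in group_rows(sample):
    print(header, [row.find("td").text for row in rows])
```

Unlike the XPath expression, this loop also collects the trailing group (H and I here), because a group simply runs until the next header row or the end of the document.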
John
Jul 9 '06 #3

Stefan Behnel <st******************@web.de> writes:
[...]
> I'm not quite sure how this is supposed to be related to Python, but if you're
> trying to find a sibling, what about using the "sibling" axis in XPath?

<nit>
There's no "sibling" axis in XPath. I'm sure you meant
"following-sibling" and/or "preceding-sibling".
</nit>
John
Jul 9 '06 #4
