On Wed, 29 Oct 2008 09:45:31 -0700 (PDT), luca72 <lu*******@libero.it> wrote:
Hello
I am trying to use BeautifulSoup.
I have this:
sito = urllib.urlopen('http://www.prova.com/')
esamino = BeautifulSoup(sito)
luca = esamino.findAll('tr', align='center')
print luca[0]
[The following long string has been wrapped.]
<tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');"
href="#">#1</a></th><td width="10%">44.4MB</td>
<td width="90%" align="left">
<font color="orange"> Pc-prova.rar </font></td></tr>
I need to get the following information:
1)Only|G|BoT|05
2)#1
3)44.4MB
4)Pc-prova.rar
With print luca[0].a.string I get #1.
With print luca[0].td.string I get 44.4MB.
Can you explain how to get the other two values?
Like you, I struggle with BeautifulSoup; but perhaps this will help
while waiting for somebody smarter to join the thread:
>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup(
... """<tr align="center"><th width="5%">"""
... """<a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a>"""
... """</th><td width="10%">44.4MB</td><td width="90%" align="left">"""
... """<font color="orange"> Pc-prova.rar </font></td></tr>""" )
>>> tr = soup.findAll( 'tr' )
>>> tr[0].findAll( text = True )
[u'#1', u'44.4MB', u' Pc-prova.rar ']
>>> c = tr[0].findChild( attrs={"onclick": True} )
>>> print c[ "onclick" ]
t('Only|G|BoT|05','#1');
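
That still leaves the first value buried inside the onclick string, and the file name padded with spaces. A rough sketch of the last step, using only the standard re module (no BeautifulSoup needed here): I am assuming the onclick value always looks like t('...','...'), and the variable names below are mine, not anything from the library.

```python
import re

# The onclick attribute value and the text nodes, copied from the session above.
onclick = "t('Only|G|BoT|05','#1');"
texts = [u'#1', u'44.4MB', u' Pc-prova.rar ']

# Assumption: the arguments are single-quoted and contain no embedded quotes.
# findall returns the contents of every '...' pair, in order.
args = re.findall(r"'([^']*)'", onclick)

# Strip the padding that surrounds the file name inside the <font> tag.
cleaned = [t.strip() for t in texts]

print(args[0])      # first argument of t(...)
print(cleaned[0])   # link text
print(cleaned[1])   # size
print(cleaned[2])   # file name without the surrounding spaces
```

If the onclick format varies from row to row, the pattern would need adjusting, so treat this as a starting point only.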
--
To email me, substitute nowhere->spamcop, invalid->net.