
BeautifulSoup

Hello,
I am trying to use BeautifulSoup.
I have this:
# Python 2 with the BeautifulSoup 3 module
import urllib
from BeautifulSoup import BeautifulSoup

sito = urllib.urlopen('http://www.prova.com/')
esamino = BeautifulSoup(sito)
luca = esamino.findAll('tr', align='center')

print luca[0]
<tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a></th><td width="10%">44.4MB</td><td width="90%" align="left"><font color="orange"> Pc-prova.rar </font></td></tr>
I need to get the following information:
1)Only|G|BoT|05
2)#1
3)44.4MB
4)Pc-prova.rar
With print luca[0].a.string I get #1.
With print luca[0].td.string I get 44.4MB.
Can you explain how to get the other two values?
Thanks
Luca
Oct 29 '08 #1
5 Replies


On Wed, 29 Oct 2008 09:45:31 -0700 (PDT), luca72 <lu*******@libero.it> wrote:
> Hello,
> I am trying to use BeautifulSoup.
> I have this:
> sito = urllib.urlopen('http://www.prova.com/')
> esamino = BeautifulSoup(sito)
> luca = esamino.findAll('tr', align='center')
>
> print luca[0]
[The following long string has been wrapped.]
> <tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');"
> href="#">#1</a></th><td width="10%">44.4MB</td>
> <td width="90%" align="left">
> <font color="orange"> Pc-prova.rar </font></td></tr>
>
> I need to get the following information:
> 1)Only|G|BoT|05
> 2)#1
> 3)44.4MB
> 4)Pc-prova.rar
> With print luca[0].a.string I get #1.
> With print luca[0].td.string I get 44.4MB.
> Can you explain how to get the other two values?

Like you, I struggle with BeautifulSoup; but perhaps this will help
while waiting for somebody smarter to join the thread:
>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup(
... """<tr align="center"><th width="5%">"""
... """<a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a>"""
... """</th><td width="10%">44.4MB</td><td width="90%" align="left">"""
... """<font color="orange"> Pc-prova.rar </font></td></tr>""" )
>>> tr = soup.findAll( 'tr' )
>>> tr[0].findAll( text = True )
[u'#1', u'44.4MB', u' Pc-prova.rar ']
>>> c = tr[0].findChild( attrs={"onclick": True} )
>>> print c[ "onclick" ]
t('Only|G|BoT|05','#1');
--
To email me, substitute nowhere->spamcop, invalid->net.
Oct 29 '08 #2

Peter Pearson wrote:
> Like you, I struggle with BeautifulSoup

Well, there's always lxml.html if you need it.

http://codespeak.net/lxml/

Stefan
Oct 30 '08 #3

On 29 Oct., 17:45, luca72 <lucabe...@libero.it> wrote:
> Hello,
> I am trying to use BeautifulSoup.
> I have this:
> sito = urllib.urlopen('http://www.prova.com/')
> esamino = BeautifulSoup(sito)
> luca = esamino.findAll('tr', align='center')
>
> print luca[0]
> <tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a></th><td width="10%">44.4MB</td><td width="90%" align="left"><font color="orange"> Pc-prova.rar </font></td></tr>
>
> I need to get the following information:
> 1)Only|G|BoT|05
> 2)#1
> 3)44.4MB
> 4)Pc-prova.rar
> With print luca[0].a.string I get #1.
> With print luca[0].td.string I get 44.4MB.
> Can you explain how to get the other two values?
> Thanks
> Luca

The same way you got `luca`. Note that `luca` is a list of rows, so index it first:

1,2) luca[0].find("a")["onclick"].split("'") and pick the two values out of
the resulting list
3) luca[0].find("td").string
4) luca[0].find("font").string
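
For items 1 and 2, the split can be sketched as follows. This is only an illustration on a literal string copied from the first post; in the real script the value would come from the onclick attribute of the <a> tag, and the variable names `parts`, `name` and `number` are made up for this example.

```python
# The onclick value from the original post, as a plain string.
# In the real script it would be read from the <a> tag's "onclick" attribute.
onclick = "t('Only|G|BoT|05','#1');"

# Splitting on the single quote leaves the quoted values at the odd indices:
# ['t(', 'Only|G|BoT|05', ',', '#1', ');']
parts = onclick.split("'")
name, number = parts[1], parts[3]

print(name)
print(number)
```

This relies on the attribute always having exactly two single-quoted arguments; if the page varies, a check on len(parts) before indexing would be safer.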
Oct 30 '08 #4

Hello,
Another stupid question. Instead of using
sito = urllib.urlopen('http://www.prova.com/')
esamino = BeautifulSoup(sito)

I do
sito = urllib.urlopen('http://onlygame.helloweb.eu/')
file_sito = open('sito.html', 'wb')
for line in sito:
    file_sito.write(line)
file_sito.close()

How can I pass the file sito.html to BeautifulSoup?

Regards

Luca

Oct 30 '08 #5

On 30 Oct., 18:28, luca72 <lucabe...@libero.it> wrote:
> Hello,
> Another stupid question. Instead of using
> sito = urllib.urlopen('http://www.prova.com/')
> esamino = BeautifulSoup(sito)
>
> I do
> sito = urllib.urlopen('http://onlygame.helloweb.eu/')
> file_sito = open('sito.html', 'wb')
> for line in sito:
>     file_sito.write(line)
> file_sito.close()
>
> How can I pass the file sito.html to BeautifulSoup?
>
> Regards
>
> Luca

download = urllib.urlopen("http://www.fiber-space.de/downloads/downloads.html")
BeautifulSoup(download.read())
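
For the saved file itself, BeautifulSoup also accepts an open file object, so sito.html can be opened and handed over directly. Below is a minimal sketch, assuming Python 3 and the newer bs4 package rather than the Python 2 / BeautifulSoup 3 setup used in this thread (there, BeautifulSoup(open('sito.html')) should be the equivalent call). The row from the first post is written to a temporary sito.html so the example runs without network access; the names `sample_html`, `path`, `row`, `size` and `filename` are invented for the illustration.

```python
import os
import tempfile

from bs4 import BeautifulSoup  # assumption: bs4 is installed (pip install beautifulsoup4)

# Stand-in for the downloaded page: the table row quoted in the first post.
sample_html = (
    b'<table><tr align="center"><th width="5%">'
    b'<a onclick="t(\'Only|G|BoT|05\',\'#1\');" href="#">#1</a></th>'
    b'<td width="10%">44.4MB</td>'
    b'<td width="90%" align="left">'
    b'<font color="orange"> Pc-prova.rar </font></td></tr></table>'
)

# Save it to disk, as the loop over urllib.urlopen(...) does in the question.
path = os.path.join(tempfile.mkdtemp(), "sito.html")
with open(path, "wb") as file_sito:
    file_sito.write(sample_html)

# Pass the open file object straight to BeautifulSoup.
with open(path, "rb") as file_sito:
    soup = BeautifulSoup(file_sito, "html.parser")

row = soup.find_all("tr", align="center")[0]
onclick = row.a["onclick"]            # the raw onclick attribute text
size = row.td.string                  # text of the first <td>
filename = row.font.string.strip()    # text of the <font> tag, whitespace removed

print(onclick)
print(size)
print(filename)
```

Reading the file in binary mode leaves the encoding detection to BeautifulSoup, which is usually what you want for pages saved as raw bytes.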

Ciao
Oct 30 '08 #6
