Go********@gmail.com wrote:
I'm trying to get the data on the "Central London Property Price Guide"
box at the left hand side of this page
http://www.findaproperty.com/regi0018.html
I have managed to get the data :) but when I start looking for tables I
only get tables of depth 1. How do I go about accessing inner tables?
The same happens for links...
this is what I've got so far:
import sys
from urllib import urlopen
from BeautifulSoup import BeautifulSoup
data = urlopen('http://www.findaproperty.com/regi0018.html').read()
soup = BeautifulSoup(data)
for tables in soup('table'):
    table = tables('table')
    if not table: continue
    print table  # this returns only 1 table
There's something fishy here. soup('table') should yield all the tables
in the document, even nested ones. For example, this program:
data = '''
<body>
<table width='100%'>
<tr><td>
<TABLE WIDTH='150'>
<tr><td>Stuff</td></tr>
</table>
</td></tr>
</table>
</body>
'''
from BeautifulSoup import BeautifulSoup as BS
soup = BS(data)
for table in soup('table'):
    print table.get('width')
prints:
100%
150
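To see the same point without BeautifulSoup at all, here is a minimal sketch (modern Python 3, with the stdlib's html.parser standing in for BS) that tracks <table> nesting depth while parsing the demo markup above - it shows both tables are reachable, the inner one at depth 2:

```python
from html.parser import HTMLParser

class TableDepthTracker(HTMLParser):
    """Record (nesting depth, width attribute) for every <table> seen."""
    def __init__(self):
        super().__init__()
        self.depth = 0    # current <table> nesting level
        self.found = []   # (depth, width) for each table start tag

    def handle_starttag(self, tag, attrs):
        if tag == 'table':               # tag names arrive lowercased
            self.depth += 1
            self.found.append((self.depth, dict(attrs).get('width')))

    def handle_endtag(self, tag):
        if tag == 'table':
            self.depth -= 1

data = '''
<body>
<table width='100%'>
<tr><td>
<TABLE WIDTH='150'>
<tr><td>Stuff</td></tr>
</table>
</td></tr>
</table>
</body>
'''

tracker = TableDepthTracker()
tracker.feed(data)
print(tracker.found)   # [(1, '100%'), (2, '150')]
```

So a parser that handles the markup cleanly has no trouble reaching nested tables.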
Another tidbit - if I open the page in Firefox and save it, then load
that saved file into BeautifulSoup, it finds 25 tables, and this code
finds the table you want:
from BeautifulSoup import BeautifulSoup
data2 = open('regi0018-firefox.html')
soup = BeautifulSoup(data2)
print len(soup('table'))
priceGuide = soup('table', dict(bgcolor="#e0f0f8", border="0",
        cellpadding="2", cellspacing="2", width="150"))[1]
print priceGuide.tr
prints:
25
<tr><td bgcolor="#e0f0f8" valign="top"><font face="Arial"
size="2"><b>Central London Property Price Guide</b></font></td></tr>
Looking at the saved file, Firefox has clearly done some cleanup. So I
think you have to look at why BS is not processing the original data the
way you want. It seems to be choking on something.
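One quick way to test the "choking" theory is to count opening and closing <table> tags in the raw document; badly mismatched counts are exactly the kind of markup error that can make a parser silently drop nested content. A sketch (Python 3 stdlib; the broken markup below is illustrative, not taken from the real page):

```python
from html.parser import HTMLParser

class TagBalanceChecker(HTMLParser):
    """Count open/close occurrences of one tag to spot unbalanced markup."""
    def __init__(self, watch='table'):
        super().__init__()
        self.watch = watch
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.watch:
            self.opened += 1

    def handle_endtag(self, tag):
        if tag == self.watch:
            self.closed += 1

# Illustrative broken markup: the inner table is never closed.
broken = "<table><tr><td><table><tr><td>x</td></tr></td></tr></table>"

checker = TagBalanceChecker()
checker.feed(broken)
print(checker.opened, checker.closed)   # 2 1
```

If the counts differ on the original page but match on the Firefox-saved copy, that would explain why only the cleaned-up version parses fully.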
Kent