Having trouble with some lists in BeautifulSoup


Okay, what I want to do with this code is go to thesaurus.reference.com,
search for a word, and get the synonyms for it. I can get the synonyms,
but they are still in HTML form and some are hyperlinks, and I can't get the
contents out. I am not that familiar with BeautifulSoup, so if anyone wants
to look over this code (if you run it, it will make a lot more sense),
maybe you can help me out.

Side note: if you run it, a list object will print, and what I am after is
the part that starts:

<td colspan="2" widht="100%">american...

Here's the code:

import urllib
from BeautifulSoup import BeautifulSoup

class defSyn:
    def __init__(self, word):
        self.word = word

        def get_syn(term):
            soup = BeautifulSoup(urllib.urlopen(
                'http://thesaurus.reference.com/search?q=%s' % term))

            balls = soup.findAll('table', {'width': '100%'})
            print soup.prettify()
            for tabs in soup.findAll('table', {'width': '100%'}):
                yield tabs.findAll('td', {'colspan': '2'})

        self.mainList = list(get_syn(self.word))
        print self.mainList[2]

If you have any further questions, I would be happy to answer.

Jul 16 '08 #1
Alexnb wrote:
> Okay, what I want to do with this code is go to thesaurus.reference.com,
> search for a word, and get the synonyms for it. I can get the synonyms,
> but they are still in HTML form and some are hyperlinks, and I can't get the
> contents out. I am not that familiar with BeautifulSoup, so if anyone wants
> to look over this code (if you run it, it will make a lot more sense),
> maybe you can help me out.

The thesaurus site may become annoyed if you overdo this.

However, it's not hard to do. Search the output for
an "a" tag with class "noline", then extract the text content
of the "a" tag. The BeautifulSoup manual will tell you how.
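
A rough sketch of that idea, for illustration only. To keep it runnable
without the network or any third-party install, it uses the standard
library's html.parser in place of BeautifulSoup and feeds it a made-up HTML
snippet. The class name "noline" comes from the advice above; the sample
markup and the NolineLinkText helper are assumptions, not the site's real
output or anyone's actual code.

```python
from html.parser import HTMLParser


class NolineLinkText(HTMLParser):
    """Collect the text inside every <a class="noline"> element."""

    def __init__(self):
        super().__init__()
        self.words = []        # finished link texts, in document order
        self._inside = False   # True while inside a matching <a> tag
        self._chunks = []      # text pieces of the link being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            classes = (dict(attrs).get("class") or "").split()
            if "noline" in classes:
                self._inside = True
                self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.words.append("".join(self._chunks).strip())
            self._inside = False


# Hypothetical sample markup, NOT real output from the thesaurus site.
SAMPLE = '''
<table width="100%"><tr>
<td colspan="2" width="100%">
<a class="noline" href="/search?q=american">american</a>,
<a class="noline" href="/search?q=native">native</a>,
<a href="/help">help</a>
</td></tr></table>
'''

parser = NolineLinkText()
parser.feed(SAMPLE)
parser.close()
synonyms = parser.words
print(synonyms)
```

With the BeautifulSoup 3 import used in the original post, the matching
search would probably look like soup.findAll('a', {'class': 'noline'}), with
each tag's text read from its .string attribute. Check the manual for the
version you have installed.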

If you want raw thesaurus data you can use freely, see
"http://wordnet.princeton.edu".

John Nagle
Jul 18 '08 #2
