By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
444,034 Members | 1,325 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 444,034 IT Pros & Developers. It's quick & easy.

BeautifulSoup: problems with parsing a website

P: n/a
Hy guys,

I'm using the python-framework BeautifulSoup(BS) to parse some
information out of a german soccer-website.
I spend some qualitiy time with the BS-docs, but I couldn't really
figure out how to get what I was looking for.

Here's the deal:
I want to parse the article shown on the website. To do so I want to
use the Tag " <div class="txt_fliesstext">" as a starting-point. When
I have found the Tag I somehow want to get all following "br"-Tags
until there is a new CSS-Class Style is coming up.
I tried several options in the findAll()-command, but nothing seems to
work.(like: soup.findAll('br',attrs={'class':'txt_fliesstext'} , text
=True) - This one comes with a thound addtional Tag that I don't want
to have, or soup.findAll(attrs={'class':'txt_fliesstext'}) - This
gives me a much better Result, but in this case I only get some few
Tags, instead of all the Tags I want)

Any suggestions?
Thanks in advance!

Some html-code of the website:
<div id="area_headline">
<div class="txt_headline_red">Erst Höhenflug, dann Absturz</
<div id="area_fliesstext">
<div class="txt_fliesstext_bold">Mit 28 Punkten stand der KSC
nach der Hinrunde sensationell auf Platz 6.</div>
<div class="txt_fliesstext">Doch in der Rückrunde brachen
die Badener regelrecht ein und holten nur noch 15 Zähler.<br />
<br />
43 Punkte reichten am Ende für den 11. Tabellenplatz, ein mehr
als respektables Ergebnis für einen Aufsteiger.<br />
<br />
Jun 27 '08 #1
Share this question for a faster answer!
Share on Google+

This discussion thread is closed

Replies have been disabled for this discussion.