Hi guys,
I'm using the Python library BeautifulSoup (BS) to parse some
information out of a German soccer website.
I spent some quality time with the BS docs, but I couldn't really
figure out how to get what I was looking for.
Here's the deal:
I want to parse the article shown on the website. To do so, I want to
use the tag <div class="txt_fliesstext"> as a starting point. Once
I have found that tag, I somehow want to get everything between the
following "br" tags until a new CSS class comes up.
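A minimal sketch of that idea, for illustration only. It assumes the newer bs4 package (the findAll() calls in this post use the older spelling), and it assumes the whole article body sits inside the one txt_fliesstext div, with a closing </div> that the excerpt further down does not show. The html sample is a shortened copy of that excerpt with the closing tags added; the variable names are made up.

```python
from bs4 import BeautifulSoup

# Shortened copy of the page excerpt; the closing </div> tags are assumed.
html = """
<div id="area_fliesstext">
<div class="txt_fliesstext_bold">Mit 28 Punkten stand der KSC nach der Hinrunde sensationell auf Platz 6.</div>
<br><br>
<div class="txt_fliesstext">Doch in der Rückrunde brachen die Badener regelrecht ein und holten nur noch 15 Zähler.<br />
<br />
43 Punkte reichten am Ende für den 11. Tabellenplatz, ein mehr als respektables Ergebnis für einen Aufsteiger.<br />
<br />
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches whole class tokens, so "txt_fliesstext_bold" is not picked up.
article_div = soup.find("div", class_="txt_fliesstext")

# The <br /> tags only separate text nodes inside the div, so collecting the
# non-empty strings of the div yields one entry per paragraph.
paragraphs = list(article_div.stripped_strings)

for paragraph in paragraphs:
    print(paragraph)
```

The point of the sketch is that the "br" tags themselves carry no class and no text; the article text lives in the strings between them, inside the div, so searching for the div and reading its strings may be simpler than searching for the "br" tags.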
I tried several options with findAll(), but nothing seems to work.
For example, soup.findAll('br', attrs={'class':'txt_fliesstext'}, text=True)
comes back with a thousand additional tags that I don't want, and
soup.findAll(attrs={'class':'txt_fliesstext'}) gives me a much better
result, but in that case I only get a few tags instead of all the
tags I want.
Any suggestions?
Thanks in advance!
Website:
http://www.bundesliga.de/de/liga/new...hp?f=94820.php
Some HTML code from the website:
<div id="area_headline">
<div class="txt_headline_red">Erst Höhenflug, dann Absturz</div>
</div>
<div id="area_fliesstext">
<div class="txt_fliesstext_bold">Mit 28 Punkten stand der KSC
nach der Hinrunde sensationell auf Platz 6.</div>
<br><br>
<div class="txt_fliesstext">Doch in der Rückrunde brachen
die Badener regelrecht ein und holten nur noch 15 Zähler.<br />
<br />
43 Punkte reichten am Ende für den 11. Tabellenplatz, ein mehr
als respektables Ergebnis für einen Aufsteiger.<br />
<br />