Ben Finney <bi****************@benfinney.id.au> writes:
> Joe <di******@lycos.com> wrote:
>> I'm trying to extract part of html code from a tag to a tag
> For tag soup, use BeautifulSoup:
> <URL:http://www.crummy.com/software/BeautifulSoup/>
Except he's trying to extract an apparently random part of the
file. BeautifulSoup is a wonderful thing for dealing with X/HTML
documents as structured documents, which is how you want to deal with
them most of the time.
In this case, an re works nicely:
>>> import re
>>> s = '<span class="boldyellow"><B><U> and ends with TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
>>> # re.search finds the markers anywhere in s; (.*?) stops at the first end marker
>>> r = re.search('<span class="boldyellow"><B><U>(.*?)TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', s)
>>> r.group(1)
' and ends with '
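In a real page the two markers can sit anywhere in the file, the text between them can span several lines, and the markers themselves may contain regex metacharacters (the "." in "some.gif", for one). Here is a minimal sketch of how that might be handled, assuming a hypothetical helper named extract_between and a made-up sample page; neither is from the original question:

```python
import re

def extract_between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next
    end_marker, or None if the pair is not found."""
    # re.escape makes both markers match literally.
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    # re.DOTALL lets "." match newlines, so multi-line content works.
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else None

START = '<span class="boldyellow"><B><U>'
END = 'TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
# Hypothetical page: the markers are buried mid-file, content spans two lines.
page = "<html>\n" + START + " and ends\nwith " + END + "\n</html>"

print(repr(extract_between(page, START, END)))
```

Returning None when nothing matches avoids the AttributeError you would get from calling group() on a failed match.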
The string method find also works really well:
>>> start = s.find('<span class="boldyellow"><B><U>') + len('<span class="boldyellow"><B><U>')
>>> stop = s.find('TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', start)
>>> s[start:stop]
' and ends with '
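One thing to watch with the find approach: find returns -1 when the marker is missing, and adding len(...) to -1 still gives a usable index, so the slice quietly returns the wrong text instead of failing. A small sketch that checks for that, assuming a hypothetical helper named slice_between (not part of the original question):

```python
def slice_between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next
    end_marker using str.find, or None if either marker is missing."""
    start = text.find(start_marker)
    if start == -1:          # start marker not present
        return None
    start += len(start_marker)
    stop = text.find(end_marker, start)
    if stop == -1:           # no end marker after the start marker
        return None
    return text[start:stop]

START = '<span class="boldyellow"><B><U>'
END = 'TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
s = START + " and ends with " + END

print(repr(slice_between(s, START, END)))
```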
Not a lot to choose between them.
<mike
--
Mike Meyer <mw*@mired.org>
http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.