Hi!
I use the following code to convert an HTML file to plain text:
import htmllib
import formatter
import StringIO

class mvbHTMLParser(htmllib.HTMLParser):
    def __init__(self, formatter, verbose=0):
        htmllib.HTMLParser.__init__(self, formatter, verbose)
        self.imglist = []

    def handle_image(self, src, alt, *args):
        # Remember the src URL of every <img> tag
        self.imglist.append(src)

# html holds the page source as a string
file = StringIO.StringIO()
f = formatter.AbstractFormatter(formatter.DumbWriter(file))
p = mvbHTMLParser(f)
p.feed(html)
p.close()
print file.getvalue()
But then the _ characters are gone from the output.
Is it possible to keep that character in file.getvalue()?
(p.anchorlist is fine: it contains test_bla.html.)
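If moving off htmllib is an option (it no longer exists in Python 3), here is a minimal sketch of the same idea using the standard library's html.parser module. It does not explain why the htmllib/DumbWriter pipeline loses the underscores; it only shows an alternative in which text is passed through unchanged. The names TextExtractor and html_to_text, and the sample HTML, are illustrative assumptions. Unlike formatter.DumbWriter, it does not wrap lines; it just collapses whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text, <img> sources and <a> hrefs."""

    def __init__(self):
        # convert_charrefs=True turns references such as &#95; into "_"
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.imglist = []
        self.anchorlist = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.imglist.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.anchorlist.append(attrs["href"])
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Text arrives verbatim, so underscores are kept
        if not self._skip:
            self.chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces
        return " ".join("".join(self.chunks).split())


def html_to_text(html):
    """Hypothetical helper: parse html, return (parser, plain text)."""
    p = TextExtractor()
    p.feed(html)
    p.close()
    return p, p.text()


if __name__ == "__main__":
    sample = '<p>See <a href="test_bla.html">test_bla</a> <img src="my_pic.png"></p>'
    parser, text = html_to_text(sample)
    print(text)
    print(parser.anchorlist, parser.imglist)
```

With the sample above, the printed text is "See test_bla", and the link and image lists hold test_bla.html and my_pic.png, underscores included.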