Tim Arnold wrote:
Hi, I'm using elementtree and elementtidy to work with some HTML files. For
some of these files I need to enclose the body content in a new div tag,
like this:
<body>
<div class="remapped">
original contents...
</div>
</body>
I figure there must be a way to do it by creating a 'div' SubElement to the
'body' tag and somehow copying the rest of the tree under that SubElement,
but it's beyond my comprehension.
How can I accomplish this?
(I know I could put the class on the body tag itself, but that won't satisfy
the powers-that-be).
thanks,
--Tim Arnold
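For the ElementTree route the question describes, a minimal sketch might look like the following (written for Python 3's bundled xml.etree.ElementTree; the function name wrap_body_contents and its ns parameter are my own, hypothetical choices). The part that usually trips people up is that ElementTree stores the text before the first child in body.text, and the text after each child in that child's .tail, so body.text has to be moved by hand while each child's tail travels with it. If your tidied documents are namespaced XHTML, I'm assuming you would pass ns="{http://www.w3.org/1999/xhtml}" so the tag lookups match.

```python
import xml.etree.ElementTree as ET


def wrap_body_contents(root, css_class="remapped", ns=""):
    """Move everything inside <body> into a new <div class="...">.

    `ns` is a hypothetical namespace prefix for tag names, e.g.
    "{http://www.w3.org/1999/xhtml}" for namespaced XHTML trees.
    """
    body = root.find(ns + "body")
    if body is None:
        raise ValueError("no <body> element found")

    # Build the wrapper detached, so it is not one of body's children yet.
    div = ET.Element(ns + "div", {"class": css_class})

    # Text before the first child element lives in body.text; move it.
    div.text, body.text = body.text, None

    # Move each child element; its .tail text moves along with it.
    for child in list(body):
        body.remove(child)
        div.append(child)

    # Finally hang the div (now holding all the old content) under body.
    body.append(div)
    return div


if __name__ == "__main__":
    root = ET.fromstring(
        '<html><body bgcolor="#fffff">original <i>italic</i> '
        '<b class="test">contents</b>...</body></html>'
    )
    wrap_body_contents(root)
    print(ET.tostring(root, encoding="unicode"))
```

Creating the div with ET.SubElement(body, "div") would work just as well, as long as you do it after the old children have been removed; otherwise the loop would move the new div into itself.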
You could also try something like this:
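from sgmllib import SGMLParser

class IParse(SGMLParser):
    """Re-emit the input, wrapping the contents of <body> in a div."""

    def __init__(self, verbose=0):
        SGMLParser.__init__(self, verbose)
        self.data = ""

    def _attr_to_str(self, attrs):
        # Leading space only when there are attributes, so bare tags
        # come out as <i> rather than <i >.
        return ''.join([' %s="%s"' % a for a in attrs])

    def start_body(self, attrs):
        self.data += "<body%s>" % self._attr_to_str(attrs)
        self.data += '<div class="remapped">'  # start remapping

    def end_body(self):
        self.data += "</div>"  # end remapping
        self.data += "</body>"

    def handle_data(self, data):
        self.data += data

    # Pass entity and character references and comments through
    # unchanged; SGMLParser's defaults would convert or drop them.
    def handle_entityref(self, name):
        self.data += "&%s;" % name

    def handle_charref(self, ref):
        self.data += "&#%s;" % ref

    def handle_comment(self, comment):
        self.data += "<!--%s-->" % comment

    def unknown_starttag(self, tag, attrs):
        self.data += "<%s%s>" % (tag, self._attr_to_str(attrs))

    def unknown_endtag(self, tag):
        self.data += "</%s>" % tag

if __name__ == "__main__":
    i = IParse()
    i.feed('''
<html>
<body bgcolor="#fffff">
original
<i>italic</i>
<b class="test">contents</b>...
</body>
</html>''')
    i.close()  # flush any buffered input before reading the result
    print i.data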
Just look at the sgmllib source (it's in the standard library); making a
parser this way is very easy. (The code above could still use some
much-needed refactoring.)
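Note that sgmllib exists only in Python 2 (it was removed in Python 3.0). A rough Python 3 equivalent of the parser above, sketched on html.parser.HTMLParser, might look like this; the class name RemapParser and its css_class parameter are hypothetical names of mine, and I haven't tried it on real elementtidy output. Passing convert_charrefs=False makes HTMLParser hand entity and character references to handle_entityref/handle_charref, so they can be written back unchanged instead of being decoded into plain text.

```python
from html import escape
from html.parser import HTMLParser


class RemapParser(HTMLParser):
    """Re-emit HTML, wrapping the contents of <body> in <div class=...>."""

    def __init__(self, css_class="remapped"):
        # Keep &amp;, &#233; etc. as separate events so we can re-emit them.
        super().__init__(convert_charrefs=False)
        self.css_class = css_class  # hypothetical parameter, not escaped
        self.parts = []

    @staticmethod
    def _attrs(attrs):
        # HTMLParser unescapes attribute values, so escape them again;
        # value is None for bare attributes such as <input disabled>.
        return "".join(
            " %s" % name if value is None
            else ' %s="%s"' % (name, escape(value, quote=True))
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.parts.append("<%s%s>" % (tag, self._attrs(attrs)))
        if tag == "body":
            self.parts.append('<div class="%s">' % self.css_class)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/>; emitted as-is, no remapping.
        self.parts.append("<%s%s/>" % (tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        if tag == "body":
            self.parts.append("</div>")  # close the wrapper first
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    def handle_comment(self, data):
        self.parts.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.parts.append("<!%s>" % decl)

    @property
    def data(self):
        return "".join(self.parts)


if __name__ == "__main__":
    p = RemapParser()
    p.feed('<html><body bgcolor="#fffff">original <i>italic</i> '
           '<b class="test">contents</b>...</body></html>')
    p.close()  # flush anything still buffered
    print(p.data)
```

Collecting pieces in a list and joining once at the end also avoids the repeated string concatenation of self.data += ... in the sgmllib version.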