Bytes IT Community


All my uses of the HTMLParser class in the standard library have involved
modifying HTML in some way and writing it back out. It would be very
convenient if the standard library had an HTMLPrinter class, defined as:

import sys
from xml.sax import saxutils
from HTMLParser import HTMLParser

class HTMLPrinter(HTMLParser):
    """Echo every parse event back out, so subclasses override only what they change."""

    def __init__(self, outfile=None):
        HTMLParser.__init__(self)      # the base parser must be initialised too
        if outfile is None:
            outfile = sys.stdout
        self.outfile = outfile

    def handle_data(self, data):
        self.outfile.write(data)

    def handle_starttag(self, tag, attrs):
        self.outfile.write('<%s' % tag)
        for (name, value) in attrs:
            if value is None:          # bare attribute, e.g. <option selected>
                self.outfile.write(' %s' % name)
            else:
                self.outfile.write(' %s=%s' % (name, saxutils.quoteattr(value)))
        self.outfile.write('>')        # close the start tag

    def handle_endtag(self, tag):
        self.outfile.write('</%s>' % tag)

    def handle_charref(self, name):
        self.outfile.write('&#%s;' % name)

    def handle_entityref(self, name):
        self.outfile.write('&%s;' % name)

    # is any quoting needed on comment/decl/pi?

    def handle_comment(self, data):
        self.outfile.write('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.outfile.write('<!%s>' % decl)

    def handle_pi(self, data):
        self.outfile.write('<?%s>' % data)


Such a class would make HTML munging much easier.
For instance:

class RemoveBreaks(HTMLPrinter):
    # Replace each <br> with a single space; echo everything else unchanged.
    def handle_starttag(self, tag, attrs):
        if tag != 'br':
            HTMLPrinter.handle_starttag(self, tag, attrs)
        else:
            HTMLPrinter.handle_data(self, ' ')

    def handle_endtag(self, tag):
        if tag != 'br':
            HTMLPrinter.handle_endtag(self, tag)

The code becomes much clearer since it focuses on the
munging rather than on all the boilerplate HTML printing.
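The code above targets Python 2's HTMLParser module. For anyone trying the same idea on Python 3, here is a minimal sketch, not a standard-library API. It assumes the module's Python 3 name, html.parser, and passes convert_charrefs=False: since Python 3.5 the parser otherwise decodes &amp; and &#38; into handle_data, so handle_charref/handle_entityref never fire and escaped text would be written back unescaped. It also overrides handle_startendtag, whose default calls handle_starttag and then handle_endtag and would turn <br/> into <br></br>. The names reprint and _write_tag are hypothetical helpers added for illustration; RemoveBreaks is the example above, ported.

```python
import io
import sys
from html.parser import HTMLParser
from xml.sax import saxutils


class HTMLPrinter(HTMLParser):
    """Echo every parse event back out, so subclasses override only what they change."""

    def __init__(self, outfile=None):
        # convert_charrefs=False keeps &amp; and &#38; as separate events instead of
        # decoding them into handle_data, so they can be written back verbatim.
        super().__init__(convert_charrefs=False)
        self.outfile = sys.stdout if outfile is None else outfile

    def handle_data(self, data):
        self.outfile.write(data)

    def _write_tag(self, tag, attrs, close):
        # Hypothetical helper shared by ordinary and self-closing start tags.
        self.outfile.write('<' + tag)
        for name, value in attrs:
            if value is None:  # bare attribute such as <input checked>
                self.outfile.write(' ' + name)
            else:  # values arrive unescaped; quoteattr escapes and quotes them again
                self.outfile.write(' %s=%s' % (name, saxutils.quoteattr(value)))
        self.outfile.write(close)

    def handle_starttag(self, tag, attrs):
        self._write_tag(tag, attrs, '>')

    def handle_startendtag(self, tag, attrs):
        # The default would emit <br></br> for <br/>; keep the XHTML form instead.
        self._write_tag(tag, attrs, '/>')

    def handle_endtag(self, tag):
        self.outfile.write('</%s>' % tag)

    def handle_charref(self, name):
        self.outfile.write('&#%s;' % name)

    def handle_entityref(self, name):
        self.outfile.write('&%s;' % name)

    def handle_comment(self, data):
        self.outfile.write('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.outfile.write('<!%s>' % decl)

    def handle_pi(self, data):
        self.outfile.write('<?%s>' % data)


class RemoveBreaks(HTMLPrinter):
    """Replace every <br> (or <br/>) with a single space; echo everything else."""

    def handle_starttag(self, tag, attrs):
        if tag != 'br':
            super().handle_starttag(tag, attrs)
        else:
            super().handle_data(' ')

    def handle_startendtag(self, tag, attrs):
        # <br/> arrives here, because HTMLPrinter overrides this hook.
        if tag != 'br':
            super().handle_startendtag(tag, attrs)
        else:
            super().handle_data(' ')

    def handle_endtag(self, tag):
        if tag != 'br':
            super().handle_endtag(tag)


def reprint(html_text, printer_cls=HTMLPrinter):
    """Hypothetical helper: run html_text through printer_cls and return the output."""
    out = io.StringIO()
    parser = printer_cls(out)
    parser.feed(html_text)
    parser.close()
    return out.getvalue()
```

For example, reprint('one<br>two<br/>three', RemoveBreaks) returns 'one two three', while reprint on the same input with the default HTMLPrinter returns it unchanged. Routing both tag forms through one writer keeps the subclass hooks the only place where munging logic lives, which is the point the post makes.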
Jul 18 '05 #1