Bytes IT Community

confused by HTMLParser class

tried all kinds of combos to get this to work.
http://docs.python.org/lib/module-HTMLParser.html

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

from HTMLParser import HTMLParser
import urllib
import myhtmlparser

x = MyHTMLParser(HTMLParser())
site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
for row in site:
    print x.handle_starttag()
Jun 27 '08 #1


On May 28, 11:20 am, globalrev <skanem...@yahoo.se> wrote:
> tried all kinds of combos to get this to work.
Did you try searching this group? There were recent posts discussing
basic usage of HTMLParser.

Throwing random code together is the least likely way to actually get
it to work.
> x = MyHTMLParser(HTMLParser())
> site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
> for row in site:
>     print x.handle_starttag()
Why are you passing HTMLParser in to initialise MyHTMLParser?

Why are you iterating over site and expecting your instance of
MyHTMLParser to magically know about it?

Why haven't you read the urllib.urlopen docs, to see you need to do
a .read() to actually get the page data?

Why are you so resistant to reading some basic tutorials?
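Below is a minimal sketch of the points above. It targets Python 3, where the module is named `html.parser`, not the Python 2 used in this thread. The names `TagLogger`, `parse_whole` and `parse_in_chunks` are hypothetical. The sketch shows three things:

- The subclass is constructed with no `HTMLParser()` argument.
- `handle_starttag` and `handle_endtag` are callbacks that the parser invokes from inside `feed()`. You never call them yourself.
- You can feed the page text all at once (the `.read()` approach) or piece by piece, for example line by line.

```python
from html.parser import HTMLParser  # Python 3 home of the HTMLParser class


class TagLogger(HTMLParser):
    """Records start/end tags; the parser calls these hooks, not your code."""

    def __init__(self):
        super().__init__()  # initialise the base class; no HTMLParser() argument
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Invoked by the parser during feed() for every opening tag.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Invoked by the parser during feed() for every closing tag.
        self.events.append(("end", tag))


def parse_whole(html_text):
    """Feed the complete document in one call (the .read() approach)."""
    parser = TagLogger()
    parser.feed(html_text)
    parser.close()  # process any data still buffered
    return parser.events


def parse_in_chunks(chunks):
    """Feed the document piece by piece, e.g. line by line from a file object."""
    parser = TagLogger()
    for chunk in chunks:
        parser.feed(chunk)  # incomplete markup is buffered until more data arrives
    parser.close()
    return parser.events


print(parse_whole("<p>Hi <b>there</b></p>"))
# [('start', 'p'), ('start', 'b'), ('end', 'b'), ('end', 'p')]
```

Feeding in chunks is what makes iterating over a file object workable. `feed()` buffers incomplete data until the next call or until `close()`.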
Jun 27 '08 #2

On May 28, 3:20 am, globalrev <skanem...@yahoo.se> wrote:
> tried all kinds of combos to get this to work.

This works fine for me:
from HTMLParser import HTMLParser
import urllib

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
x = MyHTMLParser()     # not MyHTMLParser(HTMLParser())
x.feed(site.read())    # the parser calls handle_starttag/handle_endtag itself
x.close()              # flush any remaining buffered data
site.close()
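For readers on Python 3, here is a rough port of the script above. Treat it as a sketch, not a drop-in replacement. Three things changed in Python 3:

- `HTMLParser` now lives in `html.parser`.
- `urllib.urlopen` became `urllib.request.urlopen`.
- The response body is bytes, which must be decoded to text before `feed()`.

The helper names `fetch_html` and `parse_html`, and the UTF-8 fallback when the server declares no charset, are assumptions of this sketch.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered the beginning of a %s tag" % tag)

    def handle_endtag(self, tag):
        print("Encountered the end of a %s tag" % tag)


def fetch_html(url):
    """Download a page and decode it to str (feed() needs text, not bytes)."""
    with urlopen(url) as response:
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_html(html_text):
    """Run MyHTMLParser over already-downloaded text."""
    parser = MyHTMLParser()  # no HTMLParser() argument
    parser.feed(html_text)   # the handle_* callbacks fire during this call
    parser.close()
```

To reproduce the reply's script, call `parse_html(fetch_html(url))` with a URL of your choice. The 2008-era docs URL used in this thread may redirect or no longer serve the same page. Keeping fetching and parsing separate also lets you test the parser on a literal string without network access.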
You should also read this, for example:
http://www.diveintopython.org/html_p...ting_data.html
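The code in this thread ignores the `attrs` argument of `handle_starttag`. That argument is where data such as link targets comes from. The parser passes it as a list of `(name, value)` pairs, with names lowercased and with `value` set to `None` for attributes that have no value.

The sketch below collects the `href` of each `<a>` tag. It again targets Python 3, and the names `LinkCollector` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, so <A HREF=...> matches too.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip <a> tags without an href
                self.links.append(href)


def extract_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


print(extract_links('<a href="one.html">1</a> <A HREF="two.html">2</A> <a name="x">3</a>'))
# ['one.html', 'two.html']
```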
Jun 27 '08 #3

globalrev wrote:
> tried all kinds of combos to get this to work.
In case you meant to say that you can't get it to work, consider using lxml
instead.

http://codespeak.net/lxml
http://codespeak.net/lxml/lxmlhtml.html

Stefan
Jun 27 '08 #4
