confused by HTMLParser class

tried all kinds of combos to get this to work.
http://docs.python.org/lib/module-HTMLParser.html

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

from HTMLParser import HTMLParser
import urllib
import myhtmlparser

x = MyHTMLParser(HTMLParser())
site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
for row in site:
    print x.handle_starttag()
Jun 27 '08 #1
On May 28, 11:20 am, globalrev <skanem...@yahoo.se> wrote:
> tried all kinds of combos to get this to work.
Did you try searching this group? There were recent posts discussing
basic usage of HTMLParser.

Throwing random code together is the least likely way to actually get
it to work.
> x = MyHTMLParser(HTMLParser())
> site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
> for row in site:
>     print x.handle_starttag()
Why are you passing HTMLParser in to initialise MyHTMLParser?

Why are you iterating over site and expecting your instance of
MyHTMLParser to magically know about it?

Why haven't you read the urllib.urlopen docs, to see you need to do
a .read() to actually get the page data?

Why are you so resistant to reading some basic tutorials?
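
The questions above come down to how the parser is driven: the subclass is created without arguments, and the handle_* methods are callbacks that feed() invokes, not methods to call by hand. A minimal sketch, assuming Python 3 (where the module is html.parser rather than HTMLParser). The TagLogger class and the sample HTML are made up for illustration:

```python
# Python 3 import; in Python 2 this is "from HTMLParser import HTMLParser".
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical example class: records tag events in a list."""

    def __init__(self):
        super().__init__()  # the base class does not take a parser argument
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Invoked by feed() for each opening tag it finds.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Invoked by feed() for each closing tag it finds.
        self.events.append(("end", tag))


parser = TagLogger()
# feed() accepts text in pieces, so passing it one line at a time also works.
for line in ["<html><body>\n", "<p>Hello</p>\n", "</body></html>\n"]:
    parser.feed(line)
parser.close()

for kind, tag in parser.events:
    where = "beginning" if kind == "start" else "end"
    print("Encountered the %s of a %s tag" % (where, tag))
```

Because feed() buffers incomplete input, a loop over the lines of a downloaded page can pass each line to feed() instead of reading the whole page first.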
Jun 27 '08 #2
On May 28, 3:20 am, globalrev <skanem...@yahoo.se> wrote:
> tried all kinds of combos to get this to work.
>
> http://docs.python.org/lib/module-HTMLParser.html
>
> from HTMLParser import HTMLParser
>
> class MyHTMLParser(HTMLParser):
>
>     def handle_starttag(self, tag, attrs):
>         print "Encountered the beginning of a %s tag" % tag
>
>     def handle_endtag(self, tag):
>         print "Encountered the end of a %s tag" % tag
>
> from HTMLParser import HTMLParser
> import urllib
> import myhtmlparser
>
> x = MyHTMLParser(HTMLParser())
> site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
> for row in site:
>     print x.handle_starttag()

This works fine for me:

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

#from HTMLParser import HTMLParser
import urllib
#import myhtmlparser

site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
x = MyHTMLParser() # x = MyHTMLParser(HTMLParser())
x.feed(site.read())  # feed() calls the handle_* methods for each tag
x.close()
#for row in site:               # not needed: site.read() already consumed the response
#    print x.handle_starttag()  # and the handlers are called by feed(), not by hand
site.close()

You should also read this, for example:
http://www.diveintopython.org/html_p...ting_data.html
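
For anyone on Python 3, the same script needs a few renames. This is a sketch under that assumption, not a drop-in replacement: the module becomes html.parser, urlopen moves to urllib.request, print is a function, and the downloaded bytes must be decoded before feed(). The helper names fetch_text and print_tags are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print("Encountered the beginning of a %s tag" % tag)

    def handle_endtag(self, tag):
        print("Encountered the end of a %s tag" % tag)


def fetch_text(url):
    """Download a page and return it as str (hypothetical helper)."""
    with urlopen(url) as site:
        # Assumption: fall back to UTF-8 if the server declares no charset.
        charset = site.headers.get_content_charset() or "utf-8"
        return site.read().decode(charset, errors="replace")


def print_tags(html_text):
    """Feed already-decoded HTML to the parser (hypothetical helper)."""
    parser = MyHTMLParser()
    parser.feed(html_text)
    parser.close()
```

A call such as print_tags(fetch_text(url)) then prints one line per tag. The old /lib/ URL from the thread may no longer resolve, so pick a page you know is reachable.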
Jun 27 '08 #3
globalrev wrote:
> tried all kinds of combos to get this to work.

In case you meant to say that you can't get it to work, consider using lxml
instead.

http://codespeak.net/lxml
http://codespeak.net/lxml/lxmlhtml.html

Stefan
Jun 27 '08 #4
