minidom and pulldom

I'm trying to use either the minidom or pulldom to find table tags in
html web pages. I've tried parsing two web pages that show up fine in
my browser, but I get errors when I call minidom.parse, or try to get
events with pulldom. Is there a parser that is as forgiving as web
browsers?
Jul 18 '05 #1
pi***@map.com (David Pinto) writes:
> I'm trying to use either the minidom or pulldom to find table tags in
> html web pages. I've tried parsing two web pages that show up fine in
> my browser, but I get errors when I call minidom.parse, or try to get
> events with pulldom. Is there a parser that is as forgiving as web
> browsers?


minidom is an XML parser. Most Web pages are not XML, but some form of
HTML.
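
A minimal sketch of the failure described above, assuming Python 3 and a made-up sample page (`HTML_PAGE` and `try_minidom` are hypothetical names, not from the original post). The markup is the kind a browser renders without complaint, but the unclosed `<br>` and `<p>` make it ill-formed XML, so minidom's underlying expat parser rejects it.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample page: browsers accept it, but it is not well-formed
# XML because <p> and <br> are never closed.
HTML_PAGE = (
    "<html><body><p>Prices<br>"
    "<table><tr><td>1</td></tr></table>"
    "</body></html>"
)


def try_minidom(text):
    """Return None if minidom parses text, else the parser's error message."""
    try:
        minidom.parseString(text)
    except ExpatError as exc:
        return str(exc)
    return None


error = try_minidom(HTML_PAGE)
print("minidom failed:", error)
```

The same helper returns `None` for a well-formed document, which is why the tidy-then-parse route suggested later in the thread works.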

You should have better chances with parsing HTML using htmllib.

Regards,
Martin

Jul 18 '05 #2
ma****@v.loewis.de (Martin v. Löwis) writes:
> pi***@map.com (David Pinto) writes:
> > I'm trying to use either the minidom or pulldom to find table tags in
> > html web pages. I've tried parsing two web pages that show up fine in
> [...]
> minidom is an XML parser. Most Web pages are not XML, but some form of
> HTML.
>
> You should have better chances with parsing HTML using htmllib.


Or, better, HTMLParser.HTMLParser -- works better with XHTML.
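
A rough sketch of that approach for the original question (finding table tags), assuming Python 3, where the `HTMLParser` module lives at `html.parser`. The `TableFinder` class and the `SAMPLE` string are made up for illustration. The parser is event-based and tolerant of unclosed tags, so it does not need well-formed input.

```python
from html.parser import HTMLParser  # Python 3 location of HTMLParser


class TableFinder(HTMLParser):
    """Collect the attributes of every <table> start tag seen."""

    def __init__(self):
        super().__init__()
        self.tables = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value) pairs.
        if tag == "table":
            self.tables.append(dict(attrs))


# Hypothetical sloppy page: unclosed <p>, <br>, <tr>, <td>, upper-case tag.
SAMPLE = (
    '<html><body><p>Prices<br>'
    '<TABLE id="prices"><tr><td>1</table>'
    '</body></html>'
)

finder = TableFinder()
finder.feed(SAMPLE)
finder.close()
print(len(finder.tables), "table(s):", finder.tables)
```

Note that this gives you a stream of callbacks, not a tree; tracking nesting (for example, tables inside tables) is left to the subclass.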

If you don't mind dependencies and want a document tree, a good plan
is to shove everything through mxTidy or uTidylib to generate XHTML,
then use the XML API of your choice.
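
A sketch of the second half of that plan, assuming Python 3. The tidy step itself is not shown; the `XHTML` string below is a hand-written stand-in for what a tidy tool might emit, not real mxTidy or uTidylib output. Once the markup is well-formed, minidom parses it and the tables can be pulled out of the tree.

```python
from xml.dom import minidom

# Assumption: stand-in for tidied output (well-formed, all tags closed).
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body><p>Prices<br /></p>'
    '<table id="prices"><tr><td>1</td></tr></table>'
    '</body></html>'
)

doc = minidom.parseString(XHTML)

# Every <table> element in document order.
tables = doc.getElementsByTagName("table")
table_ids = [t.getAttribute("id") for t in tables]

# Text of each cell in the first table.
cells = [td.firstChild.data for td in tables[0].getElementsByTagName("td")]

print(table_ids, cells)
```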
John
Jul 18 '05 #3
pi***@map.com (David Pinto) writes:
> I'm trying to use either the minidom or pulldom to find table tags in
> html web pages. I've tried parsing two web pages that show up fine in
> my browser, but I get errors when I call minidom.parse, or try to get
> events with pulldom. Is there a parser that is as forgiving as web
> browsers?


Didn't this get answered just the other day?

minidom and pulldom are built on XML parsers. HTML is not XML.

If you want a tree, I recommend pushing the HTML through mxTidy
or uTidylib, and feeding the resultant XHTML to the XML API of your
choice.
John
Jul 18 '05 #4
jj*@pobox.com (John J. Lee) writes:
[...]
> Didn't this get answered just the other day?

[...]

Whoops, local news trouble, I guess.
John
Jul 18 '05 #5