473,289 Members | 2,087 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,289 software developers and data experts.

minidom and pulldom

I'm trying to use either the minidom or pulldom to find table tags in
html web pages. I've tried parsing two web pages that show up fine in
my browser, but I get errors when I call minidom.parse, or try to get
events with pulldom. Is there a parser that is as forgiving as web
browsers?
Jul 18 '05 #1
4 1704
pi***@map.com (David Pinto) writes:
I'm trying to use either the minidom or pulldom to find table tags in
html web pages. I've tried parsing two web pages that show up fine in
my browser, but I get errors when I call minidom.parse, or try to get
events with pulldom. Is there a parser that is as forgiving as web
browsers?


minidom is an XML parser. Most Web pages are not XML, but some form of
HTML.

You should have better chances with parsing HTML using htmllib.

Regards,
Martin

Jul 18 '05 #2
ma****@v.loewis.de (Martin v. Löwis) writes:
pi***@map.com (David Pinto) writes:
I'm trying to use either the minidom or pulldom to find table tags in
html web pages. I've tried parsing two web pages that show up fine in
[...] minidom is an XML parser. Most Web pages are not XML, but some form of
HTML.

You should have better chances with parsing HTML using htmllib.


Or, better, HTMLParser.HTMLParser -- works better with XHTML.

If you don't mind dependencies and want a document tree, a good plan
is to shove everything through mxTidy or uTidylib to generate XHTML,
then use the XML API of your choice.
John
Jul 18 '05 #3
pi***@map.com (David Pinto) writes:
I'm trying to use either the minidom or pulldom to find table tags in
html web pages. I've tried parsing two web pages that show up fine in
my browser, but I get errors when I call minidom.parse, or try to get
events with pulldom. Is there a parser that is as forgiving as web
browsers?


Didn't this get answered just the other day?

minidom and pulldom are built on XML parsers. HTML is not XML.

If you want a tree, I recommend using pushing the HTML through mxTidy
or uTidylib, and feeding the resultant XHTML to the XML API of your
choice.
John
Jul 18 '05 #4
jj*@pobox.com (John J. Lee) writes:
[...]
Didn't this get answered just the other day?

[...]

Whoops, local news trouble, I guess.
John
Jul 18 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
by: Paul Miller | last post by:
We've run into minidom's inabilty to handle large (20+MB) XML files, and need a replacement that can handle it. Unfortunately, we're pretty dependent on a DOM, so a pulldom or SAX replacement is...
0
by: xtian | last post by:
Hi - I'm doing some data conversion with minidom (turning a csv file into a specific xml format), and I've hit a couple of small problems. 1: The output format has a header with some xml that...
3
by: Sunil Movva | last post by:
I have an application that uses xml to communicate between threads. One of the threads in my app creates an xml message and sends it to a second thread. This second thread parses the message and...
5
by: Mike McGavin | last post by:
Hi everyone. I've been trying for several hours now to get minidom to parse namespaces properly from my stream of XML, so that I can use DOM methods such as getElementsByTagNameNS(). For some...
4
by: webdev | last post by:
lo all, some of the questions i'll ask below have most certainly been discussed already, i just hope someone's kind enough to answer them again to help me out.. so i started a python 2.3...
8
by: jog | last post by:
Hi, I want to get text out of some nodes of a huge xml file (1,5 GB). The architecture of the xml file is something like this <parent> <page> <title>bla</title> <id></id> <revision> <id></id>...
0
by: Greg Copeland | last post by:
I am attempting to freeze an application which uses the dom.minidom parser. When I execute my application, I get an import error of: ImportError: No module named dom.minidom. During the freeze...
0
by: Gary | last post by:
Howdy I ran into a difference between Python on Windows XP and Linux Fedora 6. Writing a dom to xml with minidom works on Linux. It gives an error on XP if there is an empty namespace. The...
0
by: susan_ali | last post by:
I'm using xml.dom.pulldom to parse through an XML file. I use expandNode() to scrutinize certain blocks of it that I'm interested in. Once I find a block of XML in the input file that I'm...
0
by: MeoLessi9 | last post by:
I have VirtualBox installed on Windows 11 and now I would like to install Kali on a virtual machine. However, on the official website, I see two options: "Installer images" and "Virtual machines"....
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: Aftab Ahmad | last post by:
Hello Experts! I have written a code in MS Access for a cmd called "WhatsApp Message" to open WhatsApp using that very code but the problem is that it gives a popup message everytime I clicked on...
0
by: marcoviolo | last post by:
Dear all, I would like to implement on my worksheet an vlookup dynamic , that consider a change of pivot excel via win32com, from an external excel (without open it) and save the new file into a...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.