Problem with xml.dom parser and xmlns attribute

Peter Maas

Hi,

I have a problem parsing HTML text with xml.dom (PyXML). The following
code runs well:

--------------------------------------------
from xml.dom.ext.reader import HtmlLib
from xml.dom.ext import PrettyPrint

r = HtmlLib.Reader()
doc = r.fromString(
'''
<html>
<head>
</head>
<body>
<p>hallo welt
</body>
</html>
''')
PrettyPrint(doc)
--------------------------------------------

But if I replace <html> with <html xmlns="http://www.w3.org/1999/xhtml">,
I get the following error:

Traceback (most recent call last):
  File "xhtml.py", line 5, in ?
    doc = r.fromString(
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\HtmlLib.py", line 69, in fromString
    return self.fromStream(stream, ownerDoc, charset)
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\HtmlLib.py", line 27, in fromStream
    self.parser.parse(stream)
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\Sgmlop.py", line 57, in parse
    self._parser.parse(stream.read())
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\Sgmlop.py", line 160, in finish_starttag
    unicode(value, self._charset))
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\Element.py", line 177, in setAttributeNS
    attr = self.ownerDocument.createAttributeNS(namespaceURI, qualifiedName)
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\Document.py", line 139, in createAttributeNS
    raise NamespaceErr()
xml.dom.NamespaceErr: Invalid or illegal namespace operation

Exit code: 1

A lot of HTML documents on the Internet have this xmlns=.... Are
they wrong, or is this a PyXML bug?

Best regards,

Peter Maas

--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
Tel +49-241-93878-0 Fax +49-241-93878-20 eMail pe********@mplusr.de
-------------------------------------------------------------------

Jul 18 '05 #1

Richard Brodie

"Peter Maas" <pe********@mplusr.de> wrote in message news:c6**********@swifty.westend.com...

> but if I replace <html> by <html xmlns="http://www.w3.org/1999/xhtml">
> [...]
> A lot of HTML documents on the Internet have this xmlns=.... Are
> they wrong or is this a PyXML bug?

If they are genuine XHTML documents, they should be well-formed XML,
so you should be able to use an XML rather than an SGML parser.

from xml.dom.ext.reader import Sax2
r = Sax2.Reader()
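
For readers without PyXML, the same idea can be sketched with the standard library's xml.dom.minidom, which is swapped in here for Sax2.Reader and is not what the post itself uses. The sample document and the names XHTML_NS, XHTML_SAMPLE and parse_xhtml are made up for illustration. Note that the <p> element is closed, because an XML parser only accepts well-formed input.

```python
from xml.dom import minidom

# Namespace URI taken from the xmlns attribute in the original question.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample: the document from the question, made well-formed.
XHTML_SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>demo</title></head>
<body>
<p>hallo welt</p>
</body>
</html>"""


def parse_xhtml(text):
    """Parse well-formed XHTML with a namespace-aware XML parser."""
    return minidom.parseString(text)


doc = parse_xhtml(XHTML_SAMPLE)
root = doc.documentElement

# The default namespace declared on <html> applies to every child element.
paragraphs = doc.getElementsByTagNameNS(XHTML_NS, "p")

print(root.namespaceURI)
print(paragraphs[0].firstChild.data)
```

The xmlns declaration is ordinary XML here, so the namespace-aware parser records it instead of rejecting it.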

Jul 18 '05 #2

Peter Maas

Richard Brodie wrote:

> "Peter Maas" <pe********@mplusr.de> wrote in message news:c6**********@swifty.westend.com...
>
>> [...]
>> but if I replace <html> by <html xmlns="http://www.w3.org/1999/xhtml">
>> [...]
>> A lot of HTML documents on the Internet have this xmlns=.... Are
>> they wrong or is this a PyXML bug?
>
> If they are genuine XHTML documents, they should be well-formed XML,
> so you should be able to use an XML rather than an SGML parser.
>
> from xml.dom.ext.reader import Sax2
> r = Sax2.Reader()

Thanks, Richard. But on the Internet, most of the time I don't know
what kind of document I'm dealing with when I start parsing. I guess
I should use HTMLParser (?).
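
One way to handle documents of unknown quality can be sketched as follows: try a strict XML parse first and fall back to a tolerant HTML parser when that fails. This is only a sketch in present-day Python, assuming the standard library's html.parser.HTMLParser and xml.dom.minidom. The names TagCollector, parse_unknown, TAG_SOUP and WELL_FORMED are hypothetical.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError


class TagCollector(HTMLParser):
    """Tolerant parser that records start tags (with attributes) and text."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; xmlns is just another one.
        self.start_tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def parse_unknown(markup):
    """Return ("xml", dom) for well-formed input, else ("html", collector)."""
    try:
        return "xml", minidom.parseString(markup)
    except ExpatError:
        collector = TagCollector()
        collector.feed(markup)
        collector.close()
        return "html", collector


# The document from the question: xmlns present, <p> never closed.
TAG_SOUP = """
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body>
<p>hallo welt
</body>
</html>
"""

# The same document with <p> closed, so that it is well-formed XML.
WELL_FORMED = TAG_SOUP.replace("hallo welt", "hallo welt</p>")

soup_kind, soup_result = parse_unknown(TAG_SOUP)
xml_kind, xml_result = parse_unknown(WELL_FORMED)
```

The tolerant parser treats xmlns as a plain attribute, so it accepts the page either way. The price is that it delivers a stream of events rather than a DOM tree.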

Best regards,

Peter Maas


Jul 18 '05 #3

Richard Brodie

"Peter Maas" <pe********@mplusr.de> wrote in message news:c6**********@swifty.westend.com...

> Thanks, Richard. But on the Internet, most of the time I don't know
> what kind of document I'm dealing with when I start parsing. I guess
> I should use HTMLParser (?).

If you're dealing with a wide range of web pages, chances are they
will have all manner of rubbish in them. I would probably feed the
stuff through Tidy (or uTidyLib) first, to convert to cleanish XHTML,
then use an XML parser.
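
A rough sketch of that pipeline, assuming the HTML Tidy command-line program is installed and on the PATH: it shells out to tidy rather than using uTidyLib, and parses the result with the standard library's xml.dom.minidom. The function name tidy_to_dom and the TIDY_ARGS selection are choices made for this illustration.

```python
import shutil
import subprocess
from xml.dom import minidom

# -asxhtml: emit well-formed XHTML; -numeric: numeric instead of named
# entities (an XML parser without the DTD does not know &nbsp; etc.);
# -quiet: suppress the summary; -utf8: UTF-8 for input and output.
TIDY_ARGS = ["-asxhtml", "-numeric", "-quiet", "-utf8"]


def tidy_to_dom(markup, tidy_cmd="tidy"):
    """Clean up tag soup with the tidy executable, then parse it as XML."""
    if shutil.which(tidy_cmd) is None:
        raise RuntimeError("tidy executable not found on PATH")

    result = subprocess.run(
        [tidy_cmd] + TIDY_ARGS,
        input=markup.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # tidy exits with 0 on success, 1 on warnings and 2 on errors.
    if result.returncode > 1:
        raise ValueError(result.stderr.decode("utf-8", "replace"))

    return minidom.parseString(result.stdout)
```

Calling tidy_to_dom() on the sample page from the first post should yield a DOM whose root element is html. Warnings such as a missing <title> do not stop the conversion.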

Jul 18 '05 #4

Uche Ogbuji

Peter Maas <pe********@mplusr.de> wrote in message news:<c6**********@swifty.westend.com>...

> A lot of HTML documents on the Internet have this xmlns=.... Are
> they wrong or is this a PyXML bug?

This looks like a 4DOM bug. What are you hoping to do once you've
parsed these documents? If we know, we can either suggest an
alternative tool to use or perhaps a workaround.

--Uche

Jul 18 '05 #5
