Problem parsing namespaces with xml.dom.minidom

Hi everyone.

I've been trying for several hours now to get minidom to parse
namespaces properly from my stream of XML, so that I can use DOM methods
such as getElementsByTagNameNS(). For some reason, though, it just
doesn't seem to want to split the prefixes from the rest of the tags
when parsing.

The minidom documentation at
http://docs.python.org/lib/module-xml.dom.minidom.html implies that
namespaces are supposed to be supported as long as I'm using a parser
that supports them, but I just can't seem to get it to work. I was
wondering if anyone can see what I'm doing wrong.

Here's a simple test case that represents the problem I'm having. If it
makes a difference, I have PyXML installed, or at the very least, I have
the Debian Linux python-xml package installed, which I'm pretty sure is
PyXML.
========

from xml.dom import minidom
from xml import sax
text = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:date>Thu Jan 30 15:06:06 NZDT 2003</xte:date>
<xte:object objectid="object1">
Nothing
</xte:object>
</xte:xte>
'''
# Set up a parser for namespace-ready parsing.
parser = sax.make_parser()
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.setFeature(sax.handler.feature_namespace_prefixes, 1)

# Parse the string into a minidom
mydom = minidom.parseString(text)

# Look for some elements

# This one shouldn't return any (I think).
object_el1 = mydom.getElementsByTagName("xte:object")

# This one definitely should, at least for what I want.
object_el2 = mydom.getElementsByTagNameNS("object",
'http://www.mcs.vuw.ac.nz/renata/xte')
print '1: ' + str(object_el1)
print '2: ' + str(object_el2)

=========

Output is:

1: [<DOM Element: xte:object at 0x404a922c>]
2: []

=========

What *seems* to be happening is that the namespace prefix isn't being
separated, and is simply being parsed as if it's part of the rest of the
tag. Therefore when I search for a tag in a particular namespace, it's
not being found.

I've looked through the code in the python libraries, and the
minidom.parseString function appears to be calling the PullDOM parse
method, which creates a PullDOM object to be the ContentHandler. Just
browsing over that code, it *appears* to be trying to split the prefix
from the local name in order to build a namespace-ready DOM as I would
expect it to. I can't quite figure out why this isn't working for me,
though.
I'm not terribly experienced with XML in general, so it's possible that
I'm just incorrectly interpreting how things are supposed to work to
begin with. If this is the case, please accept my apologies, but I'd
like any suggestions for how I should be doing it. I'd really just like
to be able to parse an XML document into a DOM, and then be able to pull
out elements relative to their namespaces.

Can anyone see what I'm doing wrong?

Thanks.
Mike.
Jul 18 '05 #1
5 Replies


Mike McGavin wrote:
I'm not terribly experienced with XML in general, so it's possible that I'm just incorrectly
interpreting how things are supposed to work to begin with. If this is the case, please accept my
apologies, but I'd like any suggestions for how I should be doing it. I'd really just like to be
able to parse an XML document into a DOM, and then be able to pull out elements relative to their
namespaces.


is the DOM API an absolute requirement?

</F>

Jul 18 '05 #2

Hi Fredrik.

Fredrik Lundh wrote:
I'm not terribly experienced with XML in general, so it's possible that I'm just incorrectly
interpreting how things are supposed to work to begin with. If this is the case, please accept my
apologies, but I'd like any suggestions for how I should be doing it. I'd really just like to be
able to parse an XML document into a DOM, and then be able to pull out elements relative to their
namespaces.
is the DOM API an absolute requirement?


It wouldn't need to conform to the official specifications of the DOM
API, but I guess I'm after some comparable functionality.

In particular, I need to be able to parse a namespace-using XML document
into some kind of node tree, and then be able to query the tree to
select elements based on their namespace and local tag names, and so on.
I don't mind if the methods provided don't conform exactly to DOM
specifications.
I guess I could write my own code to build a namespace-recognising DOM
from an XML file, but it seems as if that's already been done and I'd be
very surprised if it hadn't. I just can't figure out why minidom
doesn't seem to be working properly for me when namespaces are involved.

Thanks.
Mike.
Jul 18 '05 #3

Mike McGavin wrote:
is the DOM API an absolute requirement?


It wouldn't need to conform to the official specifications of the DOM API, but I guess I'm after
some comparable functionality.

In particular, I need to be able to parse a namespace-using XML document into some kind of node
tree, and then be able to query the tree to select elements based on their namespace and local
tag names, and so on. I don't mind if the methods provided don't conform exactly to DOM
specifications.


sounds like this might be exactly what you need:

http://effbot.org/zone/element-index.htm

(it's also the fastest and most memory-efficient Python-only parser you
can get, but I suppose that's not a problem ;-)

</F>
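
For readers following the link: a minimal sketch of the same lookup done the ElementTree way, assuming the library linked above is ElementTree as it ships in current Python under xml.etree.ElementTree. The test document is trimmed from the one in the question, and the names XTE_NS, objects and objects_via_map are only illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the test document in the question.
XTE_NS = "http://www.mcs.vuw.ac.nz/renata/xte"

text = """<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:object objectid="object1">
Nothing
</xte:object>
</xte:xte>"""

root = ET.fromstring(text)

# ElementTree stores names as "{namespace-uri}localname",
# so the prefix used in the file does not matter.
objects = root.findall("{%s}object" % XTE_NS)

# Alternatively, supply your own prefix-to-URI mapping for the query.
objects_via_map = root.findall("xte:object", {"xte": XTE_NS})

for obj in objects:
    print(obj.tag, obj.get("objectid"), obj.text.strip())
```

The prefix in the second query is resolved through the mapping passed to findall(), not through the xmlns declaration in the document, so any prefix can be used there as long as it maps to the same URI.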

Jul 18 '05 #4

You've reversed some function parameters. Here's a program that works
fine (note that you don't need to set up a SAX parser):

from xml.dom import minidom
text = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:date>Thu Jan 30 15:06:06 NZDT 2003</xte:date>
<xte:object objectid="object1">
Nothing
</xte:object>
</xte:xte>
'''

# Parse the string into a minidom
mydom = minidom.parseString(text)

# Look for some elements

# getElementsByTagName matches on the qualified name, so this does
# find the element.
object_el1 = mydom.getElementsByTagName("xte:object")

# Namespace URI first, local name second.
object_el2 = mydom.getElementsByTagNameNS(
    'http://www.mcs.vuw.ac.nz/renata/xte', "object")
print('1: ' + str(object_el1))
print('2: ' + str(object_el2))
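
One way to confirm that minidom really has split the prefix from the local name is to look at the namespace attributes on the nodes it returns. The following is only a sketch, written for current Python 3 rather than the Python 2 of the thread, with a trimmed copy of the test document; XTE_NS, found and reversed_args are illustrative names.

```python
from xml.dom import minidom

# Namespace URI taken from the test document in the question.
XTE_NS = "http://www.mcs.vuw.ac.nz/renata/xte"

text = """<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:object objectid="object1">Nothing</xte:object>
</xte:xte>"""

dom = minidom.parseString(text)

# Correct order: namespace URI first, local name second.
found = dom.getElementsByTagNameNS(XTE_NS, "object")

# Reversed order, as in the original post: no element has the
# namespace "object", so nothing matches.
reversed_args = dom.getElementsByTagNameNS("object", XTE_NS)

for el in found:
    # prefix, localName and namespaceURI are stored separately on each node;
    # tagName still holds the qualified name.
    print(el.prefix, el.localName, el.namespaceURI, el.tagName)

print(len(found), len(reversed_args))
```

The reversed call does not raise an error, it just returns an empty list, which is why the mistake is easy to miss.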

Jul 18 '05 #5

Hi Fredrik.

Fredrik Lundh wrote:
It wouldn't need to conform to the official specifications of the DOM API, but I guess I'm after
some comparable functionality. [--snip--]
sounds like this might be exactly what you need:
http://effbot.org/zone/element-index.htm
(it's also the fastest and most memory-efficient Python-only parser you
can get, but I suppose that's not a problem ;-)


Thanks. The original problem I was having turned out to be a couple of
reversed parameters in a method call, as Paul pointed out, and I now
feel pretty silly as a result. But I'll take a look at this, too.

Much appreciated.
Mike.
Jul 18 '05 #6
