Hi everyone.
I've been trying for several hours now to get minidom to parse
namespaces properly from my stream of XML, so that I can use DOM methods
such as getElementsByTagNameNS(). For some reason, though, it just
doesn't seem to want to split the prefixes from the rest of the tags
when parsing.
The minidom documentation at
http://docs.python.org/lib/module-xml.dom.minidom.html implies that
namespaces are supposed to be supported as long as I'm using a parser
that supports them, but I just can't seem to get it to work. I was
wondering if anyone can see what I'm doing wrong.
Here's a simple test case that reproduces the problem. In case it makes
a difference, I have the Debian Linux python-xml package installed,
which I'm pretty sure is PyXML.
========
from xml.dom import minidom
from xml import sax
text = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:date>Thu Jan 30 15:06:06 NZDT 2003</xte:date>
<xte:object objectid="object1">
Nothing
</xte:object>
</xte:xte>
'''
# Set up a parser for namespace-ready parsing.
parser = sax.make_parser()
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.setFeature(sax.handler.feature_namespace_prefixes, 1)
# Parse the string into a minidom
mydom = minidom.parseString(text)
# Look for some elements
# This one shouldn't return any (I think).
object_el1 = mydom.getElementsByTagName("xte:object")
# This one definitely should, at least for what I want.
object_el2 = mydom.getElementsByTagNameNS("object",
                                          'http://www.mcs.vuw.ac.nz/renata/xte')
print '1: ' + str(object_el1)
print '2: ' + str(object_el2)
=========
Output is:
1: [<DOM Element: xte:object at 0x404a922c>]
2: []
=========
What *seems* to be happening is that the namespace prefix isn't being
separated, and is simply being parsed as if it's part of the rest of the
tag. Therefore when I search for a tag in a particular namespace, it's
not being found.
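
One way to check that guess is to look at the namespace-related attributes on an element that the plain getElementsByTagName() call does find. This is only a rough sketch: it uses a shortened copy of my test document, and the XTE_NS name is just something I made up for the example. If the prefix really isn't being split, I'd expect localName not to be 'object' and namespaceURI to be empty.

```python
from xml.dom import minidom

# Shortened copy of the test document (same prefix and namespace URI).
# XTE_NS is a name invented for this sketch.
XTE_NS = 'http://www.mcs.vuw.ac.nz/renata/xte'
text = ("<xte:xte xmlns:xte='%s'>"
        "<xte:object objectid='object1'>Nothing</xte:object>"
        "</xte:xte>" % XTE_NS)

dom = minidom.parseString(text)
node = dom.getElementsByTagName('xte:object')[0]

# These are the standard DOM Level 2 node attributes for namespaces.
print('tagName      = %r' % node.tagName)
print('prefix       = %r' % node.prefix)
print('localName    = %r' % node.localName)
print('namespaceURI = %r' % node.namespaceURI)
```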
I've looked through the code in the python libraries, and the
minidom.parseString function appears to be calling the PullDOM parse
method, which creates a PullDOM object to be the ContentHandler. Just
browsing over that code, it *appears* to be trying to split the prefix
from the local name in order to build a namespace-ready DOM as I would
expect it to. I can't quite figure out why this isn't working for me,
though.
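
Looking at this again, my test case builds a parser but never actually hands it to minidom. As far as I can tell, parseString() takes an optional second argument for the parser, and that is what sends it down the PullDOM path. Here is a minimal sketch of what I think that would look like, assuming the stock expat-based SAX parser (I have left out the namespace_prefixes feature because I'm not sure every parser accepts it):

```python
from xml import sax
from xml.dom import minidom
from xml.sax import handler

# Shortened copy of the test document; XTE_NS is a name invented here.
XTE_NS = 'http://www.mcs.vuw.ac.nz/renata/xte'
text = ("<xte:xte xmlns:xte='%s'>"
        "<xte:object objectid='object1'>Nothing</xte:object>"
        "</xte:xte>" % XTE_NS)

# Namespace-aware SAX parser.
parser = sax.make_parser()
parser.setFeature(handler.feature_namespaces, True)

# Passing the parser explicitly makes minidom build the tree via pulldom.
dom = minidom.parseString(text, parser)
root = dom.documentElement

print('root tagName      = %r' % root.tagName)
print('root namespaceURI = %r' % root.namespaceURI)
```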
I'm not terribly experienced with XML in general, so it's possible that
I'm just incorrectly interpreting how things are supposed to work to
begin with. If this is the case, please accept my apologies, but I'd
like any suggestions for how I should be doing it. I'd really just like
to be able to parse an XML document into a DOM, and then be able to pull
out elements relative to their namespaces.
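
To make that concrete, this is roughly the lookup I'm after. It's a sketch only, and it assumes the DOM Level 2 signature getElementsByTagNameNS(namespaceURI, localName), which I notice is the reverse of the argument order in my test case above. XTE_NS is just a name invented for the example.

```python
from xml.dom import minidom

# Shortened copy of the test document; XTE_NS is a name invented here.
XTE_NS = 'http://www.mcs.vuw.ac.nz/renata/xte'
text = ("<xte:xte xmlns:xte='%s'>"
        "<xte:creator>alias</xte:creator>"
        "<xte:object objectid='object1'>Nothing</xte:object>"
        "</xte:xte>" % XTE_NS)

dom = minidom.parseString(text)

# Assumed argument order: namespace URI first, then the local name
# (without the prefix).
objects = dom.getElementsByTagNameNS(XTE_NS, 'object')

for el in objects:
    print('%s objectid=%s' % (el.localName, el.getAttribute('objectid')))
```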
Can anyone see what I'm doing wrong?
Thanks.
Mike.