Bytes | Software Development & Data Engineering Community

Problem parsing namespaces with xml.dom.minidom

Hi everyone.

I've been trying for several hours now to get minidom to parse
namespaces properly from my stream of XML, so that I can use DOM methods
such as getElementsByTagNameNS(). For some reason, though, it just
doesn't seem to want to split the prefixes from the rest of the tags
when parsing.

The minidom documentation at
http://docs.python.org/lib/module-xml.dom.minidom.html implies that
namespaces are supposed to be supported as long as I'm using a parser
that supports them, but I just can't seem to get it to work. I was
wondering if anyone can see what I'm doing wrong.

Here's a simple test case that represents the problem I'm having. If it
makes a difference, I have PyXML installed, or at the very least, I have
the Debian Linux python-xml package installed, which I'm pretty sure is
PyXML.
========

from xml.dom import minidom
from xml import sax
text = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:date>Thu Jan 30 15:06:06 NZDT 2003</xte:date>
<xte:object objectid="object1">
Nothing
</xte:object>
</xte:xte>
'''
# Set up a parser for namespace-ready parsing.
parser = sax.make_parser()
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.setFeature(sax.handler.feature_namespace_prefixes, 1)

# Parse the string into a minidom
mydom = minidom.parseString(text)

# Look for some elements

# This one shouldn't return any (I think).
object_el1 = mydom.getElementsByTagName("xte:object")

# This one definitely should, at least for what I want.
object_el2 = mydom.getElementsByTagNameNS("object",
'http://www.mcs.vuw.ac.nz/renata/xte')
print('1: ' + str(object_el1))
print('2: ' + str(object_el2))

=========

Output is:

1: [<DOM Element: xte:object at 0x404a922c>]
2: []

=========

What *seems* to be happening is that the namespace prefix isn't being
separated, and is simply being parsed as if it's part of the rest of the
tag. Therefore when I search for a tag in a particular namespace, it's
not being found.
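One way to check whether the prefix is really being left unsplit is to inspect the attributes that minidom's Element objects carry: `tagName`, `prefix`, `localName` and `namespaceURI`. The sketch below is written for Python 3; the cut-down document and the variable names are illustrative, not taken from the post. It suggests that the default parser does split the prefix, and that `tagName` keeping the `xte:` prefix is expected behaviour rather than a sign of failure.

```python
from xml.dom import minidom

XTE = 'http://www.mcs.vuw.ac.nz/renata/xte'

# A cut-down version of the test document from the post.
text = """<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:object objectid="object1">Nothing</xte:object>
</xte:xte>"""

dom = minidom.parseString(text)

# getElementsByTagName() matches the qualified name, prefix included.
el = dom.getElementsByTagName('xte:object')[0]

print(el.tagName)       # xte:object (qualified name)
print(el.prefix)        # xte
print(el.localName)     # object
print(el.namespaceURI)  # http://www.mcs.vuw.ac.nz/renata/xte
```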

I've looked through the code in the python libraries, and the
minidom.parseString function appears to be calling the PullDOM parse
method, which creates a PullDOM object to be the ContentHandler. Just
browsing over that code, it *appears* to be trying to split the prefix
from the local name in order to build a namespace-ready DOM as I would
expect it to. I can't quite figure out why this isn't working for me,
though.
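On the PullDOM point: in the standard library's minidom (at least in current Python 3; PyXML's replacement modules may behave differently), `parseString()` only goes through `xml.dom.pulldom` when a SAX parser is passed as its optional second argument. Otherwise it uses an expat-based builder that is namespace-aware by default. The parser configured in the test case above is never handed to `parseString()`, so its feature settings have no effect. A minimal sketch of passing one in explicitly, assuming `feature_namespaces` alone is enough (the prefixes feature is deliberately left out):

```python
from xml import sax
from xml.dom import minidom

XTE = 'http://www.mcs.vuw.ac.nz/renata/xte'

text = """<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:object objectid="object1">Nothing</xte:object>
</xte:xte>"""

# A namespace-aware SAX parser, configured as in the original test case.
parser = sax.make_parser()
parser.setFeature(sax.handler.feature_namespaces, True)

# Passing the parser as the second argument is what routes parsing
# through it (via xml.dom.pulldom); merely creating it does nothing.
dom = minidom.parseString(text, parser)

# Namespace URI first, then local name.
found = dom.getElementsByTagNameNS(XTE, 'object')
print(len(found))  # 1
```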
I'm not terribly experienced with XML in general, so it's possible that
I'm just incorrectly interpreting how things are supposed to work to
begin with. If this is the case, please accept my apologies, but I'd
like any suggestions for how I should be doing it. I'd really just like
to be able to parse an XML document into a DOM, and then be able to pull
out elements relative to their namespaces.

Can anyone see what I'm doing wrong?

Thanks.
Mike.
Jul 18 '05 #1
Mike McGavin wrote:
> I'm not terribly experienced with XML in general, so it's possible that I'm just incorrectly
> interpreting how things are supposed to work to begin with. If this is the case, please accept my
> apologies, but I'd like any suggestions for how I should be doing it. I'd really just like to be
> able to parse an XML document into a DOM, and then be able to pull out elements relative to their
> namespaces.


is the DOM API an absolute requirement?

</F>

Jul 18 '05 #2
Hi Fredrik.

Fredrik Lundh wrote:
>> I'm not terribly experienced with XML in general, so it's possible that I'm just incorrectly
>> interpreting how things are supposed to work to begin with. If this is the case, please accept my
>> apologies, but I'd like any suggestions for how I should be doing it. I'd really just like to be
>> able to parse an XML document into a DOM, and then be able to pull out elements relative to their
>> namespaces.
> is the DOM API an absolute requirement?


It wouldn't need to conform to the official specifications of the DOM
API, but I guess I'm after some comparable functionality.

In particular, I need to be able to parse a namespace-using XML document
into some kind of node tree, and then query the tree to select elements
based on their namespace and local tag names, and so on. I don't mind if
the methods provided don't conform exactly to DOM specifications.

I guess I could write my own code to build a namespace-recognising DOM
from an XML file, but I'd be very surprised if that hadn't already been
done. I just can't figure out why minidom doesn't seem to be working
properly for me when namespaces are involved.

Thanks.
Mike.
Jul 18 '05 #3
Mike McGavin wrote:
>> is the DOM API an absolute requirement?


> It wouldn't need to conform to the official specifications of the DOM API, but I guess I'm after
> some comparable functionality.
>
> In particular, I need to be able to parse a namespace-using XML document into some kind of node
> tree, and then query the tree to select elements based on their namespace and local tag names,
> and so on. I don't mind if the methods provided don't conform exactly to DOM specifications.


sounds like this might be exactly what you need:

http://effbot.org/zone/element-index.htm

(it's also the fastest and most memory-efficient Python-only parser you
can get, but I suppose that's not a problem ;-)

</F>
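The library behind that link, ElementTree, was later added to the standard library as `xml.etree.ElementTree` (Python 2.5 onward). A minimal Python 3 sketch of the kind of namespace query described above, using that module: every namespaced tag is stored as `{namespace-uri}localname`, and `iter()`/`findall()` accept that form. The `'x'` prefix passed to `findall()` is an arbitrary local choice, not something the document declares, and the cut-down document is illustrative.

```python
import xml.etree.ElementTree as ET

XTE = 'http://www.mcs.vuw.ac.nz/renata/xte'

text = """<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:object objectid="object1">Nothing</xte:object>
</xte:xte>"""

root = ET.fromstring(text)

# ElementTree stores every namespaced tag as "{uri}localname".
print(root.tag)  # {http://www.mcs.vuw.ac.nz/renata/xte}xte

# Search the whole tree by namespace URI and local name.
objects = list(root.iter('{%s}object' % XTE))

# findall() can also map a prefix of your choosing to the URI
# (this path only looks at direct children of root).
creators = root.findall('x:creator', {'x': XTE})

print(objects[0].get('objectid'))  # object1
print(creators[0].text)            # alias
```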

Jul 18 '05 #4
You've reversed the parameters of getElementsByTagNameNS(): the namespace
URI comes first and the local name second. Here's a version that works
(note that you don't need to set up a SAX parser):

from xml.dom import minidom
text = '''<?xml version="1.0" encoding="UTF-8"?>
<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>
<xte:creator>alias</xte:creator>
<xte:date>Thu Jan 30 15:06:06 NZDT 2003</xte:date>
<xte:object objectid="object1">
Nothing
</xte:object>
</xte:xte>
'''

# Parse the string into a minidom Document
mydom = minidom.parseString(text)

# Look for some elements

# getElementsByTagName() matches the qualified name, prefix included,
# so this one does find the element.
object_el1 = mydom.getElementsByTagName("xte:object")

# getElementsByTagNameNS() takes the namespace URI first, then the local name.
object_el2 = mydom.getElementsByTagNameNS(
    'http://www.mcs.vuw.ac.nz/renata/xte', 'object')
print('1: ' + str(object_el1))
print('2: ' + str(object_el2))
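For reference, the DOM Level 2 signature is `getElementsByTagNameNS(namespaceURI, localName)`, and minidom also accepts `'*'` as a wildcard for either argument. A short Python 3 sketch contrasting the two argument orders; the cut-down document is illustrative.

```python
from xml.dom import minidom

XTE = 'http://www.mcs.vuw.ac.nz/renata/xte'

dom = minidom.parseString(
    "<xte:xte xmlns:xte='http://www.mcs.vuw.ac.nz/renata/xte'>"
    "<xte:creator>alias</xte:creator>"
    "<xte:object objectid='object1'>Nothing</xte:object>"
    "</xte:xte>")

# Reversed arguments: searches a namespace literally named "object".
wrong = dom.getElementsByTagNameNS('object', XTE)

# Correct order: namespace URI first, then local name.
right = dom.getElementsByTagNameNS(XTE, 'object')

# '*' matches any local name in the namespace (root element included).
everything = dom.getElementsByTagNameNS(XTE, '*')

print(len(wrong), len(right), len(everything))  # 0 1 3
```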

Jul 18 '05 #5
Hi Fredrik.

Fredrik Lundh wrote:
>> It wouldn't need to conform to the official specifications of the DOM API, but I guess I'm after
>> some comparable functionality. [--snip--]
> sounds like this might be exactly what you need:
> http://effbot.org/zone/element-index.htm
> (it's also the fastest and most memory-efficient Python-only parser you
> can get, but I suppose that's not a problem ;-)


Thanks. The original problem I was having turned out to be that I had
reversed a couple of parameters in a method call, as Paul pointed out,
and I now feel pretty silly as a result. But I'll take a look at this, too.

Much appreciated.
Mike.
Jul 18 '05 #6
