ElementTree Namespace Prefixes

Chris Spencer

Does anyone know how to make ElementTree preserve namespace prefixes in
parsed XML files? The default behavior is to strip a document of all
prefixes and then replace them with autogenerated prefixes like ns0, ns1,
etc. The correct behavior should be to write the file in the form in which
it was read, which it seems to do correctly for everything except
namespace prefixes. The docs mention "proper" output can be achieved by
using the QName object, but they don't go into any detail. Any help is
appreciated.
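
For readers who want to reproduce the behaviour described above, here is a minimal sketch. It assumes the modern standard-library module xml.etree.ElementTree (when this thread was written the package was imported as elementtree), and the document, URI and "inv" prefix are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the URI and the "inv" prefix are invented for this sketch.
SOURCE = (
    '<inv:stock xmlns:inv="http://example.com/inventory">'
    '<inv:part>bolt</inv:part>'
    '</inv:stock>'
)

root = ET.fromstring(SOURCE)

# Serialize the unmodified tree straight back to text.
output = ET.tostring(root, encoding="unicode")

# The serializer has no record of "inv", so it invents a prefix such as ns0.
print(output)
```

The namespace URI survives the round trip; only the prefix chosen for it changes.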

Thanks,
Chris Spencer

Jul 19 '05 #1

Andrew Dalke

On Sun, 12 Jun 2005 15:06:18 +0000, Chris Spencer wrote:

> Does anyone know how to make ElementTree preserve namespace prefixes in
> parsed xml files?

See the recent c.l.python thread titled "ElemenTree and namespaces"
and started "May 16 2:03pm". One archive is at

http://groups-beta.google.com/group/...?&rnum=3&hl=en

Andrew
da***@dalkescientific.com

Jul 19 '05 #2

Chris Spencer

Andrew Dalke wrote:

> On Sun, 12 Jun 2005 15:06:18 +0000, Chris Spencer wrote:
>
>> Does anyone know how to make ElementTree preserve namespace prefixes in
>> parsed xml files?
>
> See the recent c.l.python thread titled "ElemenTree and namespaces"
> and started "May 16 2:03pm". One archive is at
>
> http://groups-beta.google.com/group/...?&rnum=3&hl=en

Thanks, although that thread didn't seem to resolve the issue. All the
first few links talk about is how to hack your own parser to make sense
of the Clark notation.

The problem at hand is with how ElementTree outputs namespaces and
represents the tag name in memory.

Given XML with no namespaces, ElementTree works perfectly. However, if
you give the root tag an xmlns attribute, ElementTree relabels all child
nodes with its own prefix, completely defeating the purpose of the
default namespace. In my opinion, this is unacceptable behavior.
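
To make the in-memory representation concrete, here is a small sketch, assuming the standard-library xml.etree.ElementTree and a made-up namespace URI. It also uses ET.register_namespace, which was added to ElementTree after this thread was written; mapping the URI to the empty prefix is one way to get the default namespace back on output in current CPython versions, not something anyone in the thread proposed.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/default-ns"  # hypothetical namespace URI
SOURCE = f'<root xmlns="{NS}"><child>text</child></root>'

root = ET.fromstring(SOURCE)

# Tag names are held in Clark notation, {uri}localname.
# The prefix used in the source (here: none) is not stored on the element.
child = root[0]
print(child.tag)  # {http://example.com/default-ns}child

# Map the URI to the empty prefix so the serializer writes it as the
# default namespace. The registry is global to the process.
ET.register_namespace("", NS)
restored = ET.tostring(root, encoding="unicode")
print(restored)
```

Because the registry is global, this affects every tree serialized afterwards in the same process, which may or may not be acceptable in a larger program.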

If an XML parser reads in and then writes out a document without having
altered it, then the new document should be the same as the original.
With ElementTree this isn't so. Lundh apparently believes he knows
better than you and I on how our namespaces should be represented.

It's a shame the default ns behavior in ElementTree is in such a poor
state. I'm surprised no one's forked ElementTree solely to fix this issue.

Anyway, Python's native minidom works as expected, so I'll probably use
that instead, even if the API is slightly less intuitive.

Chris

Jul 19 '05 #3

Jarek Zgoda

Chris Spencer wrote:

> Given XML with no namespaces, ElementTree works perfectly. However, if
> you give the root tag an xmlns attribute, ElementTree relabels all child
> nodes with its own prefix, completely defeating the purpose of the
> default namespace. In my opinion, this is unacceptable behavior.

There is no functional difference between a default namespace and a "normal"
(prefixed) namespace. Replacing "default" with "normal" has no effect on
document processing (the namespace doesn't change, only the prefix), although
it looks different to humans. Anyway, XML is for machines, not for humans.
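
A short sketch of that claim, assuming the standard-library xml.etree.ElementTree; the URI, element names and the helper qualified_tags are made up for illustration. Two documents that differ only in how the namespace is spelled (default versus prefixed) produce the same namespace-qualified names once parsed.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/shop"  # hypothetical namespace URI

WITH_DEFAULT = f'<order xmlns="{NS}"><item>tea</item></order>'
WITH_PREFIX = f'<s:order xmlns:s="{NS}"><s:item>tea</s:item></s:order>'


def qualified_tags(xml_text):
    """Return the Clark-notation tag of every element, in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


tags_default = qualified_tags(WITH_DEFAULT)
tags_prefix = qualified_tags(WITH_PREFIX)

# Both spellings resolve to the same {uri}localname pairs.
print(tags_default == tags_prefix)  # True
```

A namespace-aware consumer that looks only at element and attribute names cannot tell the two documents apart.
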

> If an XML parser reads in and then writes out a document without having
> altered it, then the new document should be the same as the original.
> With ElementTree this isn't so. Lundh apparently believes he knows
> better than you and I on how our namespaces should be represented.

No, this is perfectly valid behaviour. Go see the spec.

> It's a shame the default ns behavior in ElementTree is in such a poor
> state. I'm surprised no one's forked ElementTree solely to fix this issue.

There is at least one ElementTree API implementation that retains
prefixes, lxml.etree. Go google for it.

--
Jarek Zgoda
http://jpa.berlios.de/ | http://www.zgodowie.org/

Jul 19 '05 #4

Fredrik Lundh

Chris Spencer wrote:

> If an XML parser reads in and then writes out a document without having
> altered it, then the new document should be the same as the original.

says who?

> With ElementTree this isn't so. Lundh apparently believes he knows
> better than you and I on how our namespaces should be represented.

do you even understand how XML namespaces work?

</F>

Jul 19 '05 #5

Oren Tirosh

Fredrik Lundh wrote:

> Chris Spencer wrote:
>
>> If an XML parser reads in and then writes out a document without having
>> altered it, then the new document should be the same as the original.
>
> says who?

Good question. There is no One True Answer even within the XML
standards.

It all boils down to how you define "the same". Which parts of the XML
document are meaningful content that needs to be preserved and which
ones are mere encoding variations that may be omitted from the internal
representation?

Some relevant references which may be used as guidelines:

* http://www.w3.org/TR/xml-infoset
The XML infoset defines 11 types of information items including
document type declaration, notations and other features. It does not
appear to be suitable for a lightweight API like ElementTree.

* http://www.w3.org/TR/xpath-datamodel
The XPath data model uses a subset of the XML infoset with "only" seven
node types.

* http://www.w3.org/TR/xml-c14n
The canonical XML recommendation is meant to describe a process but it
also effectively defines a data model: anything preserved by the
canonicalization process is part of the model. Anything not preserved
is not part of the model.

In theory, this definition should be equivalent to the XPath data model
since canonical XML is defined in terms of the XPath data model. In
practice, the XPath data model defines properties not required for
producing canonical XML (e.g. unparsed entities associated with the
document node). I like this alternative "black box" definition because
it provides a simple touchstone for determining what is or isn't part of
the model.

I think it would be a good goal for ElementTree to aim for compliance
with the canonical XML data model. It's already quite close.

It's possible to use the canonical XML data model without being a
canonical XML processor but it would be nice if parse() followed by
write() actually passed the canonical XML test vectors. It's the
easiest way to demonstrate compliance conclusively.

So what changes are required to make ElementTree canonical?

1. PI nodes are already supported for output. Need an option to
preserve them on parsing.
2. Comment nodes are already supported for output. Need an option to
preserve them on parsing (canonical XML also defines a "no comments"
canonical form).
3. Preserve comments and PIs outside the root element (store them as
children of the ElementTree object?).
4. Sort attributes in canonical order.
5. Fix minor formatting and spacing issues in opening tags.

oh, and one more thing...

6. preserve namespace prefixes ;-)
(see http://www.w3.org/TR/xml-c14n#NoNSPrefixRewriting for rationale)
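
As a rough illustration of items 4 and 6, newer Python versions (3.8 and later) provide ET.canonicalize(), which writes C14N 2.0 output. That function postdates this thread, and the input document below is made up. The sketch contrasts ordinary serialization with canonical serialization of the same text.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: attributes out of order, plus a prefixed child element.
SOURCE = (
    '<doc xmlns:a="http://example.com/a" zeta="2" alpha="1">'
    '<a:item/>'
    '</doc>'
)

# Ordinary serialization: the "a" prefix is replaced by a generated one.
plain = ET.tostring(ET.fromstring(SOURCE), encoding="unicode")

# Canonical serialization: attributes are sorted and the original prefix is kept
# (prefix rewriting is off unless rewrite_prefixes=True is passed).
canonical = ET.canonicalize(SOURCE)

print(plain)
print(canonical)
```

Note that canonicalize() works from the XML text (or a file), not from an already parsed tree, since the parsed tree no longer carries the original prefixes.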

Jul 19 '05 #6

Fredrik Lundh

Oren Tirosh wrote:

> It all boils down to how you define "the same". Which parts of the XML
> document are meaningful content that needs to be preserved and which
> ones are mere encoding variations that may be omitted from the internal
> representation?
>
> Some relevant references which may be used as guidelines:
>
> * http://www.w3.org/TR/xml-infoset
> The XML infoset defines 11 types of information items including
> document type declaration, notations and other features. It does not
> appear to be suitable for a lightweight API like ElementTree.
>
> * http://www.w3.org/TR/xpath-datamodel
> The XPath data model uses a subset of the XML infoset with "only" seven
> node types.
>
> * http://www.w3.org/TR/xml-c14n
> The canonical XML recommendation is meant to describe a process but it
> also effectively defines a data model: anything preserved by the
> canonicalization process is part of the model. Anything not preserved
> is not part of the model.

you forgot

http://effbot.org/zone/element-infoset.htm

which describes the 3-node XML infoset subset used by ElementTree.

</F>

Jul 19 '05 #7

Oren Tirosh

> you forgot
>
> http://effbot.org/zone/element-infoset.htm
>
> which describes the 3-node XML infoset subset used by ElementTree.

No, I did not forget your infoset subset. I was comparing it with other
infoset subsets described in various XML specifications.

I agree 100% that prefixes were not *supposed* to be part of the
document's meaning back when the XML namespace specification was
written, but later specifications broke that.

Please take a look at http://www.w3.org/TR/xml-c14n#NoNSPrefixRewriting

"... there now exist a number of contexts in which namespace prefixes
can impart information value in an XML document..."

"...Moreover, it is possible to prove that namespace rewriting is
harmful, rather than simply ineffective."
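
One such context is a QName used inside an attribute value. The following sketch, assuming the standard-library xml.etree.ElementTree and a made-up document and URI, shows how a round trip can leave such a value pointing at a prefix that is no longer declared.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute *value* "t:string" is a QName that
# depends on the xmlns:t declaration, but no element or attribute *name*
# uses the "t" prefix.
SOURCE = (
    '<value xmlns:t="http://example.com/types" kind="t:string">'
    'hello'
    '</value>'
)

root = ET.fromstring(SOURCE)
output = ET.tostring(root, encoding="unicode")

# The value "t:string" is copied through as plain text, while the
# declaration that gave "t" its meaning is not written back out.
print(output)
```

The serializer only emits declarations for namespaces that appear in element or attribute names, so it has no way of knowing that the text "t:string" needed one.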

Jul 19 '05 #8

Martijn Faassen

Jarek Zgoda wrote:
[snip]

>> It's a shame the default ns behavior in ElementTree is in such a poor
>> state. I'm surprised no one's forked ElementTree solely to fix this
>> issue.
>
> There is at least one ElementTree API implementation that retains
> prefixes, lxml.etree. Go google for it.

Just to make it explicitly clear, lxml is not a fork of ElementTree,
but a reimplementation of the API on top of libxml2.

lxml indeed retains prefixes, and since version 0.7, released
earlier this week, it's also possible to get some control over generation
of prefixes during element construction.

You can find it here:

http://codespeak.net/lxml

Regards,

Martijn

Jul 19 '05 #9

uche.ogbuji

Chris Spencer:
"""
Fredrik Lundh wrote:

Chris Spencer wrote:
If an XML parser reads in and then writes out a document without having
altered it, then the new document should be the same as the original.

says who?

Jul 19 '05 #10
