formatted xml output from ElementTree inconsistency

Matthew Thorley

Greetings, perhaps someone can explain this. I get two different styles
of formatting for xmla and xmlb when I do the following:

from elementtree import ElementTree as et

xmla = et.parse('some_file.xml')  # read and parse the file into an ElementTree
xmlb = et.Element('parent')
et.SubElement(xmlb, 'child1')
et.SubElement(xmlb, 'child2')

root = et.Element('root')
root.append(xmla.getroot())
root.append(xmlb)

print et.tostring(root)

The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.
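A self-contained way to reproduce this on current Python, where ElementTree ships in the standard library as xml.etree.ElementTree, is sketched below. The contents of some_file.xml are a hypothetical stand-in, since the original file was not posted. The sketch also shows where the formatting comes from: the parser keeps the indentation between tags as the .text and .tail strings of the parsed elements, while elements built with Element()/SubElement() start with both set to None.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for some_file.xml: an already indented document.
SOME_FILE = "<parent>\n    <child1/>\n    <child2/>\n</parent>"

# xmla: parsed from text, so the newlines and spaces between the tags are
# kept as the .text and .tail strings of the parsed elements.
xmla = ET.parse(io.StringIO(SOME_FILE))

# xmlb: built in code, so every .text and .tail starts out as None.
xmlb = ET.Element('parent')
ET.SubElement(xmlb, 'child1')
ET.SubElement(xmlb, 'child2')

root = ET.Element('root')
root.append(xmla.getroot())
root.append(xmlb)

# encoding='unicode' makes tostring() return a str instead of bytes.
output = ET.tostring(root, encoding='unicode')
print(output)

# The whitespace is stored in the tree; the serializer adds none of its own.
print(repr(xmla.getroot().text))  # '\n    '
print(repr(xmlb.text))            # None
```

The serializer simply writes out whatever text and tail strings it finds, so the parsed part comes out on separate lines and the built part comes out on a single line.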

Is that because the some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.

Is there a function to 'pretty print' an element? I looked in the API
reference and didn't see anything that would do it. It would be nice if
there were a way to create 'standard' formatted output for all elements
regardless of how they were created.

Comments and suggestions are greatly appreciated.

regards
-Matthew

Jul 19 '05 #1


Jarek Zgoda

Matthew Thorley wrote:

The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.

Is that because the some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.

Why would you want to read an XML document "by hand"? It's a "machine related"
data chunk.

Document formatting should be done by means of CSS and/or XSL stylesheets.

--
Jarek Zgoda
http://jpa.berlios.de/

Jul 19 '05 #2

Matthew Thorley

Jarek Zgoda wrote:

Matthew Thorley wrote:
The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.

Is that because the some_file.xml is already nicely formatted? I
thought that the formatting was ignored when creating new elements.

Why would you want to read an XML document "by hand"? It's a "machine related"
data chunk.

Document formatting should be done by means of CSS and/or XSL stylesheets.

It is just data to the machine, but people may have to read and
interpret this data. I don't think there is anything unusual about
formatting XML with tabs; most web pages do that in their HTML/XHTML.
Just imagine if you wanted to change a broken link on your web page and
the entire page was one long string. That may not matter to Dreamweaver,
but it sure would be annoying if you were using vi :)

-Matthew

Jul 19 '05 #3

Kent Johnson

Matthew Thorley wrote:

Greetings, perhaps someone can explain this. I get two different styles
of formatting for xmla and xmlb when I do the following:
<snip>
Is that because the some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.

ElementTree is preserving the whitespace of the original.

Is there a function to 'pretty print' an element?

AFAIK this is not supported in ElementTree. I hacked my own by modifying ElementTree._write(); it wasn't too hard to make a version that suited my purposes.
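Kent's modified ElementTree._write() is not shown in the thread. A different technique that needs no patching is to rewrite the whitespace in the tree itself before serializing: set each element's .text and .tail so that every child starts on a new line, indented by its depth. The sketch below assumes Python 3 and the standard-library xml.etree.ElementTree; indent() is a hypothetical helper written for illustration (Python 3.9 and later also ship xml.etree.ElementTree.indent() for the same job).

```python
import xml.etree.ElementTree as ET

def indent(elem, level=0, space="  "):
    """Hypothetical helper: re-indent elem and its descendants in place.

    Only .text/.tail values that are None or whitespace-only are replaced,
    so real character data is left untouched.
    """
    pad = "\n" + level * space
    children = list(elem)
    if children:
        # Newline before the first child, one level deeper than elem.
        if not elem.text or not elem.text.strip():
            elem.text = pad + space
        for child in children:
            indent(child, level + 1, space)
        # Each child's tail now indents for a following sibling; the last
        # child's tail must instead drop back to elem's own level.
        last = children[-1]
        if not last.tail or not last.tail.strip():
            last.tail = pad
    # Newline after elem itself (the root's tail is left alone).
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad

root = ET.Element('root')
parent = ET.SubElement(root, 'parent')
ET.SubElement(parent, 'child1')
ET.SubElement(parent, 'child2')

indent(root)
pretty = ET.tostring(root, encoding='unicode')
print(pretty)
```

Because it overwrites whitespace-only text and tail values, the same call also normalizes a tree that mixes parsed (already indented) and freshly built elements, like root in the original example. It should not be applied where whitespace between elements is significant, for example mixed content such as XHTML paragraphs.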

Kent

Jul 19 '05 #4

Patrick Maupin

Jarek Zgoda wrote:

Why would you want to read an XML document "by hand"? It's a "machine related"
data chunk.

I see this attitude all the time, and frankly I don't understand it.
Please explain why XML is in ASCII/unicode instead of binary. Is it
because it is easier for a machine to parse? No, I thought not. It's
obviously so humans can read it. The next question is: why is
arbitrary whitespace allowed? Is that to make it easier for machines
to parse? Is it any easier for machines to generate arbitrary
whitespace than it would have been for them to always insert, e.g., a
single space? No, I thought not there as well.

Document formatting should be done by means of CSS and/or XSL stylesheets.

He's not formatting the (rendered) document -- he's just formatting the
raw data to make it more readable in an editor. You could use CSS/XSL,
and then selectively add whitespace without actually affecting the
rendering. Alternatively, as you point out, it is a "machine related"
data chunk -- some XML documents are never even destined for human
eyes, _except_ for debugging. For some of those documents, CSS and XSL
are just a waste of CPU cycles.

Regards,
Pat

Jul 19 '05 #5

Steven Bethard

Matthew Thorley wrote:

from elementtree import ElementTree as et

xmla = et.parse('some_file.xml')  # read and parse the file into an ElementTree
xmlb = et.Element('parent')
et.SubElement(xmlb, 'child1')
et.SubElement(xmlb, 'child2')

root = et.Element('root')
root.append(xmla.getroot())
root.append(xmlb)

print et.tostring(root)

[snip]

Is there a function to 'pretty print' an element?

Depends on how pretty you want it. I've found that putting each element
on its own line has been sufficient for many of my manual-inspection use
cases. This isn't too hard with a cheap hack:

py> import elementtree.ElementTree as et
py> root = et.Element('root')
py> parent = et.SubElement(root, 'parent')
py> child = et.SubElement(parent, 'child')
py> print et.tostring(root)
<root><parent><child /></parent></root>
py> print et.tostring(root).replace('><', '>\n<')
<root>
<parent>
<child />
</parent>
</root>

Not ideal, but it may work well enough for you.
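On current Python the same hack looks like the sketch below, assuming the standard-library xml.etree.ElementTree, where passing encoding='unicode' makes tostring() return a str instead of bytes. The second half shows the main limitation: the inserted newlines become real character data, so with mixed content, re-parsing the "pretty" output gives different text than the original tree.

```python
import xml.etree.ElementTree as ET

root = ET.Element('root')
parent = ET.SubElement(root, 'parent')
child = ET.SubElement(parent, 'child')

# encoding='unicode' makes tostring() return str, so str.replace() works.
one_per_line = ET.tostring(root, encoding='unicode').replace('><', '>\n<')
print(one_per_line)

# Limitation: with mixed content the added newlines change the data itself.
mixed = ET.fromstring('<p><b>bold</b><i>italic</i></p>')
hacked = ET.tostring(mixed, encoding='unicode').replace('><', '>\n<')
reparsed = ET.fromstring(hacked)
print(repr(mixed.find('b').tail), repr(reparsed.find('b').tail))  # None '\n'
```

Since ElementTree escapes "<" and ">" when they occur in text and attribute values, the replacement cannot split anything inside the data; the only side effect is the added whitespace, which is harmless for element-only documents like the one above.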

STeVe

Jul 19 '05 #6

Dennis Lee Bieber

On 24 Jun 2005 13:53:43 -0700, "Patrick Maupin" <pm*****@gmail.com>
declaimed the following in comp.lang.python:

I see this attitude all the time, and frankly I don't understand it.
Please explain why XML is in ASCII/unicode instead of binary. Is it
because it is easier for a machine to parse? No, I thought not. It's
obviously so humans can read it. The next question is: why is

Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.

--
wl*****@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG
wu******@dm.net | Bestiaria Support Staff
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Jul 19 '05 #7

Patrick Maupin

Dennis Bieber wrote:

Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.

A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
XML".

One of the listed goals is "XML documents should be human-legible and
reasonably clear".

To your point, the very _first_ listed goal (if order means anything in
this list) is "XML shall be straightforwardly usable over the
Internet", so it's reasonable to assume "the non-binary nature to be
because the internet protocols are mostly designed for text, not
binary."

But this assumption turns cause and effect on its head. It is
perfectly feasible to pass binary data through every known internet
protocol (with a little simplistic encoding), and it is done all the time.
The real next question is: why ARE the internet protocols "mostly
designed for text, not binary"?
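To make "a little simplistic encoding" concrete: one widely used scheme is Base64, which MIME uses for email attachments. The sketch below, using only the Python 3 standard library, round-trips every possible byte value through a string built from ASCII letters, digits, "+", "/" and "=", at a cost of 4 output characters per 3 input bytes.

```python
import base64
import string

# Every possible byte value, including NUL and bytes above 0x7F, which a
# 7-bit text channel cannot carry as-is.
payload = bytes(range(256))

# Encode to Base64 and turn the resulting bytes into a plain str.
encoded = base64.b64encode(payload).decode('ascii')

# The encoded form uses only this 65-character alphabet.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + '+/=')
only_safe_chars = set(encoded) <= BASE64_ALPHABET

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)

print(len(payload), len(encoded), only_safe_chars, decoded == payload)
# → 256 344 True True
```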

SMTP, for example, was designed at a time when memory, bandwidth, and
CPU cycles were all at a premium, and MTAs were coded using fairly
low-level constructs in C where parsing was a pain in the rear. Even
so, the developers decided to use relatively free-formatted ASCII in
the protocol. To follow your theory to its logical conclusion, they
must have wasted all that bandwidth, all those CPU cycles, all that
memory, all that disk space, and all that effort writing parsing code
because of yet another underlying mechanism which was "designed for
text."

On that account, your theory is correct, but only when you realize the
underlying mechanism which is "designed for text" is the human brain,
which has to try to make sense of all this mess when things aren't
quite interoperating properly.

Regards,
Pat

Jul 19 '05 #8

uche.ogbuji

Patrick Maupin wrote:
"""
Dennis Bieber wrote:

Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.

A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
XML".

One of the listed goals is "XML documents should be human-legible and
reasonably clear".
"""

Yes. Thanks for mentioning this, because people too often forget it.

minidom, 4Suite's Domlette and Amara all provide good pretty-print
output functions. The latter two use rules from the XSLT spec, which
is designed by people who have the above design goal well in their
blood.
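Of the three, minidom is the one that ships with Python's standard library. A minimal sketch, assuming Python 3's xml.etree.ElementTree and xml.dom.minidom: serialize the ElementTree tree, re-parse the bytes with minidom, and let toprettyxml() add the newlines and indentation.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

root = ET.Element('root')
parent = ET.SubElement(root, 'parent')
ET.SubElement(parent, 'child')

# Serialize with ElementTree (bytes), then re-parse with minidom and let
# toprettyxml() lay the document out one element per line.
pretty = minidom.parseString(ET.tostring(root)).toprettyxml(indent='  ')
print(pretty)
```

The round trip costs a second parse, which is usually acceptable for the debugging use discussed in this thread. If the input already contains indentation whitespace (as the parsed some_file.xml part of the original example does), minidom keeps those whitespace text nodes and toprettyxml() adds its own on top, so the output can contain extra whitespace-only lines; re-indenting the ElementTree tree directly avoids that.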

--
Uche
http://copia.ogbuji.net

Jul 19 '05 #9

Similar topics

by: Stewart Midwinter

I want to parse a file with ElementTree. My file has the following format:  <?xml version='1.0' encoding='utf-8'?> <population> <person><name="joe" sex="male"...

Python

ElementTree/DTD question

by: Greg Wilson

I'm trying to convert from minidom to ElementTree for handling XML, and am having trouble with entities in DTDs. My Python script looks like this: ...

Python

import statement / ElementTree

by: mirandacascade

O/S: Windows 2K Vsn of Python: 2.4 Currently: 1) Folder structure: \workarea\ <- ElementTree files reside here \xml\ \dom\

Python

small inconsistency in ElementTree (1.2.6)

by: Damjan

Attached is the smallest test case, that shows that ElementTree returns a string object if the text in the tree is only ascii, but returns a unicode object otherwise. This would make sense if...

Python

elementtree and gbk encoding

by: Steven Bethard

I'm having trouble using elementtree with an XML file that has some gbk-encoded text. (I can't read Chinese, so I'm taking their word for it that it's gbk-encoded.) I always have trouble with...

Python

using TreeBuilder in an ElementTree like way

by: Greg Aumann

I am trying to write some python code for a library that reads an XML-like language from a file into elementtree data structures. Then I want to be able to read and/or modify the structure and then...

Python

request for advice - possible ElementTree nexus

by: mirandacascade

Situation is this: 1) I have inherited some python code that accepts a string object, the contents of which is an XML document, and produces a data structure that represents some of the content of...

Python

module error for elementtree

by: saif.shakeel

#!/usr/bin/env python from elementtree import ElementTree as Element tree = et.parse("testxml.xml") for t in tree.getiterator("SERVICEPARAMETER"): if t.get("Semantics") == "localId":...

Python

Output XML buffer?

by: Jan Danielsson

Hello all, I'm using ElementTree to create an XHTML page (mod_python, blah, blah, blah). When I use ElementTree.tostring(root) to create a buffer which I want to return to the client, it doesn't...

Python
