
formatted xml output from ElementTree inconsistency

P: n/a
Greetings, perhaps someone can explain this. I get two different styles
of formatting for xmla and xmlb when I do the following:

from elementtree import ElementTree as et

xmla = et.ElementTree(file='some_file.xml')
xmlb = et.Element('parent')
et.SubElement(xmlb, 'child1')
et.SubElement(xmlb, 'child2')

root = et.Element('root')
root.append(xmla.getroot())
root.append(xmlb)

print et.tostring(root)

The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.

Is that because the some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.

Is there a function to 'pretty print' an element? I looked in the API
reference and didn't see anything that would do it. It would be nice if there was
a way to create 'standard' formatted output for all elements regardless
of how they were created.

Comments and suggestions are greatly appreciated.

regards
-Matthew
Jul 19 '05 #1
8 Replies


P: n/a
Matthew Thorley wrote:
The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.

Is that because the some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.


Why do you want to read an XML document "by hand"? It's a "machine related"
data chunk.

Document formatting should be done by means of CSS and/or XSL stylesheet.

--
Jarek Zgoda
http://jpa.berlios.de/
Jul 19 '05 #2

P: n/a
Jarek Zgoda wrote:
Matthew Thorley wrote:
The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.

Is that because the some_file.xml is already nicely formatted? I
thought that the formatting was ignored when creating new elements.

Why do you want to read an XML document "by hand"? It's a "machine related"
data chunk.

Document formatting should be done by means of CSS and/or XSL stylesheet.

It is just data to the machine, but people may have to read and
interpret this data. I don't think there is anything unusual about
formatting XML with tabs. Most web pages do that in their HTML/XHTML.
Just imagine if you wanted to change a broken link on your web page, and
the entire page was one long string. That may not matter to
Dreamweaver, but it sure would be annoying if you were using vi :)

-Matthew
Jul 19 '05 #3

P: n/a
Matthew Thorley wrote:
Greetings, perhaps someone can explain this. I get two different styles
of formatting for xmla and xmlb when I do the following:
<snip>
Is that because the some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.

ElementTree is preserving the whitespace of the original.

Is there a function to 'pretty print' an element?


AFAIK this is not supported in ElementTree. I hacked my own by modifying ElementTree._write(); it wasn't too hard to make a version that suited my purposes.
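
To illustrate the whitespace point, here is a minimal sketch. It assumes the standard-library xml.etree.ElementTree module of current Python 3 rather than the standalone elementtree package used in this thread, and the inline XML string is a made-up stand-in for some_file.xml. The parser keeps the indentation of the source as ordinary text/tail strings, while elements built in code have none, so the serializer has nothing to write between their tags.

```python
import xml.etree.ElementTree as ET

# Parsed from indented source (a hypothetical stand-in for some_file.xml):
# the line breaks and indentation survive as text/tail strings.
parsed = ET.fromstring("<parent>\n  <child1 />\n  <child2 />\n</parent>")

# Built in code: no whitespace is ever added between elements.
built = ET.Element("parent")
ET.SubElement(built, "child1")
ET.SubElement(built, "child2")

print(repr(parsed.text))     # '\n  ' (whitespace before the first child)
print(repr(parsed[0].tail))  # '\n  ' (whitespace after child1)
print(repr(built.text))      # None
print(ET.tostring(built, encoding="unicode"))
```

Appending both subtrees to a common root therefore gives mixed output: the parsed part brings its whitespace along, and the constructed part is written on one line.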

Kent
Jul 19 '05 #4

P: n/a
Jarek Zgoda wrote:
Why do you want to read an XML document "by hand"? It's a "machine related"
data chunk.

I see this attitude all the time, and frankly I don't understand it.
Please explain why XML is in ASCII/unicode instead of binary. Is it
because it is easier for a machine to parse? No, I thought not. It's
obviously so humans can read it. The next question is: why is
arbitrary whitespace allowed? Is that to make it easier for machines
to parse? Is it any easier for machines to generate arbitrary
whitespace than it would have been for them to always insert, e.g., a
single space? No, I thought not there as well.

Document formatting should be done by means of CSS and/or XSL stylesheet.


He's not formatting the (rendered) document -- he's just formatting the
raw data to make it more readable in an editor. You could use CSS/XSL,
and then selectively add whitespace without actually affecting the
rendering. Alternatively, as you point out, it is a "machine related"
data chunk -- some XML documents are never even destined for human
eyes, _except_ for debugging. For some of those documents, CSS and XSL
are just a waste of CPU cycles.

Regards,
Pat

Jul 19 '05 #5

P: n/a
Matthew Thorley wrote:
from elementtree import ElementTree as et

xmla = et.ElementTree(file='some_file.xml')
xmlb = et.Element('parent')
et.SubElement(xmlb, 'child1')
et.SubElement(xmlb, 'child2')

root = et.Element('root')
root.append(xmla.getroot())
root.append(xmlb)

print et.tostring(root)
[snip]
Is there a function to 'pretty print' an element?


Depends on how pretty you want it. I've found that putting each element
on its own line has been sufficient for many of my manual-inspection use
cases. This isn't too hard with a cheap hack:

py> import elementtree.ElementTree as et
py> root = et.Element('root')
py> parent = et.SubElement(root, 'parent')
py> child = et.SubElement(parent, 'child')
py> print et.tostring(root)
<root><parent><child /></parent></root>
py> print et.tostring(root).replace('><', '>\n<')
<root>
<parent>
<child />
</parent>
</root>

Not ideal, but it may work well enough for you.
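
If indentation is wanted as well, one possible approach is to rewrite the text/tail whitespace in place before serializing. The sketch below is only an illustration, not part of ElementTree as discussed in this thread: the indent helper and its level/step parameters are names made up here, and it assumes the standard-library xml.etree.ElementTree of Python 3. Whitespace-only text and tails are overwritten, so it is unsuitable for documents where that whitespace is significant.

```python
import xml.etree.ElementTree as ET

def indent(elem, level=0, step="  "):
    """Hypothetical helper: put each element on its own indented line.

    Only whitespace-only (or missing) text/tail values are overwritten;
    real character data is left untouched.
    """
    pad = "\n" + level * step
    children = list(elem)
    if not children:
        return
    # Whitespace between the opening tag and the first child.
    if not (elem.text and elem.text.strip()):
        elem.text = pad + step
    for child in children:
        indent(child, level + 1, step)
        # Whitespace between this child and its next sibling.
        if not (child.tail and child.tail.strip()):
            child.tail = pad + step
    # The last child's tail lines up the parent's closing tag.
    if not (children[-1].tail and children[-1].tail.strip()):
        children[-1].tail = pad

root = ET.Element("root")
parent = ET.SubElement(root, "parent")
ET.SubElement(parent, "child")

indent(root)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because it normalizes existing whitespace too, the same call evens out a tree that mixes parsed and constructed elements. Python 3.9 and later ship a comparable xml.etree.ElementTree.indent() function, which is likely the better choice where it is available.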

STeVe
Jul 19 '05 #6

P: n/a
On 24 Jun 2005 13:53:43 -0700, "Patrick Maupin" <pm*****@gmail.com>
declaimed the following in comp.lang.python:

I see this attitude all the time, and frankly I don't understand it.
Please explain why XML is in ASCII/unicode instead of binary. Is it
because it is easier for a machine to parse? No, I thought not. It's
obviously so humans can read it. The next question is: why is

Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.

--
wl*****@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG
wu******@dm.net | Bestiaria Support Staff
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Jul 19 '05 #7

P: n/a
Dennis Bieber wrote:
Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.


A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
XML".

One of the listed goals is "XML documents should be human-legible and
reasonably clear".

To your point, the very _first_ listed goal (if order means anything in
this list) is "XML shall be straightforwardly usable over the
Internet", so it's reasonable to assume "the non-binary nature to be
because the internet protocols are mostly designed for text, not
binary."

But this assumption turns cause and effect on its head. It is
perfectly feasible to pass binary data through every known internet
protocol (with a little simplistic encoding), and is done all the time.
The real next question is: why ARE the internet protocols "mostly
designed for text, not binary"?

SMTP, for example, was designed at a time when memory, bandwidth, and
CPU cycles were all at a premium, and MTAs were coded using fairly
low-level constructs in C where parsing was a pain in the rear. Even
so, the developers decided to use relatively free-formatted ASCII in
the protocol. To follow your theory to its logical conclusion, they
must have wasted all that bandwidth, all those CPU cycles, all that
memory, all that disk space, and all that effort writing parsing code
because of yet another underlying mechanism which was "designed for
text."

On that account, your theory is correct, but only when you realize the
underlying mechanism which is "designed for text" is the human brain,
which has to try to make sense of all this mess when things aren't
quite interoperating properly.

Regards,
Pat

Jul 19 '05 #8

P: n/a
Patrick Maupin wrote:
"""
Dennis Bieber wrote:
Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.


A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
XML".

One of the listed goals is "XML documents should be human-legible and
reasonably clear".
"""

Yes. Thanks for mentioning this, because people too often forget it.

minidom, 4Suite's Domlette and Amara all provide good pretty-print
output functions. The latter two use rules from the XSLT spec, which
is designed by people who have the above design goal well in their
blood.
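
For the minidom route, a rough sketch might look like the following. It assumes Python 3's standard library (xml.dom.minidom and xml.etree.ElementTree) rather than the 2005-era packages, and the element names are just the toy tree used earlier in the thread. The ElementTree output is re-parsed by minidom purely to reach its toprettyxml() method.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Build the same toy tree used earlier in the thread.
root = ET.Element("root")
parent = ET.SubElement(root, "parent")
ET.SubElement(parent, "child")

# Serialize compactly, then let minidom re-parse and pretty-print it.
compact = ET.tostring(root, encoding="unicode")
pretty = minidom.parseString(compact).toprettyxml(indent="  ")
print(pretty)
```

The round trip costs an extra parse, and toprettyxml() also emits an XML declaration. Input that already contains indentation may come out with extra blank lines, so this fits freshly constructed trees best.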

--
Uche
http://copia.ogbuji.net

Jul 19 '05 #9
