formatted xml output from ElementTree inconsistency

Greetings, perhaps someone can explain this. I get two different styles
of formatting for xmla and xmlb when I do the following:

from elementtree import ElementTree as et

xmla = et.parse('some_file.xml')  # parse() returns an ElementTree; ElementTree() expects an element, not a filename
xmlb = et.Element('parent')
et.SubElement(xmlb, 'child1')
et.SubElement(xmlb, 'child2')

root = et.Element('root')
root.append(xmla.getroot())
root.append(xmlb)

print et.tostring(root)

The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.

Is that because the some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.

Is there a function to 'pretty print' an element? I looked in the API
reference and didn't see anything that would do it. It would be nice if
there was a way to create 'standard' formatted output for all elements
regardless of how they were created.

Comments and suggestions are greatly appreciated.

regards
-Matthew
Jul 19 '05 #1
Matthew Thorley wrote:
The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.

Is that because the some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.


Why do you want to read an XML document "by hand"? It's a "machine-related"
data chunk.

Document formatting should be done by means of a CSS and/or XSL stylesheet.

--
Jarek Zgoda
http://jpa.berlios.de/
Jul 19 '05 #2
Jarek Zgoda wrote:
Matthew Thorley wrote:
The output I get shows xmla as nicely formatted text, with elements on
different lines and everything all tabbed and pretty. Conversely, xmlb is
one long string on one line.

Is that because the some_file.xml is already nicely formatted? I
thought that the formatting was ignored when creating new elements.

Why do you want to read an XML document "by hand"? It's a "machine-related"
data chunk.

Document formatting should be done by means of a CSS and/or XSL stylesheet.

It is just data to the machine, but people may have to read and
interpret this data. I don't think there is anything unusual about
formatting XML with tabs. Most web pages do that in their HTML/XHTML.
Just imagine if you wanted to change a broken link on your web page, and
the entire page was one long string. That may not matter to
Dreamweaver, but it sure would be annoying if you were using vi :)

-Matthew
Jul 19 '05 #3
Matthew Thorley wrote:
Greetings, perhaps someone can explain this. I get two different styles
of formatting for xmla and xmlb when I do the following:
<snip>
Is that because the some_file.xml is already nicely formatted? I thought
that the formatting was ignored when creating new elements.

ElementTree is preserving the whitespace of the original.

Is there a function to 'pretty print' an element?


AFAIK this is not supported in ElementTree. I hacked my own by modifying ElementTree._write(); it wasn't too hard to make a version that suited my purposes.
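
Editor's sketch (not Kent's code, which is not shown in the thread): one way to get consistent output without touching the serializer is to rewrite the whitespace in place before calling tostring(). A parsed document keeps its indentation in each element's .text and .tail, which is why xmla prints "pretty" and xmlb does not. The helper name indent() and its parameters are assumptions made for illustration, and the example uses Python 3's xml.etree.ElementTree rather than the 2005 elementtree package.

```python
import xml.etree.ElementTree as ET

# A parsed document keeps its original whitespace in .text and .tail.
parsed = ET.fromstring("<a>\n  <b />\n</a>")
# Here parsed.text is "\n  " and parsed[0].tail is "\n".


def indent(elem, level=0, space="  "):
    """Hypothetical helper: add indentation whitespace in place."""
    pad = "\n" + level * space
    children = list(elem)
    if children:
        # Whitespace before the first child lives in the parent's .text.
        if not (elem.text and elem.text.strip()):
            elem.text = pad + space
        for child in children:
            indent(child, level + 1, space)
            # Whitespace after a child lives in that child's .tail.
            if not (child.tail and child.tail.strip()):
                child.tail = pad + space
        # The last child's tail steps back out to the parent's level.
        if not (children[-1].tail and children[-1].tail.strip()):
            children[-1].tail = pad


# Build a tree in code, as in the original question; it has no whitespace.
root = ET.Element("root")
parent = ET.SubElement(root, "parent")
ET.SubElement(parent, "child1")
ET.SubElement(parent, "child2")

indent(root)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Text and tails that contain real content are left alone, so only whitespace-only gaps are rewritten. On Python 3.9 and later the standard library also provides xml.etree.ElementTree.indent(), which does a similar job.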

Kent
Jul 19 '05 #4
Jarek Zgoda wrote:
Why do you want to read an XML document "by hand"? It's a "machine-related"
data chunk.

I see this attitude all the time, and frankly I don't understand it.
Please explain why XML is in ASCII/unicode instead of binary. Is it
because it is easier for a machine to parse? No, I thought not. It's
obviously so humans can read it. The next question is: why is
arbitrary whitespace allowed? Is that to make it easier for machines
to parse? Is it any easier for machines to generate arbitrary
whitespace than it would have been for them to always insert, e.g., a
single space? No, I thought not there as well.

Document formatting should be done by means of a CSS and/or XSL stylesheet.


He's not formatting the (rendered) document -- he's just formatting the
raw data to make it more readable in an editor. You could use CSS/XSL,
and then selectively add whitespace without actually affecting the
rendering. Alternatively, as you point out, it is a "machine related"
data chunk -- some XML documents are never even destined for human
eyes, _except_ for debugging. For some of those documents, CSS and XSL
are just a waste of CPU cycles.

Regards,
Pat

Jul 19 '05 #5
Matthew Thorley wrote:
from elementtree import ElementTree as et

xmla = et.parse('some_file.xml')  # parse() returns an ElementTree; ElementTree() expects an element, not a filename
xmlb = et.Element('parent')
et.SubElement(xmlb, 'child1')
et.SubElement(xmlb, 'child2')

root = et.Element('root')
root.append(xmla.getroot())
root.append(xmlb)

print et.tostring(root)
[snip]
Is there a function to 'pretty print' an element?


Depends on how pretty you want it. I've found that putting each element
on its own line has been sufficient for many of my manual-inspection use
cases. This isn't too hard with a cheap hack:

py> import elementtree.ElementTree as et
py> root = et.Element('root')
py> parent = et.SubElement(root, 'parent')
py> child = et.SubElement(parent, 'child')
py> print et.tostring(root)
<root><parent><child /></parent></root>
py> print et.tostring(root).replace('><', '>\n<')
<root>
<parent>
<child />
</parent>
</root>

Not ideal, but it may work well enough for you.
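
Editor's sketch: the session above is Python 2 with the standalone elementtree package. A rough Python 3 equivalent of the same hack, assuming the standard-library xml.etree.ElementTree module, might look like this. The variable names flat and one_per_line are illustrative only.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
parent = ET.SubElement(root, "parent")
ET.SubElement(parent, "child")

# encoding="unicode" makes tostring() return str instead of bytes,
# so str.replace() can be used directly on the result.
flat = ET.tostring(root, encoding="unicode")

# Break the line between every pair of adjacent tags.
one_per_line = flat.replace("><", ">\n<")
print(one_per_line)
```

As with the original, this adds no indentation, and it inserts a newline between any two adjacent tags, including inline elements inside mixed content, so it is best kept for manual inspection rather than for output that other programs will consume.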

STeVe
Jul 19 '05 #6
On 24 Jun 2005 13:53:43 -0700, "Patrick Maupin" <pm*****@gmail.com>
declaimed the following in comp.lang.python:

I see this attitude all the time, and frankly I don't understand it.
Please explain why XML is in ASCII/unicode instead of binary. Is it
because it is easier for a machine to parse? No, I thought not. It's
obviously so humans can read it. The next question is: why is

Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.

--
Wulfraed  Dennis Lee Bieber  KD6MOG
wl*****@ix.netcom.com
wu******@dm.net | Bestiaria Support Staff
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Jul 19 '05 #7
Dennis Bieber wrote:
Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.


A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
XML".

One of the listed goals is "XML documents should be human-legible and
reasonably clear".

To your point, the very _first_ listed goal (if order means anything in
this list) is "XML shall be straightforwardly usable over the
Internet", so it's reasonable to assume "the non-binary nature to be
because the internet protocols are mostly designed for text, not
binary."

But this assumption turns cause and effect on its head. It is
perfectly feasible to pass binary data through every known internet
protocol (with a little simplistic encoding), and is done all the time.
The real next question is: why ARE the internet protocols "mostly
designed for text, not binary"?

SMTP, for example, was designed at a time when memory, bandwidth, and
CPU cycles were all at a premium, and MTAs were coded using fairly
low-level constructs in C where parsing was a pain in the rear. Even
so, the developers decided to use relatively free-formatted ASCII in
the protocol. To follow your theory to its logical conclusion, they
must have wasted all that bandwidth, all those CPU cycles, all that
memory, all that disk space, and all that effort writing parsing code
because of yet another underlying mechanism which was "designed for
text."

On that account, your theory is correct, but only when you realize the
underlying mechanism which is "designed for text" is the human brain,
which has to try to make sense of all this mess when things aren't
quite interoperating properly.

Regards,
Pat

Jul 19 '05 #8
Patrick Maupin wrote:
"""
Dennis Bieber wrote:
Off hand, I'd consider the non-binary nature to be because the
internet protocols are mostly designed for text, not binary.


A document at http://www.w3.org/TR/REC-xml/ lists "the design goals for
XML".

One of the listed goals is "XML documents should be human-legible and
reasonably clear".
"""

Yes. Thanks for mentioning this, because people too often forget it.

minidom, 4Suite's Domlette and Amara all provide good pretty-print
output functions. The latter two use rules from the XSLT spec, which
is designed by people who have the above design goal well in their
blood.
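
Editor's sketch of the minidom route mentioned above, assuming Python 3's standard-library xml.dom.minidom. The sample document is made up for illustration, and the 4Suite and Amara equivalents are not shown.

```python
from xml.dom import minidom

# A document with no whitespace between elements, like xmlb in the question.
doc = minidom.parseString("<root><parent><child1/><child2/></parent></root>")

# toprettyxml() returns the document as a string with one node per line,
# indented by the given string at each nesting level.
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

The result begins with an XML declaration. If the input already contains indentation whitespace, the output may include extra blank lines, because those whitespace-only text nodes are written out as well.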

--
Uche
http://copia.ogbuji.net

Jul 19 '05 #9
