xml file structure for use with ElementTree?

I want to parse a file with ElementTree. My file has the following
format:
<!-- file population.xml -->
<?xml version='1.0' encoding='utf-8'?>
<population>
<person><name="joe" sex="male" age="49"></person>
<person><name="hilda" sex="female" age="33"></person>
<person><name="bartholomew" sex="male" age="17">
</person>
</population>
note that the population can have more than one person.

I've created the following script to read and parse this file:
<!-- file et1.py -->
from elementtree import ElementTree
tree = ElementTree.parse("population.xml")
root = tree.getroot()
# ...manipulate tree...
tree.write("outfile.xml")

This script works if I have only one <person> record, but with more
than one record, it fails with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "D:\PYTHON23\elementtree\ElementTree.py", line 864, in parse
tree.parse(source, parser)
File "D:\PYTHON23\elementtree\ElementTree.py", line 588, in parse
parser.feed(data)
File "D:\PYTHON23\elementtree\ElementTree.py", line 1132, in feed
self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3,
column 13

What am I doing wrong? Do I need to describe the xml structure in
some way?

2nd question. Assuming I can read and parse this file, I can create
and add an element to the tree. But how would I go about deleting
from the tree the <person> record for, say, name='joe'?

Is there a source of tutorial information anywhere for the use of
ElementTree? I'm looking for a few more examples than those contained
on the effbot.org site - none of which seem to address the question of
input file structure.

thanks
S
Jul 18 '05 #1
Stewart Midwinter wrote:
<?xml version='1.0' encoding='utf-8'?>
<population>
<person><name="joe" sex="male" age="49"></person>
<person><name="hilda" sex="female" age="33"></person>
<person><name="bartholomew" sex="male" age="17">
</person>
</population>

This script works if I have only one <person> record, but with more
than one record, it fails with the following error:
...
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3,
column 13

What am I doing wrong? Do I need to describe the xml structure in
some way?
This says your XML is not well-formed. The problem is on
line 3, column 13, which is the "=". A tag name can't be followed
directly by "=", so <name="joe" ...> is not a legal tag.
Legal XML would look like

<person name="joe" sex="male" age="49"></person>

or more concisely for empty elements

<person name="joe" sex="male" age="49"/>
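
A minimal sketch of the diagnosis and the fix, assuming a current Python where the library ships in the standard library as xml.etree.ElementTree (the thread uses the older standalone elementtree package). The names BROKEN, FIXED and error_position are made up for this illustration. It asks the parser where the malformed input fails, then parses the corrected form.

```python
import xml.etree.ElementTree as ET

# The original file, cut down to one record: "<name=" is not a legal tag.
BROKEN = b"""<?xml version='1.0' encoding='utf-8'?>
<population>
<person><name="joe" sex="male" age="49"></person>
</population>
"""

# The corrected file: the data moves into attributes of <person>.
FIXED = b"""<?xml version='1.0' encoding='utf-8'?>
<population>
<person name="joe" sex="male" age="49"/>
<person name="hilda" sex="female" age="33"/>
<person name="bartholomew" sex="male" age="17"/>
</population>
"""


def error_position(data):
    """Return (line, column) of the first parse error, or None if well-formed."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return err.position
    return None


broken_position = error_position(BROKEN)
root = ET.fromstring(FIXED)
names = [person.get("name") for person in root.findall("person")]
print(broken_position, names)
```

The position is reported as a (line, column) pair, which is the same information the traceback in the question carries.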

2nd question. Assuming I can read and parse this file, I can create
and add an element to the tree. But how would I go about deleting
from the tree the <person> record for, say, name='joe'?
from elementtree import ElementTree

# note: "tree" here is the root Element, not the ElementTree wrapper
tree = ElementTree.parse("tmp.xml").getroot()

for person in tree.findall("person"):
    if person.attrib["name"] == "joe":
        tree.remove(person)
        break
else:
    # the loop finished without finding a match
    raise AssertionError("Where's joe?")

ElementTree.dump(tree)

<population>
<person age="33" name="hilda" sex="female" />
<person age="17" name="bartholomew" sex="male">
</person>
</population>


Is there a source of tutorial information anywhere for the use of
ElementTree? I'm looking for a few more examples than those contained
on the effbot.org site - none of which seem to address the question of
input file structure.


The effbot site, and its links to articles by Uche and David,
is the best source for documentation.

Given what you've shown, you need a reference to XML
and not ElementTree. The latter assumes you understand
the former. I don't have one handy.

Andrew
da***@dalkescientific.com
Jul 18 '05 #2
Andrew Dalke <ad****@mindspring.com> wrote
Legal XML would look like
<person name="joe" sex="male" age="49"></person>
sigh... after a good night's sleep I discovered that myself this
morning. It's obvious, of course.
for person in tree.findall("person"):
    if person.attrib["name"] == "joe":
        tree.remove(person)
        break
else:
    raise AssertionError("Where's joe?")
That's the ticket! Unfortunately at the moment when I run this code I
get the following error:
ElementTree instance has no attribute 'remove'
but I'll try to work through that.
Given what you've shown, you need a reference to XML
and not ElementTree. The latter assumes you understand
the former. I don't have one handy.


that's a polite way of saying I'm clueless about XML, which is true!
The main appeal of ElementTree was so I could avoid having to learn a
whole lot about XML in order to parse a simple file, but I am coming
to the conclusion that ElementTree is only simple if you already have
an understanding about XML.

thanks again,
S
Jul 18 '05 #3
Stewart Midwinter wrote:
That's the ticket! Unfortunately at the moment when I run this code I
get the following error:
ElementTree instance has no attribute 'remove'
but I'll try to work through that.
Perhaps you need a newer version of ElementTree? I don't
know when 'remove' was added.
The main appeal of ElementTree was so I could avoid having to learn a
whole lot about XML in order to parse a simple file, but I am coming
to the conclusion that ElementTree is only simple if you already have
an understanding about XML.


The problem is that you also need to generate the XML file.
You could use ElementTree to do that, but I've not used
it that way yet.
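
A rough sketch of generating the file with ElementTree instead of typing it by hand, assuming the standard-library xml.etree.ElementTree module of current Python. The people list is hypothetical sample data taken from the question. Building the tree in code means the serializer produces the tags and attribute syntax, so a slip like <name="joe"> cannot happen.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; attribute values must be strings.
people = [
    ("joe", "male", "49"),
    ("hilda", "female", "33"),
    ("bartholomew", "male", "17"),
]

root = ET.Element("population")
for name, sex, age in people:
    # Keyword arguments become attributes of the new <person> element.
    ET.SubElement(root, "person", name=name, sex=sex, age=age)

xml_bytes = ET.tostring(root, encoding="utf-8")

# Round trip: what was written can be parsed back.
reparsed = ET.fromstring(xml_bytes)
ages = {p.get("name"): p.get("age") for p in reparsed.findall("person")}
print(xml_bytes.decode("utf-8"))
```

To write straight to disk, ET.ElementTree(root).write("population.xml") does the same job as the tree.write(...) call in the original script.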

XML syntax isn't that hard. The summary is that everything
looks pretty much like this

<tagname attrib="val">Content goes here</tagname>

The <tagname> is called an opening tag and the </tagname>
is called a closing tag. The whole thing is called an
element.

There can be 0 or more attributes in the opening tag, but
none in the closing tag. So the following are valid
opening tags

<tagname>
<person name="Andrew">
<person name="Andrew" city="Santa Fe">

There are ways to escape special characters in the
values for an attribute. Only some characters are
allowed as tag names and attribute names. The ':'
is the only special one. It's used for namespaces.
That's more complicated and you'll need to look
elsewhere for details on that. I don't believe duplicate
attribute names are allowed. Even if they are, don't
use them.
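
A small sketch of the two points above, assuming the standard-library xml.etree.ElementTree module. The value in tricky is an invented example. The serializer escapes special characters in attribute values for you, and the parser rejects a tag that repeats an attribute name.

```python
import xml.etree.ElementTree as ET

# An attribute value containing characters that are special in XML.
tricky = 'Tom & "Jerry" <cats>'
person = ET.Element("person", name=tricky)

# Serializing escapes the value; parsing it back restores the original.
serialized = ET.tostring(person, encoding="unicode")
round_tripped = ET.fromstring(serialized).get("name")

# A repeated attribute name is not well-formed, so the parser refuses it.
try:
    ET.fromstring('<person name="a" name="b"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(serialized)
```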

The contents of an element can contain text and other
elements. This is what makes it an element tree.
So the following is also valid

<person><name>Andrew</name><city>Santa Fe</city></person>

It's a matter of some preference about whether to put
data into attributes or as contents of an element.
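
For comparison, a sketch of how the two layouts are read back, assuming the standard-library xml.etree.ElementTree module. Attributes are read with get(), while the text of a child element is read with findtext().

```python
import xml.etree.ElementTree as ET

# Same data, stored as attributes ...
as_attributes = ET.fromstring('<person name="Andrew" city="Santa Fe"/>')
# ... and stored as child elements.
as_children = ET.fromstring(
    "<person><name>Andrew</name><city>Santa Fe</city></person>"
)

attr_city = as_attributes.get("city")       # attribute lookup
child_city = as_children.findtext("city")   # text of the <city> child

print(attr_city, child_city)
```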

As a shortcut, if there is no content then ending
the tag with a '/>' makes it both an opening tag and
a closing tag, so the following is a complete element.

<person name="Andrew"/>

The first line of your XML document can contain
a special construct called the XML declaration. It is not
an element, though it is written like a processing instruction.
It tells the XML parser how to process the rest of
the document. It looks like this

<?xml version='1.0' encoding='utf-8'?>

Besides describing which XML definition is used
(there's only one I know about), this tells the
processor to interpret bytes as the UTF-8 encoding
of Unicode characters. I believe the first few
bytes are also used to determine the byte ordering
in case the text is stored as big-endian or
little-endian "wide" Unicode characters.
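
A short sketch of what the encoding part of that line does, assuming the standard-library xml.etree.ElementTree module. The <person name='Zoë'/> document is an invented example. The same document is stored as two different byte sequences, and the parser uses each declaration to decode the bytes back to the same text.

```python
import xml.etree.ElementTree as ET

# One document, written with a non-ASCII character in an attribute.
text = "<?xml version='1.0' encoding='iso-8859-1'?><person name='Zo\u00eb'/>"

# Stored as ISO-8859-1 bytes, with a matching declaration.
latin1_bytes = text.encode("iso-8859-1")
# Stored as UTF-8 bytes, with the declaration changed to match.
utf8_bytes = text.replace("iso-8859-1", "utf-8").encode("utf-8")

name_latin1 = ET.fromstring(latin1_bytes).get("name")
name_utf8 = ET.fromstring(utf8_bytes).get("name")

print(name_latin1 == name_utf8)
```

The declaration has to agree with how the bytes were actually saved. If the two disagree, the parser either fails or silently produces the wrong characters.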

One final note. Only one top-level element is
allowed in an XML file. For example, this is allowed
<?xml version='1.0' encoding='utf-8'?>
<people>
<person name="Andrew"/>
<person name="Fred"/>
</people>

while this is not

<?xml version='1.0' encoding='utf-8'?>
<person name="Andrew"/>
<person name="Fred"/>

In other words there is only one root to the
element tree.
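
The single-root rule can be checked directly. This is a sketch assuming the standard-library xml.etree.ElementTree module. The helper is_well_formed is a name invented for this example.

```python
import xml.etree.ElementTree as ET

ONE_ROOT = b"""<?xml version='1.0' encoding='utf-8'?>
<people>
<person name="Andrew"/>
<person name="Fred"/>
</people>
"""

TWO_ROOTS = b"""<?xml version='1.0' encoding='utf-8'?>
<person name="Andrew"/>
<person name="Fred"/>
"""


def is_well_formed(data):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


one_root_ok = is_well_formed(ONE_ROOT)
two_roots_ok = is_well_formed(TWO_ROOTS)
print(one_root_ok, two_roots_ok)
```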

Andrew
da***@dalkescientific.com
Jul 18 '05 #4
Andrew Dalke wrote:
That's the ticket! Unfortunately at the moment when I run this code I
get the following error:
ElementTree instance has no attribute 'remove'
but I'll try to work through that.


Perhaps you need a newer version of ElementTree? I don't
know when 'remove' was added.


remove is available on Element instances. it's not available on ElementTree
wrappers.
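
A minimal sketch of that distinction, assuming the standard-library xml.etree.ElementTree module of current Python. The XML literal is sample data in the corrected format. parse() returns the ElementTree wrapper, getroot() returns the root Element, and remove() is called on the Element that is the parent of the node being deleted.

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<population>
<person name="joe" sex="male" age="49"/>
<person name="hilda" sex="female" age="33"/>
</population>"""

tree = ET.parse(io.BytesIO(XML))   # ElementTree wrapper around the document
root = tree.getroot()              # the <population> Element

# The wrapper has no remove(); the Element does.
wrapper_has_remove = hasattr(tree, "remove")
element_has_remove = hasattr(root, "remove")

for person in root.findall("person"):
    if person.get("name") == "joe":
        root.remove(person)        # remove from the parent Element
        break

remaining = [p.get("name") for p in root.findall("person")]
print(remaining)
```

After the removal, tree.write("outfile.xml") still works on the wrapper, because the wrapper holds the same root Element that was modified.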

</F>

Jul 18 '05 #5
Andrew Dalke wrote:
<?xml version='1.0' encoding='utf-8'?>

Besides describing which XML definition is used
(there's only one I know about)


there's also this one:

http://www.w3.org/TR/2004/REC-xml11-20040204/

however,

http://norman.walsh.name/2004/09/30/xml11

</F>

Jul 18 '05 #6
/F:
remove is available on Element instances. it's not available on ElementTree
wrappers.


Ahh, didn't catch that in the post. I wish the OP had
given a normal traceback instead of summarizing it. Would
have made it easier to see that the "this code" in the
phrase "when I run this code" didn't actually refer to
the code I posted.

Andrew
da***@dalkescientific.com
Jul 18 '05 #7
Andrew Dalke <ad****@mindspring.com> wrote in message
XML syntax isn't that hard. The summary is that everything
looks pretty much like this

.... snip ...
Thanks Andrew, that was a great summary of XML syntax. It will come
in handy when I take the next step, editing a larger, more complex
file.

My ElementTree method is working properly now; I posted a little
sample app on Fredrik's discussion board on quicktopic.com.

cheers
Stewart
Jul 18 '05 #8
