xml file structure for use with ElementTree?

I want to parse a file with ElementTree. My file has the following
format:
<!-- file population.xml -->
<?xml version='1.0' encoding='utf-8'?>
<population>
<person><name="joe" sex="male" age="49"></person>
<person><name="hilda" sex="female" age="33"></person>
<person><name="bartholomew" sex="male" age="17">
</person>
</population>
Note that the population can have more than one person.

I've created the following script to read and parse this file:
<!-- file et1.py -->
from elementtree import ElementTree
tree = ElementTree.parse("population.xml")
root = tree.getroot()
# ...manipulate tree...
tree.write("outfile.xml")

This script works if I have only one <person> record, but with more
than one record, it fails with the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "D:\PYTHON23\elementtree\ElementTree.py", line 864, in parse
    tree.parse(source, parser)
  File "D:\PYTHON23\elementtree\ElementTree.py", line 588, in parse
    parser.feed(data)
  File "D:\PYTHON23\elementtree\ElementTree.py", line 1132, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 13

What am I doing wrong? Do I need to describe the xml structure in
some way?

2nd question. Assuming I can read and parse this file, I can create
and add an element to the tree. But how would I go about deleting
from the tree the <person> record for, say, name='joe'?

Is there a source of tutorial information anywhere for the use of
ElementTree? I'm looking for a few more examples than those contained
on the effbot.org site - none of which seem to address the question of
input file structure.

thanks
S
Jul 18 '05 #1


Stewart Midwinter wrote:
<?xml version='1.0' encoding='utf-8'?>
<population>
<person><name="joe" sex="male" age="49"></person>
<person><name="hilda" sex="female" age="33"></person>
<person><name="bartholomew" sex="male" age="17">
</person>
</population>

This script works if I have only one <person> record, but with more
than one record, it fails with the following error:
...
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3,
column 13

What am I doing wrong? Do I need to describe the xml structure in
some way?
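This says your XML is not well-formed. The problem is on
line 3 column 13, which is the "=" in <name="joe" ...>.
An element name can't be followed directly by "=";
attributes belong inside the element's own opening tag.
Legal XML would look like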

<person name="joe" sex="male" age="49"></person>

or more concisely for empty elements

<person name="joe" sex="male" age="49"/>
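
The difference between the two forms can be checked from Python. This is a minimal sketch, assuming a modern Python where ElementTree ships in the standard library as xml.etree.ElementTree (the thread itself uses the older standalone elementtree package). The helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The original layout from the question: "name" is written as if it
# were a tag, followed directly by "=", which the parser rejects.
BROKEN = '<population><person><name="joe" sex="male" age="49"></person></population>'

# The corrected layout: name, sex and age are attributes of <person>.
FIXED = '<population><person name="joe" sex="male" age="49"/></population>'


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(BROKEN))  # False
print(is_well_formed(FIXED))   # True
```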

2nd question. Assuming I can read and parse this file, I can create
and add an element to the tree. But how would I go about deleting
from the tree the <person> record for, say, name='joe'?

from elementtree import ElementTree

# parse() returns an ElementTree wrapper; getroot() gives the root
# Element, so "tree" here is really the <population> element
tree = ElementTree.parse("tmp.xml").getroot()

for person in tree.findall("person"):
    if person.attrib["name"] == "joe":
        tree.remove(person)
        break
else:
    # the loop finished without hitting "break"
    raise AssertionError("Where's joe?")

ElementTree.dump(tree)  # prints:

<population>
<person age="33" name="hilda" sex="female" />
<person age="17" name="bartholomew" sex="male">
</person>
</population>
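
The same removal can be sketched against the standard-library module, assuming a Python version that provides xml.etree.ElementTree. The function name remove_person and the inline XML string are illustrative only and not part of the original post.

```python
import xml.etree.ElementTree as ET

XML = """<population>
<person name="joe" sex="male" age="49"/>
<person name="hilda" sex="female" age="33"/>
<person name="bartholomew" sex="male" age="17"/>
</population>"""


def remove_person(root, name):
    """Remove the first <person> child whose name attribute matches.

    Returns True if a record was removed, False if none matched.
    """
    # findall() returns a list, so removing from root while looping is safe.
    for person in root.findall("person"):
        # get() returns None instead of raising if "name" is missing.
        if person.get("name") == name:
            root.remove(person)
            return True
    return False


root = ET.fromstring(XML)  # fromstring() returns the root Element directly
removed = remove_person(root, "joe")
remaining = [p.get("name") for p in root.findall("person")]
print(removed, remaining)
```

Returning a boolean instead of raising leaves it to the caller to decide whether a missing record is an error.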


Is there a source of tutorial information anywhere for the use of
ElementTree? I'm looking for a few more examples than those contained
on the effbot.org site - none of which seem to address the question of
input file structure.


The effbot site, and its links to articles by Uche and David,
is the best source for documentation.

Given what you've shown, you need a reference to XML
and not ElementTree. The latter assumes you understand
the former. I don't have one handy.

Andrew
da***@dalkescientific.com
Jul 18 '05 #2

Andrew Dalke <ad****@mindspring.com> wrote
Legal XML would look like
<person name="joe" sex="male" age="49"></person>
sigh... after a good night's sleep I discovered that myself this
morning. It's obvious, of course.
for person in tree.findall("person"):
    if person.attrib["name"] == "joe":
        tree.remove(person)
        break
else:
    raise AssertionError("Where's joe?")
That's the ticket! Unfortunately at the moment when I run this code I
get the following error:
ElementTree instance has no attribute 'remove'
but I'll try to work through that.
Given what you've shown, you need a reference to XML
and not ElementTree. The latter assumes you understand
the former. I don't have one handy.


that's a polite way of saying I'm clueless about XML, which is true!
The main appeal of ElementTree was so I could avoid having to learn a
whole lot about XML in order to parse a simple file, but I am coming
to the conclusion that ElementTree is only simple if you already have
an understanding about XML.

thanks again,
S
Jul 18 '05 #3

Stewart Midwinter wrote:
That's the ticket! Unfortunately at the moment when I run this code I
get the following error:
ElementTree instance has no attribute 'remove'
but I'll try to work through that.
Perhaps you need a newer version of ElementTree? I don't
know when 'remove' was added.
The main appeal of ElementTree was so I could avoid having to learn a
whole lot about XML in order to parse a simple file, but I am coming
to the conclusion that ElementTree is only simple if you already have
an understanding about XML.


The problem is that you also need to generate the XML file.
You could use ElementTree to do that, but I've not used
it that way yet.
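
For what it's worth, generating the file with ElementTree might look like the sketch below. It assumes the standard-library xml.etree.ElementTree module, and the people list is sample data taken from the question. Building the tree with Element and SubElement means the library writes the tags, so the output is well-formed by construction.

```python
import xml.etree.ElementTree as ET

# Sample records from the question; attribute values must be strings.
people = [
    ("joe", "male", "49"),
    ("hilda", "female", "33"),
    ("bartholomew", "male", "17"),
]

root = ET.Element("population")
for name, sex, age in people:
    # Keyword arguments become attributes of the new <person> element.
    ET.SubElement(root, "person", name=name, sex=sex, age=age)

# encoding="unicode" asks for a str rather than bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# To write a file instead: ET.ElementTree(root).write("population.xml")
```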

XML syntax isn't that hard. The summary is that everything
looks pretty much like this

<tagname attrib="val">Content goes here</tagname>

The <tagname> is called an opening tag and the </tagname>
is called a closing tag. The whole thing is called an
element.

There can be 0 or more attributes in the opening tag, but
none in the closing tag. So the following are valid
opening tags

<tagname>
<person name="Andrew">
<person name="Andrew" city="Santa Fe">

There are ways to escape special characters in the
values for an attribute. Only some characters are
allowed in tag names and attribute names. The ':'
is the only special one. It's used for namespaces.
That's more complicated and you'll need to look
elsewhere for details on that. Duplicate attribute
names within a single tag are not allowed, so don't
use them.
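
Both points about attributes can be tried out in a few lines. This sketch assumes the standard-library xml.etree.ElementTree module; the helper name is_rejected and the sample attribute value are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Special characters in an attribute value are escaped when the
# element is serialized...
original = 'Tom & "Jerry" <cat>'
person = ET.Element("person", name=original)
serialized = ET.tostring(person, encoding="unicode")
print(serialized)

# ...and turned back into the original characters when it is parsed.
round_tripped = ET.fromstring(serialized).get("name")


def is_rejected(text):
    """Return True if the parser refuses the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False


# The same attribute name twice in one tag is a parse error.
duplicate_rejected = is_rejected('<person name="a" name="b"/>')
print(round_tripped == original, duplicate_rejected)
```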

The contents of an element can contain text and other
elements. This is what makes it an element tree.
So the following is also valid

<person><name>Andrew</name><city>Santa Fe</city></person>

It's a matter of some preference about whether to put
data into attributes or as contents of an element.
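
The two styles are read differently in ElementTree, as this sketch shows (assuming the standard-library xml.etree.ElementTree module): attribute values come from get(), while the text of a child element comes from findtext().

```python
import xml.etree.ElementTree as ET

# Same data, stored as attributes...
as_attributes = ET.fromstring('<person name="Andrew" city="Santa Fe"/>')

# ...and stored as child elements.
as_children = ET.fromstring(
    "<person><name>Andrew</name><city>Santa Fe</city></person>"
)

city_from_attribute = as_attributes.get("city")   # attribute lookup
city_from_child = as_children.findtext("city")    # text of the <city> child

print(city_from_attribute, city_from_child)
```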

As a shortcut, if there is no content then ending
the tag with a '/>' makes it both an opening tag and
a closing tag, so the following is a complete element.

<person name="Andrew"/>

The first line of your XML document can contain
something that is not an element: the XML declaration,
which looks like a processing instruction. It tells
the XML parser how to process the rest of the
document. It looks like this

<?xml version='1.0' encoding='utf-8'?>

Besides describing which XML definition is used
(there's only one I know about), this tells the
processor to interpret bytes as the UTF-8 encoding
of Unicode characters. I believe the first few
bytes are also used to determine the byte ordering
in case the text is stored as big-endian or
little-endian "wide" Unicode characters.
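
To see that first line being produced and honoured, here is a small sketch. It assumes the standard-library xml.etree.ElementTree module and uses an in-memory buffer in place of a real file; the non-ASCII name is sample data.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("people")
ET.SubElement(root, "person", name="Zoë")  # non-ASCII sample value

# Write UTF-8 bytes and ask explicitly for the leading <?xml ...?> line.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
data = buffer.getvalue()

first_line = data.split(b"\n", 1)[0]
print(first_line)

# Parsing the bytes back uses the declared encoding to decode the name.
name = ET.fromstring(data).find("person").get("name")
print(name)
```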

One final note. Only one top-level element is
allowed in an XML file. For example, this is allowed
<?xml version='1.0' encoding='utf-8'?>
<people>
<person name="Andrew"/>
<person name="Fred"/>
</people>

while this is not

<?xml version='1.0' encoding='utf-8'?>
<person name="Andrew"/>
<person name="Fred"/>

In other words there is only one root to the
element tree.
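
The single-root rule is enforced by the parser, which can be seen with the two examples above. This sketch assumes the standard-library xml.etree.ElementTree module; the helper name parse_error is made up for illustration.

```python
import xml.etree.ElementTree as ET

ONE_ROOT = "<people><person name='Andrew'/><person name='Fred'/></people>"
TWO_ROOTS = "<person name='Andrew'/><person name='Fred'/>"


def parse_error(text):
    """Return the parser's error message, or None if the text parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


print(parse_error(ONE_ROOT))   # None: a single root element is fine
print(parse_error(TWO_ROOTS))  # an error message about the second element
```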

Andrew
da***@dalkescientific.com
Jul 18 '05 #4

Andrew Dalke wrote:
That's the ticket! Unfortunately at the moment when I run this code I
get the following error:
ElementTree instance has no attribute 'remove'
but I'll try to work through that.


Perhaps you need a newer version of ElementTree? I don't
know when 'remove' was added.


remove is available on Element instances. it's not available on ElementTree
wrappers.
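
A short sketch of the distinction, assuming the standard-library xml.etree.ElementTree module and an in-memory string standing in for the file: parse() returns the ElementTree wrapper, and getroot() returns the Element that actually has remove().

```python
import io
import xml.etree.ElementTree as ET

source = io.StringIO(
    '<population><person name="joe"/><person name="hilda"/></population>'
)

tree = ET.parse(source)   # ElementTree wrapper around the document
root = tree.getroot()     # the <population> Element itself

wrapper_has_remove = hasattr(tree, "remove")
root_has_remove = hasattr(root, "remove")
print(wrapper_has_remove, root_has_remove)

# Remove via the root Element, then keep using the wrapper for writing.
joe = root.find("person[@name='joe']")
root.remove(joe)
names = [p.get("name") for p in root.findall("person")]
print(names)
```

The wrapper still refers to the same root, so a later tree.write(...) would save the tree without joe.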

</F>

Jul 18 '05 #5

Andrew Dalke wrote:
<?xml version='1.0' encoding='utf-8'?>

Besides describing which XML definition is used
(there's only one I know about)


there's also this one:

http://www.w3.org/TR/2004/REC-xml11-20040204/

however,

http://norman.walsh.name/2004/09/30/xml11

</F>

Jul 18 '05 #6

/F:
remove is available on Element instances. it's not available on ElementTree
wrappers.


Ahh, didn't catch that in the post. I wish the OP had
given a normal traceback instead of summarizing it. Would
have made it easier to see that the "this code" in the
phrase "when I run this code" didn't actually refer to
the code I posted.

Andrew
da***@dalkescientific.com
Jul 18 '05 #7

Andrew Dalke <ad****@mindspring.com> wrote in message
XML syntax isn't that hard. The summary is that everything
looks pretty much like this

.... snip ...
Thanks Andrew, that was a great summary of XML syntax. It will come
in handy when I take the next step, editing a larger, more complex
file.

My ElementTree method is working properly now; I posted a little
sample app on Fredrik's discussion board on quicktopic.com.

cheers
Stewart
Jul 18 '05 #8