ElementTree oddities

I'm trying to extract the text from some XML. I figured this
convenient Python two-liner would do it for me:

from xml.etree.ElementTree import *
from cStringIO import StringIO
root = parse(StringIO(xml)).getroot()
' '.join([n.text for n in root.getiterator() if n.text is not None])

However, it's missing some of the text. For example, for the following
XML:

xml = "<highlight><sp />Bar</highlight>"

it returns an empty string. It seems the "<sp />" tag is borking it.
Also, for the following XML:

xml = "<highlight><ref>Bar</ref>:</highlight>"

I only get "Bar". It's missing the trailing colon.
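
For reference, here is a sketch that reproduces both cases in Python 3. This assumes a Python 3 environment: cStringIO no longer exists there and getiterator() was removed in Python 3.9, so the sketch uses ET.fromstring() and iter() instead; naive_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def naive_text(source):
    """Join the .text of every element, as the original two-liner does."""
    root = ET.fromstring(source)
    # iter() replaces getiterator(); it walks root and all its descendants
    return ' '.join(n.text for n in root.iter() if n.text is not None)


print(repr(naive_text("<highlight><sp />Bar</highlight>")))        # ''
print(repr(naive_text("<highlight><ref>Bar</ref>:</highlight>")))  # 'Bar'
```

Neither result picks up the text that follows an element's end tag, which is the gap described in the two examples.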

I'm not that experienced with XML so perhaps I am just missing
something here. Please enlighten me.

Thanks,
Brian
Sep 15 '08 #1
Brian Cole wrote:
> However, it's missing some of the text. For example, for the following
> XML:
>
> xml = "<highlight><sp />Bar</highlight>"
>
> it returns an empty string. It seems the "<sp />" tag is borking it.
> Also, for the following XML:
>
> xml = "<highlight><ref>Bar</ref>:</highlight>"
>
> I only get "Bar". It's missing the trailing colon.
>
> I'm not that experienced with XML so perhaps I am just missing
> something here. Please enlighten me.

You're missing the "tail" attribute, which contains the text that follows
directly *after* the element's end tag. It's not exactly a one-liner,
but I usually use the one on this page:

http://effbot.org/zone/element-bits-...es.htm#gettext
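
The linked recipe isn't reproduced here. As an illustration only, a minimal recursive sketch of the text-plus-tail idea might look like this (Python 3; get_text is a hypothetical name, not necessarily the function on that page):

```python
import xml.etree.ElementTree as ET


def get_text(elem):
    """Return all character data inside elem, in document order.

    elem.text is the text before elem's first child; each child's .tail
    is the text after that child's end tag, up to the next sibling or
    the parent's end tag. elem's own tail lies outside elem, so it is
    not included.
    """
    parts = [elem.text or '']
    for child in elem:
        parts.append(get_text(child))   # text nested inside the child
        parts.append(child.tail or '')  # text following the child's end tag
    return ''.join(parts)


print(get_text(ET.fromstring("<highlight><ref>Bar</ref>:</highlight>")))  # Bar:
print(get_text(ET.fromstring("<highlight><sp />Bar</highlight>")))        # Bar
```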

</F>

Sep 15 '08 #2
I'm not sure, but I think your document is not well formatted...

Anyway, as the name of the module suggests, you must think about XML not
as a flat document but as a tree; that's the only way I've managed to
parse XML.
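
To make the tree view concrete, here is a small sketch (Python 3, with a hypothetical walk() helper) that lists each element's text and tail for the first example, showing where "Bar" actually lives:

```python
import xml.etree.ElementTree as ET


def walk(elem, depth=0):
    """Yield (depth, tag, text, tail) for elem and every descendant."""
    yield depth, elem.tag, elem.text, elem.tail
    for child in elem:
        yield from walk(child, depth + 1)


root = ET.fromstring("<highlight><sp />Bar</highlight>")
for depth, tag, text, tail in walk(root):
    print('  ' * depth + f'<{tag}> text={text!r} tail={tail!r}')
# <highlight> text=None tail=None
#   <sp> text=None tail='Bar'
```

So "Bar" is not the text of <highlight> at all; it hangs off <sp> as its tail.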

Sep 15 '08 #3
Fredrik is correct: an element's text attribute only contains the text
before its first child element, and each child's tail contains the text
that follows that child's end tag. It is indeed rather odd. For
comparison, here's how you would do it in lxml
(http://codespeak.net/lxml/index.html), a library which supports XPath:

from lxml import etree
tree = etree.fromstring('<highlight><ref>Bar</ref>:</highlight>')
print ' '.join(tree.xpath('//text()'))
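
If lxml isn't available, the standard library gets close: Element.itertext() (in the standard library's ElementTree since Python 2.7) yields the text and tail strings in document order. A Python 3 sketch, assuming xml.etree.ElementTree rather than lxml:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<highlight><ref>Bar</ref>:</highlight>')

# itertext() yields every non-empty text and tail string inside root,
# in document order (root's own tail is not included).
joined = ' '.join(root.itertext())
print(joined)  # Bar :
```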
Sep 15 '08 #4
Mark Thomas wrote:
> here's how you would do it in lxml (http://codespeak.net/lxml/index.html),
> a library which supports XPath:
>
> from lxml import etree
> tree = etree.fromstring('<highlight><ref>Bar</ref>:</highlight>')
> print ' '.join(tree.xpath('//text()'))

If you want to use XPath, try this:

print tree.xpath('string()')

or, if you want to use it in real code, compile the expression once:

get_tree_text = etree.XPath('string()')
print get_tree_text(tree)

or just use

print etree.tostring(tree, method="text")
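
The standard library's limited XPath support (find/findall) does not include string(), but its tostring() also accepts method="text". A sketch in Python 3, assuming xml.etree.ElementTree rather than lxml, showing both that and the string-value computed by hand:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<highlight><ref>Bar</ref>:</highlight>')

# Serialize only the character data; encoding='unicode' returns str
# rather than bytes.
as_text = ET.tostring(root, encoding='unicode', method='text')

# Concatenating itertext() with no separator gives the same result that
# XPath string() computes for this root element.
string_value = ''.join(root.itertext())

print(as_text)       # Bar:
print(string_value)  # Bar:
```

Both give "Bar:" with no separator, whereas the ' '.join(...) versions earlier in the thread insert a space ("Bar :").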

Stefan
Sep 16 '08 #5
