ElementTree, how to get the whole content of a tag

Given the following XML snippet, I parse it with
et = ElementTree.fromstring(...). Now et.text returns just '\n text\n some
other text'.
Is there any way I could get everything between the <div> and </div> tags?

<div>
text
some other text<br/>
and then some more
</div>
--
damjan
Jul 18 '05 #1
3 Replies


Damjan <gd*****@gmail.com> wrote:
> Given the following XML snippet, I parse it with
> et = ElementTree.fromstring(...). Now et.text returns just '\n text\n some
> other text'.
> Is there any way I could get everything between the <div> and </div> tags?
>
> <div>
> text
> some other text<br/>
> and then some more
> </div>


    def gettext(elem):
        # text before the first subelement
        text = elem.text or ""
        for subelem in elem:
            # text inside the subelement, then the text that follows it
            text = text + gettext(subelem)
            if subelem.tail:
                text = text + subelem.tail
        return text

    >>> gettext(et)
    '\n text\n some other text\n and then some more\n'
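
For readers on a current Python, here is a minimal sketch of why et.text stops at the <br/> tag. It assumes the standard-library xml.etree.ElementTree module rather than the standalone elementtree package used in this thread, and the SNIPPET string is reconstructed from the output quoted above. The text after a subelement is stored on that subelement as its tail, and itertext() collects text and tails in document order, much like gettext() does.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the question; the exact whitespace is an assumption.
SNIPPET = "<div>\n text\n some other text<br/>\n and then some more\n</div>"

et = ET.fromstring(SNIPPET)

# .text only holds the character data before the first subelement.
head = et.text

# The text following <br/> is stored on the <br/> element, as its .tail.
br = et[0]
rest = br.tail

# itertext() yields text and tail strings in document order.
joined = "".join(et.itertext())

print(repr(head))
print(repr(rest))
print(repr(joined))
```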

</F>

Jul 18 '05 #2

>> Is there any way I could get everything between the <div> and </div> tag?
>>
>> <div>
>> text
>> some other text<br/>
>> and then some more
>> </div>
>
> gettext(et)
>
> '\n text\n some other text\n and then some more\n'


I actually need to get
'\n text\n some other text<br/>\n and then some more\n'

And if there were attributes in <br/>, I'd want them too, right where they were.
Can't I just get ALL the text between the <div> tags?

--
damjan
Jul 18 '05 #3

Damjan wrote:
>>> Is there any way I could get everything between the <div> and </div> tag?
>>>
>>> <div>
>>> text
>>> some other text<br/>
>>> and then some more
>>> </div>
>>
>> gettext(et)
>>
>> '\n text\n some other text\n and then some more\n'
>
> I actually need to get
> '\n text\n some other text<br/>\n and then some more\n'


that's not the tree content, that's a serialized XML fragment.

the quickest way to do that is to serialize the entire element, and
strip off the start and end tags:

text = ElementTree.tostring(elem)
text = text.split(">", 1)[1].rsplit("<", 1)[0]
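
On a current Python, tostring() returns bytes unless told otherwise, so the two lines above need a small adjustment. The following is a sketch only, assuming the standard-library xml.etree.ElementTree module; the function name inner_xml_by_stripping and the class="x" attribute are made up for illustration. Note that the result is a fresh serialization, not the original source text: for example, <br/> comes back as <br />.

```python
import xml.etree.ElementTree as ET


def inner_xml_by_stripping(elem):
    """Serialize elem and cut off its own start and end tags (sketch)."""
    # encoding="unicode" makes tostring() return str instead of bytes.
    source = ET.tostring(elem, encoding="unicode")
    # Drop everything up to the first ">" (the start tag)
    # and everything from the last "<" on (the end tag).
    return source.split(">", 1)[1].rsplit("<", 1)[0]


et = ET.fromstring(
    '<div>\n text\n some other text<br class="x"/>\n and then some more\n</div>'
)
inner = inner_xml_by_stripping(et)
print(repr(inner))
```

This relies on the element having a separate end tag. An empty element that serializes as a single self-closing tag, or an element that carries a tail of its own, would need extra handling.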

alternatively, you can serialize the subelements, and add in properly
encoded text and tail attributes:

    def innersource(elem, encoding="ascii"):
        text = ElementTree._encode(elem.text or "", encoding)
        for subelem in elem:
            text = text + ElementTree.tostring(subelem)
            if subelem.tail:
                text = text + ElementTree._encode(subelem.tail, encoding)
        return text

(but _encode is not an official part of the elementtree API, so this code
may not work in post-1.2 releases)
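
A variant that avoids the private _encode helper might look like the sketch below. It assumes the standard-library xml.etree.ElementTree module on Python 3 and swaps in xml.sax.saxutils.escape for the text escaping, which is a substitution and not what the code above does. One difference to be aware of: in the standard library, tostring() already includes the subelement's tail in its output, so this sketch does not append the tail separately.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def innersource(elem):
    """Rebuild the markup between elem's start and end tags (sketch)."""
    # Character data has to be re-escaped (&, <, >) to stay well-formed.
    parts = [escape(elem.text or "")]
    for subelem in elem:
        # The standard-library tostring() serializes the subelement
        # together with its tail, so the tail is not added again here.
        parts.append(ET.tostring(subelem, encoding="unicode"))
    return "".join(parts)


et = ET.fromstring('<div>a &amp; b<br id="1"/>tail &lt; end<p>x</p></div>')
result = innersource(et)
print(result)
```

If the document uses namespaces, each serialized subelement will carry its own xmlns declarations, which may or may not be what you want.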

</F>

Jul 18 '05 #4
