Getting elements and text with lxml

Hello,

I have an XML file that starts with:

<vortaro>
<art mrk="$Id: a.xml,v 1.10 2007/09/11 16:30:20 revo Exp $">
<kap>
<ofc>*</ofc>-<rad>a</rad>
</kap>

Out of it, I'd like to extract something like the following (this is just one
possible structure; any structure is fine as long as all the data is there):

[("ofc", "*"), "-", ("rad", "a")]

How can I do it? I managed to get the content of both tags and the
text up to the first tag ("\n "), but not the "-" (and in other XML
files, there's more text outside the elements).

Thanks.
Jun 27 '08 #1
On Fri, 16 May 2008 18:53:03 -0300, J. Pablo Fernández <pu****@pupeno.com>
wrote:
Look for the "tail" attribute.

--
Gabriel Genellina

Jun 27 '08 #2
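Gabriel's hint made concrete: in the ElementTree model that lxml follows, the text between an element's start tag and its first child is stored on that element's .text, and the text after a child's end tag is stored on that child's .tail, so the "-" belongs to <ofc>. Below is a minimal sketch that walks only the direct children of <kap>. It uses the standard library's xml.etree.ElementTree so it runs without lxml (swapping in `from lxml import etree` should behave the same for this), it assumes whitespace-only strings can be dropped, and the function name mixed_content is made up for illustration.

```python
import xml.etree.ElementTree as etree  # with lxml: from lxml import etree


def mixed_content(elem):
    """Return (tag, text) for each direct child of elem, interleaved with the
    non-blank text around the children, in document order."""
    parts = []
    # Text between elem's start tag and its first child lives on elem itself.
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        parts.append((child.tag, child.text))
        # Text after this child's end tag lives on the child, as its tail.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


kap = etree.fromstring("<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>")
print(mixed_content(kap))  # [('ofc', '*'), '-', ('rad', 'a')]
```

Only one level is handled: any markup nested inside <ofc> or <rad> would be cut off at child.text.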
On May 17, 2:19 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
Look for the "tail" attribute.
That gives me the last part, but not the one in the middle:

In : etree.tounicode(e)
Out: u'<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>\n'

In : e.text
Out: '\n '

In : e.tail
Out: '\n'

Thanks.
Jun 27 '08 #3
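Why e.tail printed '\n' in the session above: each stretch of text belongs to the element whose tag immediately precedes it, so e.tail is the newline after </kap>, while the "-" is the tail of e's first child, <ofc>. A small sketch reproducing that session with the standard library's ElementTree (lxml.etree should give the same values); the <art> wrapper is only there so that <kap> has something after it, as in the original file.

```python
import xml.etree.ElementTree as etree  # with lxml: from lxml import etree

art = etree.fromstring("<art><kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>\n</art>")
e = art.find("kap")

print(repr(e.text))     # '\n '  text before <ofc>, stored on <kap>
print(repr(e.tail))     # '\n'   text after </kap>, stored on <kap>
print(repr(e[0].tail))  # '-'    text after </ofc>, stored on <ofc>
print(repr(e[1].tail))  # '\n'   text after </rad>, stored on <rad>
```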
J. Pablo Fernández wrote:
That gives me the last part, but not the one in the middle:

You need the text content of your initial element's children, which
needs that of their children, and so on.

See http://effbot.org/zone/element-bits-and-pieces.htm

HTH,
John
Jun 27 '08 #4
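John's suggestion of descending through the children, and their children, can be written as a short recursive function. This is a sketch of the idea rather than code from the effbot page: it returns a (tag, content) pair where content interleaves the element's text, its converted children and their tails, so nested markup is kept too. The name to_tree is hypothetical, whitespace-only strings are assumed to be droppable, and the standard library's ElementTree is used so it runs without lxml.

```python
import xml.etree.ElementTree as etree  # with lxml: from lxml import etree


def to_tree(elem):
    """Recursively convert elem to (tag, content), where content holds the
    element's text, its converted children and their tails, in document order."""
    content = []
    if elem.text and elem.text.strip():
        content.append(elem.text.strip())
    for child in elem:
        content.append(to_tree(child))  # descend into the child's own content
        if child.tail and child.tail.strip():
            content.append(child.tail.strip())  # text that follows the child
    return (elem.tag, content)


kap = etree.fromstring("<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>")
print(to_tree(kap))  # ('kap', [('ofc', ['*']), '-', ('rad', ['a'])])
```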
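>>> from lxml import etree
>>> xml = '<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>'
>>> root = etree.fromstring(xml)
>>> l = []
>>> for el in root.iter():    # or root.getiterator()
...     l.append((el.tag, el.text))  # element name and the text after its start tag
...     l.append(el.tail)            # text after its end tag, e.g. the "-" after </ofc>
...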

or maybe this is enough:

list(root.itertext())

Stefan
Jun 27 '08 #5
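For the same fragment, Stefan's itertext() suggestion returns every text and tail string in document order, including the "-", but without the tag names, so it is enough only when the structure itself isn't needed. Shown here with the standard library's ElementTree, which also provides itertext(), so the snippet runs without lxml.

```python
import xml.etree.ElementTree as etree  # with lxml: from lxml import etree

kap = etree.fromstring("<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>")

# itertext() yields kap's text, then each descendant's text and tail, in order.
texts = list(kap.itertext())
print(texts)                   # ['\n ', '*', '-', 'a', '\n']
print("".join(texts).strip())  # *-a
```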
On May 17, 4:17 pm, Stefan Behnel <stefan...@behnel.de> wrote:
Hello,

My object doesn't have iter() or itertext(); it only has
iterancestors(), iterchildren(), iterdescendants() and itersiblings().

Thanks.
Jun 27 '08 #6
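If the installed lxml has no iter() or itertext(), the sequence that itertext() would produce can be built from nothing more than iterating over an element's children and reading .text and .tail, which any ElementTree-style element supports (Stefan's comment also points to getiterator() as the alternative to iter()). A sketch under those assumptions: iter_text is a hypothetical name, and it has been checked against the standard library's ElementTree rather than against an old lxml release.

```python
import xml.etree.ElementTree as etree  # with lxml: from lxml import etree


def iter_text(elem):
    """Yield elem's text, then each descendant's text and tail, in document
    order (the same sequence itertext() produces), without using iter()."""
    if elem.text:
        yield elem.text
    for child in elem:
        for piece in iter_text(child):  # everything inside the child
            yield piece
        if child.tail:                  # text after the child's end tag
            yield child.tail


kap = etree.fromstring("<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>")
print(list(iter_text(kap)))  # ['\n ', '*', '-', 'a', '\n']
```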
