Hello,
I have an XML file that starts with:
<vortaro>
<art mrk="$Id: a.xml,v 1.10 2007/09/11 16:30:20 revo Exp $">
<kap>
<ofc>*</ofc>-<rad>a</rad>
</kap>
Out of it, I'd like to extract something like this (I'm just showing one
structure; any structure is fine as long as all the data is there):
[("ofc", "*"), "-", ("rad", "a")]
How can I do it? I managed to get the content of both tags and the
text up to the first tag ("\n "), but not the - (and in other XML
files, there's more text outside the elements).
Thanks.
On Fri, 16 May 2008 18:53:03 -0300, J. Pablo Fernández <pu****@pupeno.com> wrote:
> How can I do it? I managed to get the content of both tags and the
> text up to the first tag ("\n "), but not the - (and in other XML
> files, there's more text outside the elements).
Look for the "tail" attribute.
--
Gabriel Genellina
On May 17, 2:19 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> Look for the "tail" attribute.
That gives me the last part, but not the one in the middle:
In : etree.tounicode(e)
Out: u'<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>\n'
In : e.text
Out: '\n '
In : e.tail
Out: '\n'
Thanks.
J. Pablo Fernández wrote:
> That gives me the last part, but not the one in the middle:
> In : etree.tounicode(e)
> Out: u'<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>\n'
> In : e.text
> Out: '\n '
> In : e.tail
> Out: '\n'
The "-" is the tail of the <ofc> child, not of <kap> itself. You need the
text and tail of each of your initial element's children, which in turn
needs that of their children, and so on.
See http://effbot.org/zone/element-bits-and-pieces.htm
HTH,
John
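A minimal sketch of the recursion John describes, written against the standard library's xml.etree.ElementTree on the assumption that lxml.etree elements expose the same .text and .tail attributes. The helper name mixed_content and the choice to drop whitespace-only chunks are illustrative, not something from the thread:

```python
import xml.etree.ElementTree as ET  # assumption: lxml.etree behaves the same for .text/.tail


def mixed_content(elem):
    """Return elem's content as a list of strings and (tag, content) pairs."""
    parts = []
    # Text between the opening tag and the first child.
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        inner = mixed_content(child)
        if len(inner) == 1 and isinstance(inner[0], str):
            # A leaf element collapses to (tag, text).
            parts.append((child.tag, inner[0]))
        else:
            # Nested or empty content stays as a list.
            parts.append((child.tag, inner))
        # Text between this child's closing tag and the next sibling.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


kap = ET.fromstring("<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>")
print(mixed_content(kap))  # [('ofc', '*'), '-', ('rad', 'a')]
```

Stripping whitespace is a design choice for this sample; drop the strip() calls if the indentation between tags matters.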
J. Pablo Fernández wrote:
> out of it, I'd like to extract something like (I'm just showing one
> structure, any structure as long as all data is there is fine):
> [("ofc", "*"), "-", ("rad", "a")]
    >>> from lxml import etree
    >>> root = etree.fromstring(xml)
    >>> l = []
    >>> for el in root.iter():    # or root.getiterator()
    ...     l.append((el, el.text))   # the element and the text before its first child
    ...     l.append(el.tail)         # the text after its closing tag, e.g. the "-"
Or maybe this is enough:
    >>> list(root.itertext())
Stefan
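To see what itertext() gives for the sample element, here is a small sketch using the standard library's xml.etree.ElementTree (assumed here to behave like lxml for this call). Note that the tag names are not part of the result, so this only fits when the flat text is enough:

```python
import xml.etree.ElementTree as ET  # assumption: stands in for lxml.etree

root = ET.fromstring("<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>")

# itertext() walks the tree in document order, yielding every text and tail.
chunks = list(root.itertext())
print(chunks)  # ['\n ', '*', '-', 'a', '\n']

# Dropping whitespace-only chunks leaves just the visible characters.
visible = [c for c in chunks if c.strip()]
print(visible)  # ['*', '-', 'a']
```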
On May 17, 4:17 pm, Stefan Behnel <stefan...@behnel.de> wrote:
> >>> for el in root.iter():    # or root.getiterator()
> [...]
> or maybe this is enough:
>     list(root.itertext())
Hello,
My object doesn't have iter() or itertext(); it only has
iterancestors, iterchildren, iterdescendants and itersiblings.
Thanks.
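If the element class predates iter() and itertext(), the same traversal can be written by hand. The generator below is a hypothetical stand-in (iter_text is not an lxml function) that relies only on .text, .tail and iterating over an element's children; it is shown with the standard library's ElementTree so it runs without lxml:

```python
import xml.etree.ElementTree as ET  # assumption: stands in for an older lxml.etree


def iter_text(elem):
    """Yield text chunks in document order using only .text, .tail and child iteration."""
    # Comments and processing instructions have a non-string tag; skip their own text.
    if not isinstance(elem.tag, str):
        return
    if elem.text:
        yield elem.text
    for child in elem:
        # Recurse first so a child's inner text comes before its tail.
        for chunk in iter_text(child):
            yield chunk
        if child.tail:
            yield child.tail


root = ET.fromstring("<kap>\n <ofc>*</ofc>-<rad>a</rad>\n</kap>")
print(list(iter_text(root)))  # ['\n ', '*', '-', 'a', '\n']
```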