get element text in DOM?

How can I get the text between the <teste> tags?

>>> xml = """<root><teste> texto </teste></root>"""
>>> from xml.dom import minidom
>>> document = minidom.parseString(xml)
>>> document
<xml.dom.minidom.Document instance at 0x4181df0c>
>>> minidom.getElementsByTagName('teste')
>>> element = document.getElementsByTagName('teste')
>>> element
[<DOM Element: teste at 0x418e110c>]
>>> element[0].nodeType
1

Juliano Freitas
Jul 18 '05 #1
Juliano Freitas wrote:
How can I get the text between the <teste> tags?

xml = """<root><teste> texto </teste></root>"""


Note that the text between the tags is a DOM node in its
own right: a TEXT node, which is a child of the
element node formed by the tag.

So try:

xml = """<root><teste> texto </teste></root>"""
from xml.dom import minidom
document = minidom.parseString(xml)
element = document.getElementsByTagName('teste')
textelt=element[0].firstChild
print textelt.nodeType, textelt.nodeValue

and it will print:

3 texto
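
For readers on Python 3, the same idea might be sketched as below. This is only an illustration under a few assumptions: the names xml_source, elements, text_node and text are made up here, and strip() is added to drop the surrounding spaces, which the snippet above leaves in place.

```python
from xml.dom import minidom

xml_source = """<root><teste> texto </teste></root>"""
document = minidom.parseString(xml_source)

# getElementsByTagName returns a list of element nodes (nodeType 1).
elements = document.getElementsByTagName('teste')

# The text is not stored on the element itself but on its child text node.
text_node = elements[0].firstChild

# Compare against the constant exposed on the node instead of the literal 3.
print(text_node.nodeType == text_node.TEXT_NODE)

# nodeValue keeps the original whitespace; strip() is an optional extra here.
text = text_node.nodeValue.strip()
print(text)
```

Note that firstChild is None when the element is empty, so real code would probably want to check for that before reading nodeValue.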

--Irmen
Jul 18 '05 #2
Juliano Freitas <ju*****@atlas.ucpel.tche.br> wrote in message news:<ma**************************************@python.org>...
How can I get the text between the <teste> tags?


http://lists.fourthought.com/piperma...er/013027.html

Verbatim:

"""
Or, ObTopic, for 4Suite recent CVS:
from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseString("<root><teste> texto </teste></root>", 'urn:dummy')
print doc.xpath('string(/root/teste)')

texto

Simple and sweet IMHO.
"""

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Jul 18 '05 #3
On Wed, 10 Nov 2004 17:11:09 -0200, Juliano Freitas
<ju*****@atlas.ucpel.tche.br> wrote:
How can I get the text between the <teste> tags?

Here is a useful function I have written:

from xml import dom

def getText(node, recursive = False):
    """
    Get all the text associated with this node.
    With recursive == True, all text from child nodes is retrieved
    """
    L = ['']
    for n in node.childNodes:
        if n.nodeType in (dom.Node.TEXT_NODE,
                          dom.Node.CDATA_SECTION_NODE):
            L.append(n.data)
        else:
            # Without recursion, any non-text child makes the function return None
            if not recursive:
                return None
            L.append( getText(n, recursive) )

    return ''.join(L)

print getText(element[0])



Regards Manlio Perillo
Jul 18 '05 #4
Manlio Perillo <NO******************@libero.it> wrote:
    for n in node.childNodes:
        if n.nodeType in (dom.Node.TEXT_NODE, dom.Node.CDATA_SECTION_NODE):

(Aside: node.TEXT_NODE would probably be better here. Can't guarantee
that a DOM's implementation of the 'Node' interface is available as a
class called 'Node' inside its module.)

            L.append(n.data)
        else:
            if not recursive:
                return None


Surely 'continue'? This will exit the function (returning None instead
of the expected empty string) the first time a non-Text node is met.

Incidentally, DOM Level 3 Core defines the property 'textContent' to
return pretty much exactly this (although it removes the ignorable
whitespace). Not in minidom yet, but... <insert usual plug here>
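
Putting the two remarks above together, a corrected helper might look roughly like the sketch below. It is written in Python 3 syntax and is not the original poster's code: the snake_case name get_text, the variable names, and the sample document with a nested <b> element are all assumptions made for illustration. Non-text children are skipped (the 'continue' behaviour) and the node type constants are read from the node itself.

```python
from xml.dom import minidom


def get_text(node, recursive=False):
    """Concatenate the text and CDATA children of node.

    With recursive=True, text inside child elements is included as well.
    Non-text children are skipped instead of aborting the function.
    """
    parts = []
    for child in node.childNodes:
        # Read the constants from the node itself, not from a 'Node' class.
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif recursive and child.nodeType == child.ELEMENT_NODE:
            parts.append(get_text(child, recursive=True))
        # Any other child is ignored, which is what 'continue' would do.
    return ''.join(parts)


# Hypothetical sample input with one nested element.
doc = minidom.parseString("<root><teste> texto <b>mais</b></teste></root>")
teste = doc.getElementsByTagName('teste')[0]

print(repr(get_text(teste)))                  # direct text children only
print(repr(get_text(teste, recursive=True)))  # includes text inside <b>
```

With this shape the function always returns a string, so an element without text children yields '' and callers do not need to test for None.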

--
Andrew Clover
mailto:an*@doxdesk.com
http://www.doxdesk.com/
Jul 18 '05 #5
