Hi,
I only know a little XML, and I'm trying to parse an XML document so
that I can save its elements to a file (as a list of dictionaries).
When I access a URL from Python 2.3.3 running on Linux with the
following lines:
import urllib
from xml.dom import minidom

resposta = urllib.urlopen(url)
xmldoc = minidom.parse(resposta)
resposta.close()
I get the following result:
<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://www......">&lt;DataSet&gt;
  &lt;Order&gt;
    &lt;Customer&gt;439&lt;/Customer&gt;
(... others ...)
  &lt;/Order&gt;
&lt;/DataSet&gt;</string>
________________________________________________________________
In the lines below, I try to get all the child nodes of string, first
by counting them, and then by ignoring the "\n" ones:
stringNode = xmldoc.childNodes[0]
print stringNode.toxml()
dataSetNode = stringNode.childNodes[0]
numNos = len(dataSetNode.childNodes)
todosNos = {}
for no in range(numNos):
    todosNos[no] = dataSetNode.childNodes[no].toxml()
posicaoXml = [no for no in todosNos.keys() if len(todosNos[no]) > 4]
print posicaoXml
(I'm almost sure there's a simpler way to do this...)
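For comparison, a possibly simpler way to skip the whitespace-only text
nodes is to filter on nodeType instead of on string length. This is only
a sketch: SAMPLE and element_children are made-up names for
illustration, and the sample document stands in for the real response.

```python
from xml.dom import minidom

# Made-up sample standing in for the real (unescaped) document.
SAMPLE = """<DataSet>
  <Order>
    <Customer>439</Customer>
  </Order>
</DataSet>"""


def element_children(node):
    """Return only the element children of node, skipping text nodes."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]


doc = minidom.parseString(SAMPLE)
# documentElement is the <DataSet> element; its children include
# whitespace text nodes, which element_children filters out.
orders = element_children(doc.documentElement)
order_names = [order.tagName for order in orders]
```

Filtering on nodeType avoids guessing a length threshold such as 4,
which could also drop a very short element.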
________________________________________________________________
I don't get any elements. But, if I access the same url via a browser,
the result in the browser window is something like:
<string xmlns="http://www......">
  <DataSet>
    <Order>
      <Customer>439</Customer>
(... others ...)
    </Order>
  </DataSet>
</string>
and the lines I posted work as intended.
I have already searched the web, and I know the problem is the escaped
characters, but I didn't find a simple solution for it.
I tried LL2XML.py, and also an unescape function based on a simple replace:
text = text.replace("&lt;", "<")
but for that I had to convert the XML document to a string, and then I
could not work out how to convert it back to an XML object.
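For what it's worth, minidom also provides parseString, which builds a
DOM from a string, and the parser already turns &lt; back into < inside
text nodes. So a round trip might look like the sketch below. RESPONSE
is a made-up stand-in for the real server reply (with a placeholder
namespace), and inner_document is a hypothetical helper, not something
taken from LL2XML.py. I am assuming the wrapper element holds nothing
but the escaped document.

```python
from xml.dom import minidom

# Made-up stand-in for the server reply: the inner document is escaped
# and sits as plain text inside the <string> wrapper element.
RESPONSE = ('<string xmlns="http://example.invalid/ns">'
            '&lt;DataSet&gt;&lt;Order&gt;'
            '&lt;Customer&gt;439&lt;/Customer&gt;'
            '&lt;/Order&gt;&lt;/DataSet&gt;</string>')


def inner_document(outer_doc):
    """Parse the text content of the root element as a second XML document."""
    wrapper = outer_doc.documentElement
    # The parser has already unescaped the entities, so the text nodes
    # hold real markup; join them in case the text was split up.
    text = "".join([node.data for node in wrapper.childNodes
                    if node.nodeType == node.TEXT_NODE])
    # Encode to UTF-8 bytes so parseString gets a byte string.
    return minidom.parseString(text.encode("utf-8"))


outer = minidom.parseString(RESPONSE)
inner = inner_document(outer)
customers = [c.firstChild.data
             for c in inner.getElementsByTagName("Customer")]
```

With this approach no manual replace of "&lt;" is needed, because the
unescaping happens when the outer document is parsed.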
How can I solve this? Please explain it keeping in mind that I'm just
beginning with XML and I'm not very experienced in Python either.
Luis