reading xml data

56
I have an XML file which looks like this:
<xml>
  <process name="proc1">
    <mkdir>directory</mkdir>
    <copyfile>src,dst</copyfile>
  </process>

  <process name="proc2">
    <copyfile>src,dst</copyfile>
  </process>
</xml>
My problem is: how can I get the child nodes of the process elements named "proc1" and "proc2"?
I also need to get the values of these child nodes (for example, "src,dst" for <copyfile>).
I'm using the xml.dom.minidom module.

I'm hoping for your response, guys.
Dec 4 '07 #1
heiro
56
Does anyone know? Please help.
Dec 5 '07 #2
bvdet
2,851 Expert Mod 2GB
You will need to create a parser, something like this:
from xml.dom.minidom import parse

fn = 'sample.xml'

dom1 = parse(fn)
# global variable required by handleData
nameList = ["proc1", "proc2"]

def getText(nodelist):
    # Concatenate the data of every text node in nodelist
    rc = ""
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
    return rc

def handleData(nodelist, *args):
    # For each node whose "name" attribute is in nameList, look up each
    # tag name in args and keep the first matching element, if any
    resList = []
    for node in nodelist:
        # getAttribute returns '' when the attribute is missing
        if node.getAttribute("name") in nameList:
            for arg in args:
                resList.append(node.getElementsByTagName(arg))
    return [item[0] for item in resList if item]

# print(...) with a single argument works in both Python 2 and Python 3
for item in dom1.getElementsByTagName("copyfile"):
    print(getText(item.childNodes))

process_elements = dom1.getElementsByTagName('process')
print(process_elements)

print(handleData(process_elements, "mkdir", "copyfile"))

for item in handleData(process_elements, "mkdir", "copyfile"):
    print(getText(item.childNodes))
Contents of sample.xml:
<xml>
<process name="proc1">
<mkdir>directory</mkdir>
<copyfile>src,dst</copyfile>
</process>

<process name="proc2">
<copyfile>src,dst</copyfile>
</process>
<process name="proc3">
<mkdir>directory</mkdir>
<copyfile>src,dst</copyfile>
</process>
<process name="proc4">
<mkdir>directory</mkdir>
<copyfile>src,dst</copyfile>
</process>
</xml>
Output from above code:
>>> src,dst
src,dst
src,dst
src,dst
[<DOM Element: process at 0xed2670>, <DOM Element: process at 0xed2f58>, <DOM Element: process at 0xedb4b8>, <DOM Element: process at 0xedb788>]
[<DOM Element: mkdir at 0xed2e68>, <DOM Element: copyfile at 0xed2e90>, <DOM Element: copyfile at 0xedb0a8>]
directory
src,dst
src,dst
>>>
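For the original question (the direct children of each process and their text), another route is to walk `childNodes` and keep only the element nodes. The following is a minimal sketch, not a rework of the code above: it parses the XML from a string so that it runs as-is, and the helper names `child_elements` and `element_text` are made up for this example.

```python
from xml.dom.minidom import parseString

SAMPLE = """<xml>
  <process name="proc1">
    <mkdir>directory</mkdir>
    <copyfile>src,dst</copyfile>
  </process>
  <process name="proc2">
    <copyfile>src,dst</copyfile>
  </process>
</xml>"""

def child_elements(node):
    # Hypothetical helper: direct children that are elements,
    # skipping the whitespace text nodes between tags.
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

def element_text(elem):
    # Hypothetical helper: join the text nodes directly inside elem.
    return "".join(c.data for c in elem.childNodes
                   if c.nodeType == c.TEXT_NODE)

dom = parseString(SAMPLE)
result = {}
for proc in dom.getElementsByTagName("process"):
    name = proc.getAttribute("name")
    result[name] = [(c.tagName, element_text(c)) for c in child_elements(proc)]

for name, children in result.items():
    print(name, children)
```

Unlike `getElementsByTagName`, which searches all descendants, this only looks one level below each process, so the tag names do not need to be known in advance.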
Dec 5 '07 #3
heiro
56
Thanks, bvdet. I'll try this one.
Dec 6 '07 #4
bvdet
2,851 Expert Mod 2GB
You are welcome. I am learning about XML and DOM also.
Dec 6 '07 #5
heiro
56
I know this is asking too much :-).
I want to ask another favor. What if I want the output to look like this:

process name="proc1"
mkdir: directory
copyfile: src,dst


process name="proc2"
copyfile: src,dst


process name="proc3"
mkdir: directory
copyfile: src,dst

And how can I parse an XML child node which looks like this:
<download ='ftp' user='username' password='password'>

Thanks in advance, bvdet. I hope you can help me with this a second time.
Dec 7 '07 #6
bvdet
2,851 Expert Mod 2GB
Create a function to format the data:
from xml.dom.minidom import parse

# global variables required by formatData
nameList = ["proc1", "proc2"]
nodeIDlist = ['name',]

def getText(nodelist):
    # Same helper as in the earlier post: join the text nodes in nodelist
    return "".join(node.data for node in nodelist
                   if node.nodeType == node.TEXT_NODE)

def formatData(nodelist, *args):
    resList = []
    for node in nodelist:
        for attr in nodeIDlist:
            try:
                s = str(node.attributes[attr].value)
            except KeyError:
                # node has no attribute with this name
                continue
            if s in nameList:
                # node.nodeName is the tag name, e.g. 'process'
                resList.append('%s name=%s' % (node.nodeName, s))
                for arg in args:
                    try:
                        resList.append('  %s: %s' % (arg, getText(node.getElementsByTagName(arg)[0].childNodes)))
                    except IndexError:
                        # node has no descendant element with tag name arg
                        pass
    return '\n'.join(resList)

dom1 = parse('sample.xml')

>>> process_elements = dom1.getElementsByTagName('process')
>>> process_elements
[<DOM Element: process at 0xf8bb98>, <DOM Element: process at 0xf8b918>, <DOM Element: process at 0xf8b710>, <DOM Element: process at 0xf87490>]
>>> print(formatData(process_elements, "mkdir", "copyfile"))
process name=proc1
  mkdir: directory1
  copyfile: src1,dst1
process name=proc2
  copyfile: src2,dst2
>>>
The string <download ='ftp' user='username' password='password'> does not appear to be valid XML. Shouldn't there be an attribute name to the left of the equals sign after 'download'?
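One way to check this is to hand both forms to the parser. The sketch below makes two assumptions for illustration only: the attribute name `server` is a guess at what was intended, and both strings are written as self-closing tags (`/>`) so that the missing attribute name is the only difference between them. The helper name `try_parse` is also made up.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# The string as posted, closed with '/>' so it is a complete element.
bad = "<download ='ftp' user='username' password='password'/>"
# 'server' is an assumed attribute name, used only for illustration.
good = "<download server='ftp' user='username' password='password'/>"

def try_parse(text):
    # Return the root element, or None if the text is not well-formed XML.
    try:
        return parseString(text).documentElement
    except ExpatError as err:
        print("not well-formed:", err)
        return None

bad_root = try_parse(bad)    # the parser rejects this one
good_root = try_parse(good)  # this one parses

# attributes.items() yields a (name, value) pair for every attribute.
attrs = dict(good_root.attributes.items())
print(attrs)
```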
Dec 7 '07 #7
heiro
56
It actually looks like this:

<process name='download'>
<download server='ftp' user='username' password='******'>
<destination>path</destination>
<unzip>*.jpg, *.doc, *.pdf</unzip>
</download>
</process>

Actually, I'm writing a program right now, and its output depends on the XML.
You've helped me a lot, bvdet. Thanks, man.
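A minimal sketch of reading that fragment with minidom: the attributes of <download> are read with `getAttribute`, and the child elements are read as text. Splitting the <unzip> value on commas is an assumption about how the patterns are meant to be used, and the helper names `first_text` and `read_download` are made up for this example.

```python
from xml.dom.minidom import parseString

SAMPLE = """<process name='download'>
  <download server='ftp' user='username' password='******'>
    <destination>path</destination>
    <unzip>*.jpg, *.doc, *.pdf</unzip>
  </download>
</process>"""

def first_text(parent, tag):
    # Text of the first <tag> descendant of parent, or '' if there is none.
    found = parent.getElementsByTagName(tag)
    if not found:
        return ""
    return "".join(c.data for c in found[0].childNodes
                   if c.nodeType == c.TEXT_NODE).strip()

def read_download(download):
    # Hypothetical helper: gather one <download> element into a dict.
    patterns = first_text(download, "unzip").split(",")
    return {
        "server": download.getAttribute("server"),
        "user": download.getAttribute("user"),
        "password": download.getAttribute("password"),
        "destination": first_text(download, "destination"),
        # Assumption: the unzip patterns are comma-separated.
        "unzip": [p.strip() for p in patterns if p.strip()],
    }

dom = parseString(SAMPLE)
info = read_download(dom.getElementsByTagName("download")[0])
print(info)
```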
Dec 8 '07 #8
bvdet
2,851 Expert Mod 2GB
I have played around with XML parsing and wrote a new function. It is kind of ugly and does not work exactly the way I want, so maybe someone can improve it. The complete code follows:
from xml.dom.minidom import parse

def getText(nodelist):
    # Collect the non-blank text nodes in nodelist, one per line
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            s = node.data.strip()
            if s:
                rc.append(node.data)
    return '\n'.join(rc)

def nodeName(node):
    # Tag name for element nodes, '' for anything else (e.g. text nodes)
    if node.nodeType == node.ELEMENT_NODE:
        return node.nodeName
    return ''

def getDataList(nodelist, **kargs):
    # kargs maps a tag name to {attribute name: list of accepted values};
    # an empty list accepts any value
    resList = []
    for node in nodelist:
        node_name = nodeName(node)
        if node_name in kargs:
            for attr in kargs[node_name]:
                try:
                    s = str(node.attributes[attr].value)
                except KeyError as e:
                    print('Invalid node attribute: %s' % e)
                    continue
                v = kargs[node_name][attr]
                if not v or s in v:
                    resList.append('%s %s=%s' % (node_name, attr, s))

                    if node.nodeType == node.ELEMENT_NODE:
                        print('DOM element = %s' % node.nodeName)
                        lines = []
                        for elem in node.childNodes:
                            nm = nodeName(elem)
                            prefix = nm + ': ' if nm else ''
                            lines.append('  %s%s' % (prefix, getText(elem.childNodes)))
                        print('\n'.join([i for i in lines if i.strip()]))
                    elif node.nodeType == node.TEXT_NODE:
                        # getText expects a list of nodes
                        print('Text Node Text = %s' % getText([node]))
    return resList

fn = r'H:\TEMP\temsys\sampleXML.txt'

dom1 = parse(fn)

process_elements = dom1.getElementsByTagName('process')
download_elements = dom1.getElementsByTagName('download')

elemDict = {'process': {'name': ["proc1", "proc2"]}, 'download': {'server': ['ftp', ]}}
x = getDataList(process_elements, **elemDict)
y = getDataList(download_elements, **elemDict)

print('')
print(x)
print(y)
Output:
>>> DOM element = process
mkdir: directory1
mkdir: directory11
mkdir: directory111
copyfile: src1,dst1
DOM element = process
copyfile: src2,dst2
DOM element = download
destination: path
unzip: *.jpg, *.doc, *.pdf

['process name=proc1', 'process name=proc2']
['download server=ftp']
>>>
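As one possible simplification (a sketch, not a drop-in replacement for getDataList), the layout requested earlier in the thread can be produced by walking the element children of each process directly. The function name `format_processes` and its `wanted` parameter are invented for this example, and the XML is inlined so the snippet runs on its own.

```python
from xml.dom.minidom import parseString

SAMPLE = """<xml>
  <process name="proc1">
    <mkdir>directory</mkdir>
    <copyfile>src,dst</copyfile>
  </process>
  <process name="proc2">
    <copyfile>src,dst</copyfile>
  </process>
  <process name="proc3">
    <mkdir>directory</mkdir>
    <copyfile>src,dst</copyfile>
  </process>
</xml>"""

def format_processes(dom, wanted=None):
    # Hypothetical helper. wanted is an optional collection of process
    # names to include; None means include every <process>.
    blocks = []
    for proc in dom.getElementsByTagName("process"):
        name = proc.getAttribute("name")
        if wanted is not None and name not in wanted:
            continue
        lines = ['process name="%s"' % name]
        for child in proc.childNodes:
            if child.nodeType != child.ELEMENT_NODE:
                continue  # skip whitespace between tags
            text = "".join(c.data for c in child.childNodes
                           if c.nodeType == c.TEXT_NODE).strip()
            lines.append("%s: %s" % (child.tagName, text))
        blocks.append("\n".join(lines))
    # One blank line between process blocks.
    return "\n\n".join(blocks)

report = format_processes(parseString(SAMPLE), wanted=["proc1", "proc2"])
print(report)
```

Passing `wanted=None` lists every process, which covers the "proc3" block in the requested output as well.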
Dec 12 '07 #9
