question about XML parsing

46
Hey Guys,

I am discovering the awesomeness that is XML.

I use an application called Final Cut Pro for editing video. The app is able to export its projects as XML.

I am trying to develop a script to read that XML and build a list of the files that are listed in it. The project's imported files are enclosed in the 'pathurl' tag in the XML file.

It is nearly working but I just wanted to see what you guys think of the manner I have approached it.

Currently it spits out a list, but the information is enclosed in the xml element tag - it would be great to get a list without the tags.
example (current output):
<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif</pathurl>

example of what I'd like:
/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif
Is there an 'XML parsing call' (sorry, I am making up programming lingo as I go) to do this, or is it a matter of using a Python tool like strip/split?

Thanks for any advice!

Adam


import sys
from xml.dom import minidom

# Path to the exported XML file, given as the first command-line argument
xmldocumentpath = sys.argv[1]

elementtofind = 'pathurl'

xmldoc = minidom.parse(xmldocumentpath)
pathlist = xmldoc.getElementsByTagName(elementtofind)

# All Nodes listed
# AllNode = xmldoc.firstChild

# Number of matching elements
itemamount = len(pathlist)
print(itemamount)

# toxml() serializes the whole element, so the tags are included in the output
for node in pathlist:
    print(node.toxml())
Aug 12 '07 #1
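On the 'XML parsing call' question: minidom keeps the text between the tags as a child text node, so it can be read directly instead of being cut out of the `toxml()` string. The following is a minimal sketch, not a definitive answer. The sample document, its second entry, and the helper names (`element_text`, `pathurl_to_path`, `list_media_paths`) are assumptions made for illustration, as is the idea that spaces in a real export arrive as `%20`.

```python
from urllib.parse import unquote, urlparse
from xml.dom import minidom

# Hypothetical sample standing in for a real Final Cut Pro export.
SAMPLE_XML = """<xmeml>
  <file><pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif</pathurl></file>
  <file><pathurl>file://localhost/Volumes/HD1/Media/My%20Clip.mov</pathurl></file>
</xmeml>"""


def element_text(node):
    """Join the text nodes that sit directly inside an element."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def pathurl_to_path(url):
    """Keep only the path part of a file:// URL and decode %XX escapes."""
    return unquote(urlparse(url.strip()).path)


def list_media_paths(xml_text, tag="pathurl"):
    """Return the file path held by every <tag> element in the document."""
    doc = minidom.parseString(xml_text)
    return [
        pathurl_to_path(element_text(node))
        for node in doc.getElementsByTagName(tag)
    ]


paths = list_media_paths(SAMPLE_XML)
for path in paths:
    print(path)
```

To run this against a file on disk, `minidom.parse(filename)` can take the place of `minidom.parseString(xml_text)`, as in the original script.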
bartonc
6,596 Expert 4TB
Hi Adam.

I'm currently studying Regular Expressions in my spare (LOL) time. Regex is such an important tool, and so well suited to this kind of task, that it kills me not to be able to just crank one out for you.

In pure Python, you could use something like this:
>>> s = "<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif</pathurl>"
>>> token = "pathurl>"
>>> size = len(token)
>>> start = s.find(token)
>>> end = s.find(token, start + size)
>>> s[start + size:end - 2]
'file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif'
Aug 12 '07 #2
ateale
46
Hey BartonC thanks a lot!
That is really really cool!

I'll play around with that for a while!

Cheers mate!



Aug 12 '07 #3
bartonc
6,596 Expert 4TB
Actually, that was kind of dumb... If you know the size of the token AND that it exists, simply:
>>> token = "<pathurl>"
>>> size = len(token)
>>> s[size:-size - 1]
'file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif'
Aug 12 '07 #4
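The slice above depends on the string starting with the opening tag and ending with the closing tag. For lines read straight from a file, which may be indented or may not contain the tag at all, a slightly more defensive variant of the `find` approach might look like this. It is only a sketch; the function name `text_between` and its defaults are made up for illustration.

```python
def text_between(s, open_tag="<pathurl>", close_tag="</pathurl>"):
    """Return the text between the first open_tag and the next close_tag.

    Returns None when either tag is missing, since str.find() reports
    a missing substring as -1.
    """
    start = s.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    end = s.find(close_tag, start)
    if end == -1:
        return None
    return s[start:end]


indented = "    <pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>\n"
print(text_between(indented))
print(text_between("<name>research.tif</name>"))
```

Returning `None` for a line without the tag lets the caller skip it with a plain `if` test.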
bvdet
2,851 Expert Mod 2GB
Barton, Adam - I am also trying to learn RE. What do you think of this?
import re

fn = r'H:\TEMP\temsys\re_parse_string.txt'

# Capture everything between the file://localhost prefix and the closing tag
patt = re.compile(r'<pathurl>file://localhost(.+)</pathurl>')

f = open(fn)
data = []
for line in f:
    m = patt.search(line)
    if m:
        data.append(m.group(1))

print(data)
Output:

>>> ['/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif']
Interaction:

>>> m.group(0)
'<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_08AUG07/research1.tif</pathurl>'
>>> m.group(1)
'/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_08AUG07/research1.tif'
>>>
Aug 12 '07 #5
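One thing to watch with this pattern: `(.+)` is greedy, so if two `pathurl` elements ever land on the same line, a single match runs from the first opening tag to the last closing tag. A possible variation, sketched here under the assumption that the whole file fits in memory as one string, uses the non-greedy `(.+?)` together with `findall`. The sample string and the name `find_paths` are invented for the example.

```python
import re

# Non-greedy (.+?) stops at the first closing tag it meets.
PATHURL_RE = re.compile(r"<pathurl>file://localhost(.+?)</pathurl>")


def find_paths(text):
    """Return the captured path of every pathurl element found in text."""
    # With exactly one group in the pattern, findall returns the group text.
    return PATHURL_RE.findall(text)


# Hypothetical input with two elements on a single line.
sample = (
    "<pathurl>file://localhost/a/one.tif</pathurl>"
    "<pathurl>file://localhost/b/two.tif</pathurl>"
)
found = find_paths(sample)
print(found)
```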
bartonc
6,596 Expert 4TB
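That's the one that I was imagining! Thank you very much.

It's interesting to note that group zero is always the entire match, so it exists whether or not the pattern has parentheses (which are the official "group" operators); the parentheses only create group one and up. Perl, say, exposes the whole match as $& rather than as a numbered group.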
Aug 12 '07 #6
bvdet
2,851 Expert Mod 2GB
I have been confused about the group() method all along. I learned this recently by experimenting (trial and error, a lot of error!).
Aug 12 '07 #7
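For anyone else puzzling over `group()`, a small experiment along these lines may help. It is a sketch using a shortened, made-up path; the variable names are arbitrary.

```python
import re

line = "<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>"

# One pair of parentheses in the pattern, so one numbered group.
with_group = re.search(r"<pathurl>file://localhost(.+)</pathurl>", line)
# No parentheses at all, so no numbered groups.
no_group = re.search(r"<pathurl>.+</pathurl>", line)

whole = with_group.group(0)       # the entire matched text
first = with_group.group(1)       # only what the parentheses captured
captured = with_group.groups()    # tuple of numbered groups, from 1 up

print(whole)
print(first)
print(captured)

# group(0) is available even when the pattern has no parentheses,
# while asking for group(1) raises IndexError.
print(no_group.group(0) == line)
print(no_group.groups())
```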
bvdet
2,851 Expert Mod 2GB
Just to show good practice, the open file object 'f' should be closed:
f.close()
Aug 12 '07 #8
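In current Python versions, a `with` block is another way to get the same effect: the file is closed when the block exits, even if an exception is raised partway through. The sketch below wraps the loop in a function named `read_paths` (a hypothetical name) and writes a throwaway temporary file so that it runs on its own; the path in that file is made up.

```python
import os
import re
import tempfile

patt = re.compile(r"<pathurl>file://localhost(.+)</pathurl>")


def read_paths(filename):
    """Collect the captured path from every matching line of the file."""
    data = []
    # The file is closed automatically when the with block ends.
    with open(filename) as f:
        for line in f:
            m = patt.search(line)
            if m:
                data.append(m.group(1))
    return data


# Throwaway input file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>\n")
    tmp.write("<name>not a path line</name>\n")
    tmp_name = tmp.name

result = read_paths(tmp_name)
os.remove(tmp_name)
print(result)
```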
