question about XML parsing

P: 46
Hey Guys,

I am discovering the awesomeness that is XML.

I use an application called Final Cut Pro for editing video. The app is able to export its projects as XML.

I am trying to develop a script to read that XML file and build a list of the files that are listed in it. The project's imported files are enclosed in the 'pathurl' tag in the XML file.

It is nearly working but I just wanted to see what you guys think of the manner I have approached it.

Currently it spits out a list, but the information is enclosed in the xml element tag - it would be great to get a list without the tags.
example (current output):
<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif</pathurl>

example of what I'd like:
/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif
Is there an 'XML parsing call' (sorry, I am making up programming lingo as I go) to do this, or is it a matter of using a Python tool like strip/split?

Thanks for any advice!

Adam


import sys
import os
from xml.dom import minidom

xmldocumentpath = str(sys.argv[1])

elementtofind = 'pathurl'

xmldoc = minidom.parse(xmldocumentpath)
pathlist = xmldoc.getElementsByTagName(elementtofind)
pathlist

# All Nodes listed
# AllNode = xmldoc.firstChild

itemamount = len (pathlist)

print itemamount

loop = 0
while loop < itemamount:
    print pathlist[loop].toxml()
    loop = loop + 1
Aug 12 '07 #1
7 Replies

bartonc
Expert 5K+
P: 6,596
Hi Adam.

I'm currently studying Regular Expressions in my spare (LOL) time. Regex is such an important tool, and so perfectly suited to this kind of task, that it kills me not to be able to just crank one out for you.

In pure Python, you could use something like this:
>>> s = "<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif</pathurl>"
>>> token = "pathurl>"
>>> size = len(token)
>>> start = s.find(token)
>>> end = s.find(token, start + size)
>>> s[start + size:end - 2]
'file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif'
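On the original "is there an XML parsing call" question: minidom can hand back the text inside an element directly, so the tags never need stripping. A minimal sketch in Python 3 syntax (the thread's own code is Python 2), assuming each pathurl element holds a single text node and that the URLs may be percent-encoded. The sample XML and the helper name extract_paths are made up for illustration and are not a real Final Cut Pro export:

```python
from urllib.parse import unquote, urlparse
from xml.dom import minidom

# Hypothetical sample; a real Final Cut Pro export has far more structure.
SAMPLE = (
    "<project>"
    "<file><pathurl>file://localhost/Volumes/HD1/Media/research.tif</pathurl></file>"
    "<file><pathurl>file://localhost/Volumes/HD1/Media/my%20clip.mov</pathurl></file>"
    "</project>"
)


def extract_paths(xml_text, tag="pathurl"):
    """Return the filesystem path held in each <tag> element."""
    doc = minidom.parseString(xml_text)
    paths = []
    for node in doc.getElementsByTagName(tag):
        if node.firstChild is None:  # empty element, nothing to read
            continue
        url = node.firstChild.data.strip()  # the text between the tags
        # urlparse splits off "file://localhost"; unquote decodes %20 and friends.
        paths.append(unquote(urlparse(url).path))
    return paths


for path in extract_paths(SAMPLE):
    print(path)
```

With a file on disk, minidom.parse(xmldocumentpath) could stand in for parseString, as in the first post.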
Aug 12 '07 #2

P: 46
Hey BartonC thanks a lot!
That is really really cool!

I'll play around with that for a while!

Cheers mate!
Aug 12 '07 #3

bartonc
Expert 5K+
P: 6,596
Actually, that was kind of dumb... If you know the size of the token AND that it exists, simply:
>>> token = "<pathurl>"
>>> size = len(token)
>>> s[size:-size - 1]
'file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif'
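The slice only works when the string really does start and end with the tag, so it may be worth checking that first. A rough sketch in Python 3 syntax; the function name strip_tag is invented here, and returning the input unchanged when the tags are missing is just one possible choice:

```python
def strip_tag(s, tag="pathurl"):
    """Drop a surrounding <tag>...</tag> pair if present, else return s as-is."""
    open_tag = "<%s>" % tag
    close_tag = "</%s>" % tag
    if s.startswith(open_tag) and s.endswith(close_tag):
        # Same slicing idea as above, but with the lengths taken from the tags.
        return s[len(open_tag):-len(close_tag)]
    return s


print(strip_tag("<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>"))
```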
Aug 12 '07 #4

bvdet
Expert Mod 2.5K+
P: 2,851
Barton, Adam - I am also trying to learn RE. What do you think of this?
import re

fn = r'H:\TEMP\temsys\re_parse_string.txt'

patt = re.compile(r'<pathurl>file://localhost(.+)</pathurl>')

f = open(fn)
data = []
for line in f:
    m = patt.search(line)
    if m:
        data.append(m.group(1))

print data
Output:

>>> ['/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif']
Interaction:

>>> m.group(0)
'<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_08AUG07/research1.tif</pathurl>'
>>> m.group(1)
'/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_08AUG07/research1.tif'
>>>
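One caveat with the pattern above: (.+) is greedy, so if two pathurl elements ever land on the same line, search() would swallow everything between the first opening tag and the last closing tag. A possible variation, sketched in Python 3 syntax, uses a non-greedy group with findall over the whole text. The sample text below is invented for the demonstration:

```python
import re

# Non-greedy (.+?) stops at the first closing tag instead of the last one.
PATT = re.compile(r"<pathurl>file://localhost(.+?)</pathurl>")

# Hypothetical input with two elements on a single line.
text = (
    "<pathurl>file://localhost/Volumes/HD1/a.tif</pathurl>"
    "<pathurl>file://localhost/Volumes/HD1/b.tif</pathurl>\n"
)

# With exactly one group in the pattern, findall returns the group contents.
paths = PATT.findall(text)
print(paths)
```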
Aug 12 '07 #5

bartonc
Expert 5K+
P: 6,596
That's the one that I was imagining! Thank you, very much.

It's interesting to note that (say) Perl has no numbered group zero: the whole match is available as $&, and the parentheses (which are the official "group" operators) start counting at $1.
Aug 12 '07 #6

bvdet
Expert Mod 2.5K+
P: 2,851
I have been confused about the group() method all along. I learned this recently by experimenting (trial and error, a lot of error!).
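For anyone else puzzling over group(), here is a small sketch of how the numbering works in Python's re module, in Python 3 syntax. The shortened path is made up; the pattern is the one from the earlier post:

```python
import re

m = re.search(
    r"<pathurl>file://localhost(.+)</pathurl>",
    "<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>",
)

whole = m.group(0)       # the entire matched text, tags included
first = m.group(1)       # only what the first pair of parentheses captured
all_groups = m.groups()  # tuple of the parenthesized groups, so group 0 is not in it

print(whole)
print(first)
print(all_groups)
```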
Aug 12 '07 #7

bvdet
Expert Mod 2.5K+
P: 2,851
Just to show good practice, the open file object 'f' should be closed:
f.close()
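An alternative to calling close() by hand is a with block, which closes the file on the way out even if the loop raises. A sketch in Python 3 syntax; the temporary file and its single line of content are stand-ins for the real input file and exist only to make the example self-contained:

```python
import os
import re
import tempfile

patt = re.compile(r"<pathurl>file://localhost(.+)</pathurl>")

# Write a throwaway sample file (hypothetical content) to read back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>\n")
    fn = tmp.name

data = []
with open(fn) as f:  # f is closed automatically when the block ends
    for line in f:
        m = patt.search(line)
        if m:
            data.append(m.group(1))

print(data)
print(f.closed)

os.remove(fn)  # clean up the sample file
```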
Aug 12 '07 #8
