
question about XML parsing

46 New Member
Hey Guys,

I am discovering the awesomeness that is XML.

I use an application called Final Cut Pro for editing video. The app is able to export its projects as XML.

I am trying to develop a script to read that XML and build a list of the files that are listed in it. The projects' imported files are enclosed in the 'pathurl' tag in the XML file.

It is nearly working, but I just wanted to see what you guys think of the way I have approached it.

Currently it spits out a list, but the information is enclosed in the xml element tag - it would be great to get a list without the tags.
example (current output):
<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif</pathurl>

example of what I'd like:
/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif
Is there an 'XML parsing call' (sorry, I am making up programming lingo as I go) to do this, or is it a matter of using a Python tool like strip/split?

Thanks for any advice!

Adam


    import sys
    import os
    from xml.dom import minidom

    # Path to the exported XML file, taken from the command line
    xmldocumentpath = str(sys.argv[1])

    elementtofind = 'pathurl'

    xmldoc = minidom.parse(xmldocumentpath)
    pathlist = xmldoc.getElementsByTagName(elementtofind)

    # All Nodes listed
    # AllNode = xmldoc.firstChild

    itemamount = len(pathlist)

    print(itemamount)

    # toxml() serialises the whole element, which is why the tags show up in the output
    loop = 0
    while loop < itemamount:
        print(pathlist[loop].toxml())
        loop = loop + 1
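On the 'XML parsing call' question, here is a minimal sketch, assuming Python 3 and a made-up two-clip sample in place of a real export. minidom exposes the text inside an element as child text nodes, so reading their .data avoids the tags that toxml() adds. The helper name extract_paths and the sample XML are illustrative only, and the %20 escape in the second path is an assumption about how spaces might be encoded.

```python
from urllib.parse import unquote, urlparse
from xml.dom import minidom

# Hypothetical stand-in for a Final Cut Pro export; a real project file is far larger.
SAMPLE_XML = """<xmeml>
  <clip><file><pathurl>file://localhost/Volumes/HD1/Media/research.tif</pathurl></file></clip>
  <clip><file><pathurl>file://localhost/Volumes/HD1/Media/My%20Clip.mov</pathurl></file></clip>
</xmeml>"""


def extract_paths(xmldoc, tag="pathurl"):
    """Return the filesystem path held in each <tag> element (hypothetical helper)."""
    paths = []
    for node in xmldoc.getElementsByTagName(tag):
        # Join the element's text children instead of serialising with toxml().
        text = "".join(
            child.data for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        # urlparse splits off the 'file' scheme and the 'localhost' host;
        # unquote turns escapes such as %20 back into spaces.
        paths.append(unquote(urlparse(text.strip()).path))
    return paths


paths = extract_paths(minidom.parseString(SAMPLE_XML))
for path in paths:
    print(path)
```

For a file on disk, minidom.parse(xmldocumentpath) from the script above can be passed to the same helper in place of parseString.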
Aug 12 '07 #1
bartonc
6,596 Recognized Expert Expert
Hi Adam.

I'm currently studying Regular Expressions in my spare (LOL) time. I believe that regex is such an important tool that is perfectly suited to this kind of task that it kills me not to be able to just crank one out for you.

In pure Python, you could use something like this:
    >>> s = "<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif</pathurl>"
    >>> token = "pathurl>"
    >>> size = len(token)
    >>> start = s.find(token)
    >>> end = s.find(token, start + size)
    >>> s[start + size:end - 2]
    'file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif'
Aug 12 '07 #2
ateale
46 New Member
Hey BartonC thanks a lot!
That is really really cool!

I'll play around with that for a while!

Cheers mate!



Aug 12 '07 #3
bartonc
6,596 Recognized Expert Expert
Actually, that was kind of dumb... If you know the size of the token AND that it exists, simply:
    >>> token = "<pathurl>"
    >>> size = len(token)
    >>> s[size:-size - 1]
    'file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif'
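The slice relies on the tag being present at both ends of the string. When that is not guaranteed, a sketch along these lines (assuming Python 3; strip_tag is a hypothetical helper, not something from the thread) returns None instead of silently chopping characters off an untagged string:

```python
def strip_tag(s, tag="pathurl"):
    """Return the text between <tag> and </tag>, or None if either is missing."""
    open_tag = "<%s>" % tag
    close_tag = "</%s>" % tag
    # partition() returns (before, separator, after); separator is '' when not found.
    _, found_open, rest = s.partition(open_tag)
    inner, found_close, _ = rest.partition(close_tag)
    if not (found_open and found_close):
        return None
    return inner


s = "<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>"
print(strip_tag(s))
print(strip_tag("no tags here"))
```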
Aug 12 '07 #4
bvdet
2,851 Recognized Expert Moderator Specialist
Barton, Adam - I am also trying to learn RE. What do you think of this?
    import re

    fn = r'H:\TEMP\temsys\re_parse_string.txt'

    # Group 1 captures everything after 'file://localhost' up to the closing tag
    patt = re.compile(r'<pathurl>file://localhost(.+)</pathurl>')

    f = open(fn)
    data = []
    for line in f:
        m = patt.search(line)
        if m:
            data.append(m.group(1))

    print(data)
Output:

>>> ['/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif']
Interaction:

>>> m.group(0)
'<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_08AUG07/research1.tif</pathurl>'
>>> m.group(1)
'/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_08AUG07/research1.tif'
>>>
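A possible variation, sketched under the assumption that two pathurl elements could end up on the same line: the greedy (.+) would then swallow everything up to the last closing tag, while a non-greedy (.+?) with findall keeps each path separate. The sample text is made up for illustration, and Python 3 is assumed.

```python
import re

# Non-greedy (.+?) stops at the first closing tag, so two elements on one line stay separate.
PATT = re.compile(r"<pathurl>file://localhost(.+?)</pathurl>")

# Hypothetical sample text standing in for the contents of the exported XML file.
text = (
    "<pathurl>file://localhost/Volumes/HD1/a.tif</pathurl>"
    "<pathurl>file://localhost/Volumes/HD1/b.tif</pathurl>\n"
    "<name>not a path</name>\n"
)

# With exactly one group in the pattern, findall returns that group's text for each match.
data = PATT.findall(text)
print(data)
```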
Aug 12 '07 #5
bartonc
6,596 Recognized Expert Expert
That's the one that I was imagining! Thank you, very much.

It's interesting to note that (say) perl regex would not have created group zero, as there are no parentheses (which are the official "group" operators).
Aug 12 '07 #6
bvdet
2,851 Recognized Expert Moderator Specialist
I have been confused about the group() method all along. I learned this recently by experimenting (trial and error, a lot of error!).
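For anyone else puzzling over group(), a small sketch (assuming Python 3, with a shortened made-up path) of how Python's re module numbers things: group(0) is always the whole match, group(1) is what the first pair of parentheses captured, and groups() lists only the captured groups.

```python
import re

patt = re.compile(r"<pathurl>file://localhost(.+)</pathurl>")
m = patt.search("<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>")

whole = m.group(0)     # the entire matched text, tags included
path = m.group(1)      # only what the first (...) captured
captured = m.groups()  # tuple of captured groups; group 0 is not part of it
print(whole)
print(path)
print(captured)
```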
Aug 12 '07 #7
bvdet
2,851 Recognized Expert Moderator Specialist
Just to show good practice, the open file object 'f' should be closed:
    f.close()
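An alternative to calling close() by hand, sketched here assuming a Python version that supports the with statement (Python 3 in this example): the file is closed automatically when the block ends, even if an exception is raised inside it. The temporary file and its content are made up so the snippet runs on its own.

```python
import os
import re
import tempfile

patt = re.compile(r"<pathurl>file://localhost(.+)</pathurl>")

# Write a throwaway file so the sketch runs anywhere (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>\n")
    fn = tmp.name

data = []
with open(fn) as f:
    for line in f:
        m = patt.search(line)
        if m:
            data.append(m.group(1))

# The with block has already closed the file at this point.
print(f.closed)
print(data)
os.remove(fn)
```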
Aug 12 '07 #8
