question about XML parsing

46 New Member

Hey Guys,

I am discovering the awesomeness that is XML.

I use an application called Final Cut Pro for editing video. The app is able to export its projects as XML.

I am trying to develop a script to read that XML file and build a list of the files that are listed in it. The project's imported files are enclosed in the 'pathurl' tag in the XML file.

It is nearly working but I just wanted to see what you guys think of the manner I have approached it.

Currently it spits out a list, but the information is enclosed in the xml element tag - it would be great to get a list without the tags.

example (current output):
<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif</pathurl>

example of what I'd like:
/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif

Is there an 'XML parsing call' (sorry, I am making up programming lingo as I go) to do this, or is it a matter of using a Python string tool like strip/split?

Thanks for any advice!

Adam

import sys
import os
from xml.dom import minidom

xmldocumentpath = str(sys.argv[1])
elementtofind = 'pathurl'

xmldoc = minidom.parse(xmldocumentpath)
pathlist = xmldoc.getElementsByTagName(elementtofind)

# All Nodes listed
# AllNode = xmldoc.firstChild

itemamount = len(pathlist)
print(itemamount)

loop = 0
while loop < itemamount:
    # toxml() serialises the whole element, which is why the tags show up
    print(pathlist[loop].toxml())
    loop = loop + 1

Aug 12 '07 #1


bartonc

6,596

Recognized Expert Expert

Hi Adam.

I'm currently studying Regular Expressions in my spare (LOL) time. I believe regex is such an important tool, and so perfectly suited to this kind of task, that it kills me not to be able to just crank one out for you.

In pure Python, you could use something like this:

>>> s = "<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif</pathurl>"
>>> token = "pathurl>"
>>> size = len(token)
>>> start = s.find(token)
>>> end = s.find(token, start + size)
>>> s[start + size:end - 2]
'file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif'
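
On the "XML parsing call" part of the question: minidom can also hand back the text inside an element directly, so no string slicing is needed. The following is only a sketch of that route, not anything posted in the thread. It assumes Python 3, and the sample XML and the helper name `element_texts` are made up for illustration (a real Final Cut Pro export will be much larger).

```python
from xml.dom import minidom

# Hypothetical sample standing in for a real project export.
SAMPLE_XML = """<export>
  <clip><file><pathurl>file://localhost/Volumes/HD1/Media/research.tif</pathurl></file></clip>
  <clip><file><pathurl>file://localhost/Volumes/HD1/Media/research1.tif</pathurl></file></clip>
</export>"""

def element_texts(xmldoc, tagname):
    """Return the text inside every <tagname> element, without the tags."""
    texts = []
    for node in xmldoc.getElementsByTagName(tagname):
        # An element's text can be split over several text nodes, so join them.
        parts = [child.data for child in node.childNodes
                 if child.nodeType == child.TEXT_NODE]
        texts.append("".join(parts).strip())
    return texts

xmldoc = minidom.parseString(SAMPLE_XML)
paths = element_texts(xmldoc, "pathurl")
for url in paths:
    print(url)
```

With a file on disk, `minidom.parse(xmldocumentpath)` from the original script would take the place of `minidom.parseString(SAMPLE_XML)`.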

Aug 12 '07 #2

ateale

New Member

Hey BartonC thanks a lot!
That is really really cool!

I'll play around with that for a while!

Cheers mate!


Aug 12 '07 #3

bartonc

6,596

Recognized Expert Expert


Actually, that was kind of dumb... If you know the size of the token AND that it exists, simply:

>>> token = "<pathurl>"
>>> size = len(token)
>>> s[size:-size - 1]
'file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif'
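
Slicing by token length returns the whole file://localhost/... URL, while the original question asked for the bare /Volumes/... path. One possible way to get from one to the other is the standard library's URL parser. This is a sketch assuming Python 3 (`urllib.parse`); the helper name `url_to_path` is made up here.

```python
from urllib.parse import urlparse, unquote

def url_to_path(url):
    """Turn a file:// URL into a plain filesystem path."""
    parsed = urlparse(url)
    # parsed.netloc holds "localhost"; parsed.path is everything after it.
    # unquote() turns escapes such as %20 back into the characters they stand for.
    return unquote(parsed.path)

url = ("file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/"
       "Petronas_mediamanaged_07AUG07/research.tif")
path = url_to_path(url)
print(path)
```

Unlike a fixed-length slice, this does not depend on the host part always being exactly "localhost".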

Aug 12 '07 #4

bvdet

2,851

Recognized Expert Moderator Specialist


Barton, Adam - I am also trying to learn RE. What do you think of this?

import re

fn = r'H:\TEMP\temsys\re_parse_string.txt'

# group 1 captures everything between 'file://localhost' and the closing tag
patt = re.compile(r'<pathurl>file://localhost(.+)</pathurl>')

f = open(fn)
data = []
for line in f:
    m = patt.search(line)
    if m:
        data.append(m.group(1))

print(data)

Output:

['/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_07AUG07/research.tif']

Interaction:

>>> m.group(0)
'<pathurl>file://localhost/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_08AUG07/research1.tif</pathurl>'
>>> m.group(1)
'/Volumes/HD1/FCP_Documents/Projects/Media/Petronas_mediamanaged_08AUG07/research1.tif'
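
One caveat with the pattern above: `(.+)` is greedy, so if two pathurl elements ever sit on the same line it will swallow everything up to the last closing tag. A possible variation, sketched here under the assumption that this can happen in the input (the sample text is made up), uses a non-greedy group with `findall`:

```python
import re

# Non-greedy (.+?) stops at the first closing tag instead of the last one.
patt = re.compile(r'<pathurl>file://localhost(.+?)</pathurl>')

# Hypothetical text with two elements on one line, which greedy (.+) would merge.
text = ('<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>'
        '<pathurl>file://localhost/Volumes/HD1/research1.tif</pathurl>')

# With a single group in the pattern, findall returns just the captured strings.
data = patt.findall(text)
print(data)
```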

Aug 12 '07 #5

bartonc

6,596

Recognized Expert Expert


That's the one that I was imagining! Thank you, very much.

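It's interesting to note that group zero is always the whole match, whether or not the pattern contains parentheses (which are the official "group" operators); the parentheses only create groups one and up. Perl, for instance, has no numbered group zero and exposes the whole match as $& instead.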

Aug 12 '07 #6

bvdet

2,851

Recognized Expert Moderator Specialist


I have been confused about the group() method all along. I learned this recently by experimenting (trial and error, a lot of error!).
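
For anyone else puzzling over group(), here is a small sketch of how the numbering works on the pattern from this thread. It assumes Python 3, and the shortened path is made up for the example.

```python
import re

patt = re.compile(r'<pathurl>file://localhost(.+)</pathurl>')
line = '<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>'
m = patt.search(line)

whole = m.group(0)     # the entire matched text, tags included
first = m.group(1)     # only what the first (...) captured
captured = m.groups()  # tuple of the parenthesised groups; group 0 is not in it

print(whole)
print(first)
print(captured)
```

Calling `m.group()` with no argument is the same as `m.group(0)`.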

Aug 12 '07 #7

bvdet

2,851

Recognized Expert Moderator Specialist


Just to show good practice, the open file object 'f' should be closed:


f.close()
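
An alternative to calling close() by hand is the `with` statement, which closes the file when the block ends, even if an exception is raised inside it. The sketch below assumes Python 3 and writes a small temporary file (made-up content) only so that it can run on its own; in the script above, `fn` would simply be the existing input file.

```python
import os
import re
import tempfile

patt = re.compile(r'<pathurl>file://localhost(.+)</pathurl>')

# Write a small hypothetical input file so the example is self-contained.
fd, fn = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('<pathurl>file://localhost/Volumes/HD1/research.tif</pathurl>\n')

data = []
with open(fn) as f:  # f is closed automatically when this block exits
    for line in f:
        m = patt.search(line)
        if m:
            data.append(m.group(1))

os.remove(fn)  # clean up the temporary file
print(data)
```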

Aug 12 '07 #8
