Extracting data from xml file

Hi All,
I am new to XML, and trying to extract some data from a file.

The file looks like this:
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<TAPE>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>6.99</PRICE>
<YEAR>1985</YEAR>
<TAPE>
<CATALOG>

I am trying to get
Artist: Bob Dylan
Company: Columbia
CD Price: 10.90
Tape Price: 6.99
What is the best method to do this? Is there a tool or utility you can
recommend for Windows?

Mar 3 '07 #1
> What is the best method to do this?

Lots of tutorials exist on the web. My standard recommended starting
point: http://www.ibm.com/xml

(I'd probably hardcode it using DOM or SAX. But it might be easier for a
novice to write an XSLT stylesheet. There are other tools which might be
easier again, but they're less well standardized and I hesitate to
recommend that a novice invest in learning them.)
--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Mar 3 '07 #2
On Mar 3, 7:57 pm, "Mag Gam" <magaw...@gmail.com> wrote:
> <CATALOG>
> <CD>
> <TITLE>Empire Burlesque</TITLE>
> <ARTIST>Bob Dylan</ARTIST>
> <COUNTRY>USA</COUNTRY>
> <COMPANY>Columbia</COMPANY>
> <PRICE>10.90</PRICE>
> <YEAR>1985</YEAR>
> </CD>
> <TAPE>
> <TITLE>Empire Burlesque</TITLE>
> <ARTIST>Bob Dylan</ARTIST>
> <COUNTRY>USA</COUNTRY>
> <COMPANY>Columbia</COMPANY>
> <PRICE>6.99</PRICE>
> <YEAR>1985</YEAR>
> <TAPE>
> <CATALOG>

This is not well-formed and therefore not XML. If that's
your real data, XML tools are quite unlikely to help you.

Assuming it's just another case of 'oh, for some reason I
just typed that in instead of using copy-paste'...
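
To see the point about well-formedness in practice, here is a minimal sketch in Python (the language is my assumption; nobody in the thread names one). It uses the standard library's xml.etree.ElementTree, and the helper name is_well_formed and the shortened sample strings are made up for illustration. A conforming parser refuses the input with the unclosed tags and accepts the corrected one.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the posted data: the first repeats the mistake
# (opening tags where closing tags were meant), the second is corrected.
BROKEN = "<CATALOG><TAPE><PRICE>6.99</PRICE><TAPE><CATALOG>"
FIXED = "<CATALOG><TAPE><PRICE>6.99</PRICE></TAPE></CATALOG>"


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(BROKEN))
print(is_well_formed(FIXED))
```
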

> I am trying to get
> Artist: Bob Dylan
> Company: Columbia
> CD Price: 10.90
> Tape Price: 6.99

Another day, another grouping problem...

> What is the best method to do this? Is there a tool or
> utility you can recommend for Windows?

Define 'best'. Define 'utility'. I don't believe there's a
DWIM-type tool that would automagically, well, do what you
mean at a click of a button. Therefore, it's a programming
problem. You could use a DOM or SAX parser in your language
of choice, as Joseph proposed. Or you could use XSLT. Or
maybe XQuery or xmlgawk. In case it's XSLT/XQuery, I
believe there are many GUI tools that might make working
with the code easier for you; I'm not sure if there are any
good open source ones, though. If you'd be happy with
Unix-style small tools, there's a number of open source
XSLT processors, including Saxon (it's written in Java, so
it shouldn't be a problem running it on a Windows box),
xsltproc and xalan (if there are no native ports, Cygwin or
MinGW will probably save the day). In short, you should
determine what you want then google for it. Come back with
specific questions.

Here's a transformation that does more or less what you
want with your sample data (after it's been fixed, of
course):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index records by the fields that identify one release. -->
  <xsl:key name="id" match="CD|TAPE"
      use="concat(TITLE,ARTIST,COMPANY)"/>
  <!-- Yields 'true' for the first record of each group. -->
  <xsl:key name="first" match="CD|TAPE"
      use="generate-id() =
           generate-id(key('id',concat(TITLE,ARTIST,COMPANY))[1])"/>
  <xsl:output method="text"/>
  <!-- Suppress anything without an explicit template. -->
  <xsl:template match="@*|node()"/>
  <xsl:template match="/">
    <xsl:apply-templates select="key('first',true())"/>
  </xsl:template>
  <!-- One block per group: shared fields first, then every price. -->
  <xsl:template match="CD|TAPE">
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
    <xsl:apply-templates
        select="key('id',concat(TITLE,ARTIST,COMPANY))"
        mode="prices"/>
  </xsl:template>
  <xsl:template match="TITLE">
    <xsl:text>Title: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="ARTIST">
    <xsl:text>Artist: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="COMPANY">
    <xsl:text>Company: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="@*|node()" mode="prices"/>
  <xsl:template match="CD|TAPE" mode="prices">
    <xsl:apply-templates mode="prices"/>
  </xsl:template>
  <xsl:template match="CD/PRICE" mode="prices">
    <xsl:text>CD Price: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="TAPE/PRICE" mode="prices">
    <xsl:text>Tape Price: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
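
For readers who find the key-based grouping above hard to follow, the same idea can be sketched in Python with a dictionary. This is only an illustration, not a replacement for the stylesheet: it assumes the corrected sample data, and the names CATALOG_XML, LABELS, group_releases and format_report are hypothetical. Records are grouped by (TITLE, ARTIST, COMPANY), the shared fields are printed once per group, and every price in the group follows.

```python
import xml.etree.ElementTree as ET

# The sample data from the thread, with the closing tags corrected.
CATALOG_XML = """<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <TAPE>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>6.99</PRICE>
    <YEAR>1985</YEAR>
  </TAPE>
</CATALOG>"""

# Output label for each record type.
LABELS = {"CD": "CD Price", "TAPE": "Tape Price"}


def group_releases(xml_text):
    """Group CD/TAPE records by (title, artist, company) in document order."""
    groups = {}
    for record in ET.fromstring(xml_text):
        if record.tag not in LABELS:
            continue
        key = tuple(record.findtext(name, "")
                    for name in ("TITLE", "ARTIST", "COMPANY"))
        price = (LABELS[record.tag], record.findtext("PRICE", ""))
        groups.setdefault(key, []).append(price)
    return groups


def format_report(groups):
    """Render each group as shared fields followed by its prices."""
    lines = []
    for (title, artist, company), prices in groups.items():
        lines += [f"Title: {title}", f"Artist: {artist}", f"Company: {company}"]
        lines += [f"{label}: {price}" for label, price in prices]
    return "\n".join(lines)


print(format_report(group_releases(CATALOG_XML)))
```

The dictionary plays the role of the 'id' key, and taking each dictionary entry once corresponds to selecting the first record of every group.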

--
roy axenov

Mar 3 '07 #3
Mag Gam wrote:
> Hi All,
> I am new to XML, and trying to extract some data from a file.
>
> The file looks like this:
> <CATALOG>
> <CD>
> <TITLE>Empire Burlesque</TITLE>
> <ARTIST>Bob Dylan</ARTIST>
> <COUNTRY>USA</COUNTRY>
> <COMPANY>Columbia</COMPANY>
> <PRICE>10.90</PRICE>
> <YEAR>1985</YEAR>
> </CD>
> <TAPE>
> <TITLE>Empire Burlesque</TITLE>
> <ARTIST>Bob Dylan</ARTIST>
> <COUNTRY>USA</COUNTRY>
> <COMPANY>Columbia</COMPANY>
> <PRICE>6.99</PRICE>
> <YEAR>1985</YEAR>
> <TAPE>
> <CATALOG>

The last two tags are not correct (closing tags should begin with </).

> I am trying to get
> Artist: Bob Dylan
> Company: Columbia
> CD Price: 10.90
> Tape Price: 6.99
> What is the best method to do this? Is there a tool or utility you can
> recommend for Windows?

One of the many tools that can solve the problem is XMLgawk:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/
The following script solves your problem.

@load xml
XMLCHARDATA { data = $0 }
XMLENDELEM == "ARTIST" && index(XMLPATH, "CD") { print "Artist:", data}
XMLENDELEM == "COMPANY" && index(XMLPATH, "CD") { print "Company:", data}
XMLENDELEM == "PRICE" && index(XMLPATH, "CD") { print "CD Price:", data}
XMLENDELEM == "PRICE" && index(XMLPATH, "TAPE") { print "Tape Price:", data}

Invoke the script like this and it will produce the
following output:

xgawk -f catalog.awk catalog.xml
Artist: Bob Dylan
Company: Columbia
CD Price: 10.90
Tape Price: 6.99
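
The XMLgawk script above is event-driven: it remembers the latest character data and acts when an element ends. For comparison, here is a rough sketch of the same pattern with the SAX parser from Python's standard library (Python is my assumption, and the names CATALOG_XML, LABELS, CatalogHandler and extract are invented for this example). It keeps a stack of open elements, which stands in for XMLPATH.

```python
import xml.sax

# Corrected sample data from the thread, as bytes for xml.sax.parseString.
CATALOG_XML = b"""<CATALOG>
  <CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST>
      <COMPANY>Columbia</COMPANY><PRICE>10.90</PRICE></CD>
  <TAPE><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST>
        <COMPANY>Columbia</COMPANY><PRICE>6.99</PRICE></TAPE>
</CATALOG>"""

# (parent element, element) pairs to report, with their output labels.
LABELS = {
    ("CD", "ARTIST"): "Artist",
    ("CD", "COMPANY"): "Company",
    ("CD", "PRICE"): "CD Price",
    ("TAPE", "PRICE"): "Tape Price",
}


class CatalogHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.path = []   # stack of currently open element names
        self.data = ""   # character data seen since the last start tag
        self.lines = []  # collected output lines

    def startElement(self, name, attrs):
        self.path.append(name)
        self.data = ""

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        self.data += content

    def endElement(self, name):
        parent = self.path[-2] if len(self.path) > 1 else None
        label = LABELS.get((parent, name))
        if label:
            self.lines.append(f"{label}: {self.data.strip()}")
        self.path.pop()


def extract(xml_bytes):
    """Parse the catalog and return the requested lines in document order."""
    handler = CatalogHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.lines


print("\n".join(extract(CATALOG_XML)))
```

Matching on the direct parent rather than searching the whole path for "CD" is a design choice of this sketch; it avoids accidental matches if element names ever overlap.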

Mar 3 '07 #4
On Mar 3, 2:51 pm, Jürgen Kahrs <Juergen.KahrsDELETET...@vr-web.de> wrote:
> One of the many tools that can solve the problem is XMLgawk:
> http://home.vrweb.de/~juergen.kahrs/gawk/XML/

Thanks everyone!
I am very new to XML and still learning the ropes.

Roy:
I have yet to try your XSL solution, but I will. I know the XML I posted
was not well-formed; I only used it as an example.
Let's assume this is my new .xml file, based on
http://msdn2.microsoft.com/en-us/library/ms762271.aspx
(with some slight modifications, such as a second author on the first book):

<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<author>II Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<book id="bk105">
<author>Corets, Eva</author>
<title>The Sundered Grail</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters of Maeve, half-sisters,
battle one another for control of England. Sequel to
Oberon's Legacy.</description>
</book>
<book id="bk106">
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology
conference, tempers fly as feathers get ruffled.</description>
</book>
<book id="bk107">
<author>Thurman, Paula</author>
<title>Splish Splash</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-11-02</publish_date>
<description>A deep sea diver finds true love twenty
thousand leagues beneath the sea.</description>
</book>
<book id="bk108">
<author>Knorr, Stefan</author>
<title>Creepy Crawlies</title>
<genre>Horror</genre>
<price>4.95</price>
<publish_date>2000-12-06</publish_date>
<description>An anthology of horror stories about roaches,
centipedes, scorpions and other insects.</description>
</book>
<book id="bk109">
<author>Kress, Peter</author>
<title>Paradox Lost</title>
<genre>Science Fiction</genre>
<price>6.95</price>
<publish_date>2000-11-02</publish_date>
<description>After an inadvertant trip through a Heisenberg
Uncertainty Device, James Salway discovers the problems
of being quantum.</description>
</book>
<book id="bk110">
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in
detail in this deep programmer's reference.</description>
</book>
<book id="bk111">
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-01</publish_date>
<description>The Microsoft MSXML3 parser is covered in
detail, with attention to XML DOM interfaces, XSLT processing,
SAX and more.</description>
</book>
<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2001-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>
</book>
</catalog>

How would I get the 'Book Title' and 'Book Author'?

TIA

Mar 4 '07 #5
git
On Sat, 03 Mar 2007 09:57:38 -0800, Mag Gam wrote:
> What is the best method to do this? Is there a tool or utility you can
> recommend for Windows?

On Windows, for someone who just wants to get on with the job rather than
learn XSLT or XPath, I would recommend coding it all in JScript (or
VBScript). Use the MSXML parser that comes with Windows and walk over
the DOM to find the data you want.
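
As a language-neutral illustration of walking the DOM, here is a small sketch using Python's xml.dom.minidom rather than JScript and MSXML (that substitution is mine; both expose W3C-style DOM calls such as getElementsByTagName). The names CATALOG_XML, child_text and summarize are hypothetical, and the data is the corrected sample from the first post.

```python
from xml.dom import minidom

# Corrected sample data from the thread, shortened to the fields used here.
CATALOG_XML = (
    "<CATALOG>"
    "<CD><ARTIST>Bob Dylan</ARTIST><COMPANY>Columbia</COMPANY>"
    "<PRICE>10.90</PRICE></CD>"
    "<TAPE><ARTIST>Bob Dylan</ARTIST><COMPANY>Columbia</COMPANY>"
    "<PRICE>6.99</PRICE></TAPE>"
    "</CATALOG>"
)


def child_text(parent, tag):
    """Text of the first <tag> element below parent, or '' if there is none."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data.strip()


def summarize(xml_text):
    """Walk the DOM and return the four lines asked for in the first post."""
    doc = minidom.parseString(xml_text)
    cd = doc.getElementsByTagName("CD")[0]
    tape = doc.getElementsByTagName("TAPE")[0]
    return [
        "Artist: " + child_text(cd, "ARTIST"),
        "Company: " + child_text(cd, "COMPANY"),
        "CD Price: " + child_text(cd, "PRICE"),
        "Tape Price: " + child_text(tape, "PRICE"),
    ]


print("\n".join(summarize(CATALOG_XML)))
```
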

I am working on examples of this technique on my blog/site:

http://nerds-central.blogspot.com/20...pt-exsead.html

http://nerds-central.blogspot.com/20...atom-feed.html
(I promise that I will write the follow-up to that second article real
soon! And I am working on VBScript examples as well.)

Feel free to join the Nerds-Central email group to ask more questions if
you like the method:
http://tech.groups.yahoo.com/group/nerds-central/

Cheers

AJ
--
Cubical Land:
www.cubicalland.com
Nerds-Central:
nerds-central.blogspot.com

Mar 4 '07 #6
Mag Gam wrote:
> How would I get 'Book Title' and 'Book Author' ?

Use this XMLgawk script:

@load xml
XMLCHARDATA { data = $0 }
XMLENDELEM == "author" { author = data }
XMLENDELEM == "title" { title = data }
XMLENDELEM == "book" { print author, title}
And you will get the following output from the XML
data that you posted:

xgawk -f catalog2.awk catalog2.xml

II Gambardella, Matthew XML Developer's Guide
Ralls, Kim Midnight Rain
Corets, Eva Maeve Ascendant
Corets, Eva Oberon's Legacy
Corets, Eva The Sundered Grail
Randall, Cynthia Lover Birds
Thurman, Paula Splish Splash
Knorr, Stefan Creepy Crawlies
Kress, Peter Paradox Lost
O'Brien, Tim Microsoft .NET: The Programming Bible
O'Brien, Tim MSXML3: A Comprehensive Guide
Galos, Mike Visual Studio 7: A Comprehensive Guide
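
Note that the first line of the output shows only "II Gambardella, Matthew": the script overwrites the author variable each time an author element ends, so only the last author of a book survives. If every author is wanted, one possible approach is sketched below in Python with xml.etree.ElementTree (my choice of language and library; BOOKS_XML and titles_and_authors are hypothetical names, and the data is a two-book excerpt of the posted catalog).

```python
import xml.etree.ElementTree as ET

# Two-book excerpt of the catalog posted above.
BOOKS_XML = """<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <author>II Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
  </book>
</catalog>"""


def titles_and_authors(xml_text):
    """Return (title, [authors]) for every book, keeping all authors."""
    result = []
    for book in ET.fromstring(xml_text).iter("book"):
        authors = [a.text.strip() for a in book.findall("author") if a.text]
        result.append((book.findtext("title", "").strip(), authors))
    return result


for title, authors in titles_and_authors(BOOKS_XML):
    print(f"{title}: {'; '.join(authors)}")
```

Authors are joined with "; " here because the names themselves contain commas.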
Mar 4 '07 #7
