Parsing complex xml file with C#

I have a complex XML file that contains the stories within a magazine. The
structure of the XML file is as follows:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<magazine>
  <story>
    <story_id>112233</story_id>
    <pub_name>Puleen's Publication</pub_name>
    <pub_code>PP</pub_code>
    <edition_date>20031201</edition_date>
    <edition_name></edition_name>
    <section_name></section_name>
    <page_id></page_id>
    <headline>My Story Headline</headline>
    <subhead>Sub head</subhead>
    <byline>Puleen</byline>
    <source></source>
    <dateline></dateline>
    <storytype></storytype>
    <column>Search</column>
    <company_list></company_list>
    <keyword_list></keyword_list>
    <text><p>In other news....</p><p>second paragraph</p></text>
    <photo>
      <caption></caption>
      <photo_filename>197943-96068.jpg</photo_filename>
      <photocredit></photocredit>
    </photo>
    <photo>
      <caption></caption>
      <photo_filename>197943-96069.jpg</photo_filename>
      <photocredit></photocredit>
    </photo>
    <photo>
      <caption></caption>
      <photo_filename>197943-96067.jpg</photo_filename>
      <photocredit></photocredit>
    </photo>
  </story>
</magazine>

So there could be multiple <story> elements for each magazine. On the back end,
the data is stored in an Oracle database. However, the photo data is stored
in a separate table from the story itself. What's the best way to parse the
story contents and build a query from them, and then parse the photo contents
and build a query from those?

Any ideas are welcome. I've been trying to parse the XML file, but I can't
think of a quick way of doing it. Could someone point me in the right
direction or suggest a quick solution?
Nov 15 '05 #1
Serialize and deserialize for persistent storage.

Then play with the object :D Much easier.
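
As a rough sketch of the serialization idea above, assuming System.Xml.Serialization.XmlSerializer is acceptable for this file: the type and member names (Magazine, Story, Photo, MagazineLoader) are invented for illustration, and only a handful of the elements are mapped. The <text> element is deliberately left unmapped here because it holds embedded markup.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes mirroring part of the posted XML structure.
[XmlRoot("magazine")]
public class Magazine
{
    // Each repeated <story> child becomes one array entry.
    [XmlElement("story")]
    public Story[] Stories;
}

public class Story
{
    [XmlElement("story_id")] public string StoryId;
    [XmlElement("headline")] public string Headline;
    [XmlElement("byline")] public string Byline;

    // Each repeated <photo> child becomes one array entry.
    [XmlElement("photo")] public Photo[] Photos;
}

public class Photo
{
    [XmlElement("caption")] public string Caption;
    [XmlElement("photo_filename")] public string FileName;
    [XmlElement("photocredit")] public string Credit;
}

public static class MagazineLoader
{
    // Deserializes a whole <magazine> document into objects.
    public static Magazine Load(TextReader reader)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Magazine));
        return (Magazine)serializer.Deserialize(reader);
    }
}
```

Once loaded, each Story carries its own Photos array, so the story_id is available when building the rows for the separate photo table. Elements that have no matching member should simply be skipped during deserialization, which is also the flexibility trade-off: anything not mapped is silently lost.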

Nov 15 '05 #2

"Pir8" <pi**@mscorlib.com> wrote in message
news:%2****************@TK2MSFTNGP10.phx.gbl...
> The main problem that I am concerned with is that within the <text> there
> might be, and will be, HTML tags, i.e. <p><strong><a href=""> and so on. I do
> realize that I could use the node's InnerXml property to retrieve this,
> but will there be any other complications in the future?

It *should* work, but it depends on your HTML. I wouldn't want to throw
non-XHTML HTML at an XML parser; it's just not particularly safe. You might
want to consider wrapping the body of text in a CDATA section.

> The XML format that I pasted is pretty much the same... There are some
> other tags that I did not include, which will also go into a separate
> table of their own in Oracle.

I asked about the format because, personally, I would have used an id
attribute instead of a <story_id> element.
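
To illustrate the CDATA suggestion above, here is a minimal sketch using XmlDocument. The CDataDemo class and its method names are made up for this example, and the sample strings are not taken from the real feed. It shows that HTML which is not well-formed XML (an unclosed <br>) makes the parser reject the document, while the same HTML inside a CDATA section is treated as plain character data.

```csharp
using System;
using System.Xml;

public static class CDataDemo
{
    // Returns true when the parser accepts the string as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Reads the content of /story/text as a string. For a CDATA-wrapped
    // body, InnerText yields the raw HTML exactly as it was written.
    public static string ReadStoryText(string storyXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(storyXml);
        XmlNode text = doc.SelectSingleNode("/story/text");
        return text == null ? null : text.InnerText;
    }
}
```

One caveat with this approach is that the wrapped text must not itself contain the sequence ]]>, since that ends the CDATA section.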

> My main concern is that, when parsing the <story> and <photo> separately, I
> need to associate the <story_id> with the data from the <photo> section, so
> as to enter it into the database and keep the appropriate relationships for
> the application that will be using this data.
>
> I will read more about XPath and how it can be helpful. I appreciate your
> suggestions.

Well, as a very basic concept, I would probably query the XML document with
XPathNavigator using the XPath query /magazine/story, use the resulting
XPathNodeIterator to grab each story, and use subsequent queries to pull
the various pieces out.

"Daniel O'Connell" <onyxkirx@--NOSPAM--comcast.net> wrote in message
news:Od**************@TK2MSFTNGP10.phx.gbl...
> If I may ask, what kind of problems are you having? Serialization is
> probably not your only answer (it could have flexibility issues). My
> immediate idea would be to use XPath. At that, is this XML format set in
> stone?

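The XPathNavigator approach described in this reply could be sketched roughly as follows. The StoryPhotoReader class and the shape of the returned rows are assumptions made for illustration; no table or column names are implied. Each <story> is selected with /magazine/story, and relative queries against the current story pull out its story_id and its photos, so every photo row carries the id of the story it belongs to.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class StoryPhotoReader
{
    // Returns one { storyId, photoFilename } pair per <photo> element,
    // in document order.
    public static List<string[]> ReadPhotoRows(TextReader xml)
    {
        XPathDocument doc = new XPathDocument(xml);
        XPathNavigator nav = doc.CreateNavigator();
        List<string[]> rows = new List<string[]>();

        XPathNodeIterator stories = nav.Select("/magazine/story");
        while (stories.MoveNext())
        {
            XPathNavigator story = stories.Current;

            // Relative query: evaluated against the current <story>.
            XPathNavigator idNode = story.SelectSingleNode("story_id");
            string storyId = idNode == null ? "" : idNode.Value;

            XPathNodeIterator photos = story.Select("photo");
            while (photos.MoveNext())
            {
                XPathNavigator file = photos.Current.SelectSingleNode("photo_filename");
                rows.Add(new string[] { storyId, file == null ? "" : file.Value });
            }
        }
        return rows;
    }
}
```

The story fields themselves can be read the same way inside the outer loop, which keeps the story insert and the photo inserts for one story together.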

Nov 15 '05 #3
Assuming that the file is well-formed XML (even with the embedded HTML), you
can easily extract components of the structure using XPath queries, and even
iterate over the structure, pulling out each photo and each item.

The query for story_id is literally: /magazine/story/story_id
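
As a small sketch of that query in use, assuming the document is loaded into an XmlDocument: the StoryIdQuery class and its method names are hypothetical. The second method adds a predicate to the same path to reach the photos of one story, and it assumes the id is a plain digit string like the one in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class StoryIdQuery
{
    // Runs the absolute query /magazine/story/story_id and collects the values.
    public static List<string> GetStoryIds(string magazineXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(magazineXml);

        List<string> ids = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/magazine/story/story_id"))
        {
            ids.Add(node.InnerText);
        }
        return ids;
    }

    // Collects the photo filenames belonging to one story.
    // Assumption: storyId contains only digits, so it is safe to place
    // inside the quoted XPath literal.
    public static List<string> GetPhotoFiles(string magazineXml, string storyId)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(magazineXml);

        string query = "/magazine/story[story_id='" + storyId + "']/photo/photo_filename";
        List<string> files = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(query))
        {
            files.Add(node.InnerText);
        }
        return files;
    }
}
```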

--- Nick

Nov 15 '05 #4