
Parsing complex xml file with C#

I have a complex xml file, which contains stories within a magazine. The
structure of the xml file is as follows:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<magazine>
<story>
<story_id>112233</story_id>
<pub_name>Puleen's Publication</pub_name>
<pub_code>PP</pub_code>
<edition_date>20031201</edition_date>
<edition_name></edition_name>
<section_name></section_name>
<page_id></page_id>
<headline>My Story Headline</headline>
<subhead>Sub head</subhead>
<byline>Puleen</byline>
<source></source>
<dateline></dateline>
<storytype></storytype>
<column>Search</column>
<company_list></company_list>
<keyword_list></keyword_list>
<text><p>In other news....</p><p>second paragraph</p></text>
<photo>
<caption></caption>
<photo_filename>197943-96068.jpg</photo_filename>
<photocredit></photocredit>
</photo>
<photo>
<caption></caption>
<photo_filename>197943-96069.jpg</photo_filename>
<photocredit></photocredit>
</photo>
<photo>
<caption></caption>
<photo_filename>197943-96067.jpg</photo_filename>
<photocredit></photocredit>
</photo>
</story>
</magazine>

So there can be multiple <story> elements in each magazine. On the back end,
the data is stored in an Oracle database. However, the photo data is stored
in a separate table from the story itself. What's the best way to parse the
story contents and build a query from them, and then parse the photo
contents and build a query from those?

Any ideas are welcome. I've been trying to parse the XML file, but I can't
think of a quick way of doing this. Could someone point me in the right
direction or suggest a quick solution?
Nov 15 '05 #1
3 Replies


Serialize and deserialize for persistent storage.

Then play with the object :D Much easier
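
A minimal sketch of the serialization suggestion, assuming System.Xml.Serialization.XmlSerializer is what was meant. The class and property names (Magazine, Story, Photo) are made up for illustration, only a few of the elements are mapped, and the mixed-content <text> element is left unmapped because it would need separate handling.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes mirroring part of the posted XML; the names are illustrative.
[XmlRoot("magazine")]
public class Magazine
{
    [XmlElement("story")]
    public List<Story> Stories { get; set; } = new List<Story>();
}

public class Story
{
    [XmlElement("story_id")]
    public string StoryId { get; set; } = "";

    [XmlElement("headline")]
    public string Headline { get; set; } = "";

    [XmlElement("photo")]
    public List<Photo> Photos { get; set; } = new List<Photo>();
}

public class Photo
{
    [XmlElement("caption")]
    public string Caption { get; set; } = "";

    [XmlElement("photo_filename")]
    public string FileName { get; set; } = "";
}

public static class SerializerSketch
{
    public static void Main()
    {
        // Trimmed copy of the sample document from the question.
        const string xml = @"<magazine>
  <story>
    <story_id>112233</story_id>
    <headline>My Story Headline</headline>
    <photo><caption></caption><photo_filename>197943-96068.jpg</photo_filename></photo>
    <photo><caption></caption><photo_filename>197943-96069.jpg</photo_filename></photo>
  </story>
</magazine>";

        var serializer = new XmlSerializer(typeof(Magazine));
        Magazine magazine;
        using (var reader = new StringReader(xml))
        {
            magazine = (Magazine)serializer.Deserialize(reader);
        }

        foreach (Story story in magazine.Stories)
        {
            Console.WriteLine($"{story.StoryId}: {story.Headline}");
            foreach (Photo photo in story.Photos)
            {
                // Each photo row can carry the parent story's id as its foreign key.
                Console.WriteLine($"  {story.StoryId} -> {photo.FileName}");
            }
        }
    }
}
```

Once the data is in objects, the story rows and photo rows can be written to their separate tables in two passes over the same object graph.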

Nov 15 '05 #2

"Pir8" <pi**@mscorlib.com> wrote in message
news:%2****************@TK2MSFTNGP10.phx.gbl...
> The main problem that I am concerned with is that within the <text> there
> might be and will be html tags i.e. <p><strong><a href=""> and so on. I do
> realize that I could use the node's InnerXml property to retrieve this
> but will there be any other complications in the future?

It *should* work, but it depends on your HTML. I wouldn't want to throw
non-XHTML HTML at an XML parser; it's just not particularly safe. You might
want to consider wrapping the body of text in a CDATA section.

> The xml format that I pasted is pretty much the same...There are some
> other tags that I did not include, which also will go into a separate
> table of its own into oracle.

I asked about the format because, personally, I would have used an id
attribute instead of a <story_id> element.
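
The InnerXml and CDATA points can be sketched roughly as follows, assuming the document is loaded with System.Xml.XmlDocument. The sample strings are invented for illustration; the second one contains an unclosed <br> to stand in for non-XHTML HTML.

```csharp
using System;
using System.Xml;

public static class EmbeddedHtmlSketch
{
    public static void Main()
    {
        // Well-formed markup inside <text>: InnerXml returns the child markup as a string.
        var doc = new XmlDocument();
        doc.LoadXml("<story><text><p>In other news....</p><p>second paragraph</p></text></story>");
        XmlNode text = doc.SelectSingleNode("/story/text");
        Console.WriteLine(text.InnerXml);

        // HTML that is not well-formed XML (unclosed <br>) makes the parser throw.
        try
        {
            new XmlDocument().LoadXml("<story><text><p>line one<br>line two</p></text></story>");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Rejected: " + ex.Message);
        }

        // The same HTML wrapped in CDATA is treated as plain character data.
        var cdataDoc = new XmlDocument();
        cdataDoc.LoadXml("<story><text><![CDATA[<p>line one<br>line two</p>]]></text></story>");
        Console.WriteLine(cdataDoc.SelectSingleNode("/story/text").InnerText);
    }
}
```

With CDATA the parser never interprets the HTML, so InnerText is used to read it back instead of InnerXml.
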
> My main concern is that, when parsing the <story>, <photo> separately, I
> need to associate the <story_id> along with the data from the <photo>
> section, so as to enter it into the database to keep the appropriate
> relationships for the application that will be using this data.
>
> I will read more about XPath and how it can be helpful. I appreciate your
> suggestions.

Well, as a very basic concept, I would probably query the XML document with
XPathNavigator using the XPath query /magazine/story, use the resulting
XPathNodeIterator to grab each story, and use subsequent queries to pull
out the various pieces.
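
A rough sketch of that approach, assuming the System.Xml.XPath classes and a trimmed copy of the sample document. The console output stands in for whatever insert statements the Oracle side needs; no table or column names are assumed here.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class StoryWalker
{
    public static void Main()
    {
        // Trimmed copy of the sample document from the question.
        const string xml = @"<magazine>
  <story>
    <story_id>112233</story_id>
    <headline>My Story Headline</headline>
    <photo><photo_filename>197943-96068.jpg</photo_filename></photo>
    <photo><photo_filename>197943-96069.jpg</photo_filename></photo>
  </story>
</magazine>";

        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        // One iteration per <story> element.
        XPathNodeIterator stories = nav.Select("/magazine/story");
        while (stories.MoveNext())
        {
            XPathNavigator story = stories.Current;

            // Relative queries are evaluated against the current <story>.
            string storyId = story.SelectSingleNode("story_id")?.Value ?? "";
            string headline = story.SelectSingleNode("headline")?.Value ?? "";
            Console.WriteLine($"story row: id={storyId}, headline={headline}");

            // The story id read above is reused as the key for every photo row.
            XPathNodeIterator photos = story.Select("photo");
            while (photos.MoveNext())
            {
                string file = photos.Current.SelectSingleNode("photo_filename")?.Value ?? "";
                Console.WriteLine($"  photo row: story_id={storyId}, file={file}");
            }
        }
    }
}
```

Because the photo queries run relative to the current story, each photo is paired with the right story_id without any extra bookkeeping.
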
"Daniel O'Connell" <onyxkirx@--NOSPAM--comcast.net> wrote in message
news:Od**************@TK2MSFTNGP10.phx.gbl...
> If I may ask, what kind of problems are you having? Serialization is
> probably not your only answer (it could have flexibility issues). My
> immediate idea would be to use XPath. At that, is this XML format set in
> stone?

Nov 15 '05 #3

Assuming that the file is well-formed XML (even with the embedded HTML), you
can easily extract components of the structure using XPath queries, and even
iterate over the structure, pulling out each photo and each item.

The query for story_id is literally: /magazine/story/story_id
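
As a small illustration of that query, here is a sketch using System.Xml.XmlDocument. The second story (id 112234) and its photo are invented so that the query returns more than one node, and the predicate form in the second query is an addition not mentioned above.

```csharp
using System;
using System.Xml;

public static class StoryIdQuery
{
    public static void Main()
    {
        var doc = new XmlDocument();
        // Two stories; the second one is made up for this example.
        doc.LoadXml(@"<magazine>
  <story>
    <story_id>112233</story_id>
    <photo><photo_filename>197943-96068.jpg</photo_filename></photo>
  </story>
  <story>
    <story_id>112234</story_id>
    <photo><photo_filename>example.jpg</photo_filename></photo>
  </story>
</magazine>");

        // The query quoted above: every story_id in the magazine.
        foreach (XmlNode idNode in doc.SelectNodes("/magazine/story/story_id"))
        {
            Console.WriteLine(idNode.InnerText);
        }

        // A predicate restricts the match to the photos of one particular story.
        string photoQuery = "/magazine/story[story_id='112233']/photo/photo_filename";
        foreach (XmlNode fileNode in doc.SelectNodes(photoQuery))
        {
            Console.WriteLine(fileNode.InnerText);
        }
    }
}
```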

--- Nick

Nov 15 '05 #4
