February 28th, 2006, 06:55 PM
Robert Bevington
Parsing XML file slow!

Hi everyone,

I've written some code that parses an XML file. It works correctly, but it's really slow. My example below has only two elem_x elements, but a real file could contain tens of thousands of them. I'm testing with a file of 150 elem_x elements, and even that takes too long. Basically, I'm trying to turn the XML file into a column-based format, as follows:

Input XML file

<elem_x att_a="1" att_b="2" att_c="3">
<elem_y>
<elem_z att_k="9" att_l="8" att_m="7"></elem_z>
</elem_y>
<elem_y lang="EN-US">
<elem_z att_k="9" att_m="7"></elem_z>
<elem_r>textEN</elem_r>
</elem_y>
<elem_y lang="DE-DE">
<elem_z att_k="9" att_l="8" att_m="7"></elem_z>
<elem_r>textDE</elem_r>
</elem_y>
</elem_x>
<elem_x att_a="4" att_b="5" att_c="6">
<elem_y>
<elem_z att_k="6" att_l="5" att_m="4"></elem_z>
</elem_y>
<elem_y lang="EN-US">
<elem_z att_k="6" att_l="5" att_m="4"></elem_z>
<elem_r>textEN</elem_r>
</elem_y>
<elem_y lang="DE-DE">
<elem_z att_k="6" att_l="5" att_m="4"></elem_z>
<elem_r>textDE</elem_r>
</elem_y>
</elem_x>...

The parsed output is a tab-separated file and should look something like this:

att_a att_b att_c att_k att_l att_m EN-US DE-DE
1 2 3 9 7 textEN textDE
4 5 6 6 5 4... textEN textDE

Since some attributes can be missing from a particular element, I have to loop through the entire file to make sure the column order does not get mixed up. To complicate matters slightly, I only want to read the attributes from one of the elem_y elements, because they are always the same for each elem_y.
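A minimal sketch of one way to keep the columns aligned, written in Python rather than .NET and assuming the fragments above sit under a single hypothetical `<root>` element: parse once, store each elem_x as a dictionary while growing three ordered lists of column names (elem_x attributes, elem_z attributes, languages), then write every row against the complete header with blanks for missing values. `xml.etree.ElementTree` and `csv.DictWriter` stand in for XmlDocument and the array; this is an illustration, not the code described in this post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The sample from the post, wrapped in a hypothetical <root> element,
# since a well-formed XML document needs exactly one root.
SAMPLE = """<root>
<elem_x att_a="1" att_b="2" att_c="3">
  <elem_y><elem_z att_k="9" att_l="8" att_m="7"></elem_z></elem_y>
  <elem_y lang="EN-US"><elem_z att_k="9" att_m="7"></elem_z><elem_r>textEN</elem_r></elem_y>
  <elem_y lang="DE-DE"><elem_z att_k="9" att_l="8" att_m="7"></elem_z><elem_r>textDE</elem_r></elem_y>
</elem_x>
<elem_x att_a="4" att_b="5" att_c="6">
  <elem_y><elem_z att_k="6" att_l="5" att_m="4"></elem_z></elem_y>
  <elem_y lang="EN-US"><elem_z att_k="6" att_l="5" att_m="4"></elem_z><elem_r>textEN</elem_r></elem_y>
  <elem_y lang="DE-DE"><elem_z att_k="6" att_l="5" att_m="4"></elem_z><elem_r>textDE</elem_r></elem_y>
</elem_x>
</root>"""


def remember(columns, names):
    """Append any names not seen before, keeping first-seen order."""
    for name in names:
        if name not in columns:
            columns.append(name)


def xml_to_tsv(xml_text):
    root = ET.fromstring(xml_text)
    x_cols, z_cols, lang_cols = [], [], []  # column groups, in order
    rows = []  # one dict per elem_x
    for x in root.iter("elem_x"):
        # Assumes elem_z attributes are identical in every elem_y of this
        # elem_x, so only the first elem_z is read.
        z = x.find("elem_y/elem_z")
        z_attrs = z.attrib if z is not None else {}
        texts = {}
        for y in x.findall("elem_y"):
            lang, text = y.get("lang"), y.findtext("elem_r")
            if lang is not None and text is not None:
                texts[lang] = text
        remember(x_cols, x.attrib)
        remember(z_cols, z_attrs)
        remember(lang_cols, texts)
        # Assumes elem_x and elem_z never share an attribute name.
        rows.append({**x.attrib, **z_attrs, **texts})

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=x_cols + z_cols + lang_cols,
                            delimiter="\t", restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing attributes become empty cells
    return out.getvalue()


print(xml_to_tsv(SAMPLE), end="")
```

On this sample the first data row comes out as 1, 2, 3, 9, 8, 7, textEN, textDE, because the sketch reads the first elem_y, which carries all three elem_z attributes. Buffering one small dictionary per elem_x is what removes the need for a separate pass just to discover the columns.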

I've used the XmlDocument class with XPath and SelectNodes to drill down through the XML file: I navigate to each block of nodes, loop through it, and read the attribute names and values. This lets me build an array, which I then write to the output file. However, I suspect my problem is the large number of loops, which is slowing everything down.

I've also tried parsing the XML file with an XmlReader and loading it into a DataSet. This is much faster, but it doesn't solve my problem, because the attributes of elem_z are not read out onto one line but end up on separate lines.
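The XmlReader result points toward a streaming design. If the XPath queries are evaluated from the document root inside the loops (for example `//elem_x` inside a loop over elem_x), each query rescans the whole document and the total work grows roughly with the square of the number of elements. A rough Python analogue of a streaming approach, assuming ElementTree's `iterparse` as a stand-in for XmlReader, makes two linear passes: one that only collects column names and one that writes rows. `open_source` is a hypothetical helper parameter, not part of any library.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source):
    """Stream (elem_x attrs, first elem_z attrs, elem_r text by lang) tuples.

    `source` is a file name or a binary file object. Each elem_x is handled
    at its end tag and then cleared, so memory stays small: the current
    subtree plus an empty stub per processed elem_x.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "elem_x":
            continue
        # Assumes elem_z attributes are the same in every elem_y.
        z = elem.find("elem_y/elem_z")
        z_attrs = dict(z.attrib) if z is not None else {}
        texts = {}
        for y in elem.findall("elem_y"):
            lang, text = y.get("lang"), y.findtext("elem_r")
            if lang is not None and text is not None:
                texts[lang] = text
        yield dict(elem.attrib), z_attrs, texts
        elem.clear()  # drop this elem_x's children and attributes


def write_tsv(open_source, out):
    """Pass 1 collects column names; pass 2 writes one row per elem_x.

    `open_source` is a zero-argument callable returning a fresh source for
    iterparse, so the input can be read twice. Assumes no value contains a
    tab or newline (no escaping is done).
    """
    groups = ([], [], [])  # elem_x columns, elem_z columns, language columns
    for record in iter_records(open_source()):
        for columns, names in zip(groups, record):
            for name in names:
                if name not in columns:
                    columns.append(name)
    out.write("\t".join(groups[0] + groups[1] + groups[2]) + "\n")
    for record in iter_records(open_source()):
        # Missing attributes or languages become empty cells.
        fields = [values.get(c, "") for columns, values in zip(groups, record)
                  for c in columns]
        out.write("\t".join(fields) + "\n")
```

For a file on disk, something like `write_tsv(lambda: "input.xml", sys.stdout)` should work, since `iterparse` also accepts a file name (`input.xml` is a placeholder). The second pass costs one extra read of the file, but both passes are linear in its size.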

Is XML itself the problem? Should I try using XSLT to transform the XML instead? Or would simply parsing it as a text file be more effective?

Any assistance would be greatly appreciated.

Robert

