Best way to read in a large hierarchical XML file

First, please forgive my newness to XML. I've used it to serialize and
deserialize objects, export and import datasets, and do other things
where reading the file is pretty much automated. I've done extensive
googling, and most examples people give are so simplistic it makes me
want to cry. Most are one level deep and use XmlDocument or another
in-memory approach, or do things like:

while(reader.Read()) {
//do things I'm not going to show you how to do
}

Needless to say, I'm frustrated. I wish I could use XmlDocument, but
in theory my input file can range from a few MB to 10 GB. Well, I
should say the file could be 10 GB within a year or two, not now. I'm
likely going to do an XmlDocument implementation first so they have
something that works immediately for their 10 MB files. The main
problem here is that we have no control over how those files are
written, as they are exported automatically from EndNote (which has
horrible XML output, btw).

In a very basic form, here is some of the XML:
<xml>
  <records>
    <record>
      <contributors>
        <authors>
          <author>
            <style font="default">Johnson, William P.</style>
          </author>
        </authors>
      </contributors>
      <titles>
        <title>
          <style font="default">This is the Main Title</style>
        </title>
        <secondary-title>
          <style font="default">Because one title is never enough</style>
        </secondary-title>
      </titles>
      <work-type>
        <style font="default">Journal Article</style>
      </work-type>
    </record>
    . . .
  </records>
</xml>

OK, so first note that the data is ALWAYS wrapped in that stupid style
tag. There's no way to change this. So much for the semantic nature
of XML. Basically, I have to go through each record in the file,
pick out particular fields (I left a lot of them out above), and
store them in a database. It also requires some manipulation, such as
joining multiple authors into a comma-delimited string.
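
To make the per-record work concrete, here is a minimal sketch of what
I mean, assuming each record is already in memory as an XElement (LINQ
to XML, System.Xml.Linq). The names RecordFields, Text and Authors are
made up for illustration. It relies on XElement.Value concatenating
all descendant text, so the style wrapper needs no special handling.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; class and method names are illustrative only.
public static class RecordFields
{
    // Returns the text found by walking child element names in order,
    // e.g. Text(record, "titles", "title"), or "" if any step is missing.
    public static string Text(XElement record, params string[] path)
    {
        XElement current = record;
        foreach (string name in path)
        {
            current = current?.Element(name);
        }
        return current == null ? "" : Normalize(current.Value);
    }

    // Joins every <author> under <contributors>/<authors> into one string.
    public static string Authors(XElement record, string separator = ", ")
    {
        var names = record
            .Elements("contributors")
            .Elements("authors")
            .Elements("author")
            .Select(a => Normalize(a.Value));
        return string.Join(separator, names);
    }

    // Collapses line breaks and runs of whitespace into single spaces.
    private static string Normalize(string s) =>
        string.Join(" ", s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
}
```

One caveat: the author names themselves contain commas ("Johnson,
William P."), so a different separator such as "; " may be easier to
split apart later.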

Can anyone give me some pointers on how to approach this? I assume I'm
going to have to use XmlReader due to memory constraints, but I've
never seen an example that goes past a one-level hierarchy (i.e. I
can't just check whether the tag is "work-type", because I have to
associate that value with the other data in the same record).

Or if there's an external library that can make this work as easily as
for-each loops and such, I'm willing to do that as well.
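
For what it's worth, the shape I'm hoping for is something like the
sketch below. It assumes LINQ to XML (System.Xml.Linq) is an option:
XmlReader streams through the file, and XNode.ReadFrom materializes
one record at a time, so only a single record is in memory at once.
RecordStream and Records are names I made up, and I haven't tried this
on a multi-GB file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; class and method names are illustrative only.
public static class RecordStream
{
    // Lazily yields each <record> element as a small in-memory XElement.
    // The caller owns the TextReader and is responsible for disposing it.
    public static IEnumerable<XElement> Records(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
                {
                    // ReadFrom consumes the whole <record> subtree and leaves
                    // the reader on the node after it, so no Read() is needed.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

With that, the calling code could be an ordinary foreach over
RecordStream.Records(File.OpenText(path)), where path is wherever the
EndNote export lives, with the nested fields read off each record
inside the loop body.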

Thanks in advance!

May 1 '07 #1
