First, please forgive my newness to XML. I've used it to serialize and
deserialize objects, to export and import datasets, and for other
things where reading the file is pretty much automated. I've done
extensive googling, and most examples people give are so simplistic it
makes me want to cry. Most are one level deep and use XmlDocument or
other in-memory approaches, or do things like:
while(reader.Read()) {
//do things I'm not going to show you how to do
}
Needless to say, I'm frustrated. I wish I could use XmlDocument, but
in theory my input file can range from a few MB to 10 GB. Well, I
should say the file could be 10 GB within a year or two, not now. I'm
likely going to do an XmlDocument implementation so they have
something that works immediately for their 10 MB files. The main
problem here is that we have no control over how those files are
written, as they are exported automatically from EndNote (which has
horrible XML output, btw).
In a very basic form, here is some of the XML:
<xml>
  <records>
    <record>
      <contributors>
        <authors>
          <author>
            <style font="default">Johnson, William P.</style>
          </author>
        </authors>
      </contributors>
      <titles>
        <title>
          <style font="default">This is the Main Title</style>
        </title>
        <secondary-title>
          <style font="default">Because one title is never enough</style>
        </secondary-title>
      </titles>
      <work-type>
        <style font="default">Journal Article</style>
      </work-type>
    </record>
    . . .
  </records>
</xml>
OK, so first note that the data is ALWAYS wrapped in that stupid style
tag. There's no way to change this. So much for the semantic nature
of XML. Basically, I have to go through each record in the file,
pull out particular information (I left a lot of fields out), and
store it in a database. It also requires some manipulation, such as
joining multiple authors into a single comma-delimited string and
things like that.
Can anyone give me some pointers on how to approach this? I assume I'm
going to have to use XmlReader due to memory constraints, but I've
never seen an example that goes past a one-level-deep hierarchy (i.e.
I don't just care whether the tag is "work-type", because I have to
associate it with the other data within the same record).
Or if there's an external library that can make this work as easily as
for-each loops and such, I'm willing to do that as well.
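One hybrid direction that might work, as a rough sketch only: let
XmlReader walk the file, and each time it lands on a <record>, load
just that one record into a small in-memory tree and query it there.
This assumes .NET 3.5 or later (LINQ to XML) and that a single record
always fits in memory. RecordInfo, EndNoteStream, and ReadRecords are
hypothetical names made up for the example, and the element names come
from the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical holder for the fields kept from one <record>.
public sealed class RecordInfo
{
    public string Authors;
    public string Title;
    public string SecondaryTitle;
    public string WorkType;
}

public static class EndNoteStream
{
    // Streams the input: only one <record> subtree is in memory at a time.
    public static IEnumerable<RecordInfo> ReadRecords(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
            {
                // ReadFrom consumes the whole <record> element and leaves the
                // reader on the node after it, so no extra Read() is needed.
                XElement record = (XElement)XNode.ReadFrom(reader);
                yield return ToInfo(record);
            }
            else
            {
                reader.Read();
            }
        }
    }

    private static RecordInfo ToInfo(XElement record)
    {
        // Names such as "Johnson, William P." already contain commas, so a
        // different delimiter may be safer than ", " in practice.
        string[] authors = record.Descendants("author")
                                 .Select(a => Clean(a.Value))
                                 .ToArray();
        return new RecordInfo
        {
            Authors = string.Join(", ", authors),
            Title = Text(record, "title"),
            SecondaryTitle = Text(record, "secondary-title"),
            WorkType = Text(record, "work-type")
        };
    }

    // Value concatenates all text under the element, so the <style>
    // wrapper drops out without special handling.
    private static string Text(XElement record, string name)
    {
        XElement e = record.Descendants(name).FirstOrDefault();
        return e == null ? "" : Clean(e.Value);
    }

    // Collapses runs of whitespace, including line breaks inside a value.
    private static string Clean(string s)
    {
        return string.Join(" ", s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

The intent is that a plain foreach over
EndNoteStream.ReadRecords(XmlReader.Create(path)) would yield one
RecordInfo per record, ready for a database insert, while everything
inside a record is still reachable together. Is that a sane direction?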
Thanks in advance!