I am working on a project that will have about 500,000 records in an XML
document. This document will need to be queried with XPath, and records
will need to be updated. I was thinking about splitting up the XML into
several XML documents (perhaps 50,000 per document) to be more efficient but
this will make things a lot more complex because the searching needs to go
across all 500,000 records. Can anyone point me to some best practices /
performance techniques for handling large XML documents? Obviously, the
XmlDocument object is probably not a good choice...
Greg,
The recommended store to query large XML documents in .NET is the
XPathDocument. However, the XPathDocument, just like the XmlDocument, will
keep all data from the document plus all the DOM-related information in
memory, i.e. you will need sufficient memory in your server. On top of
that, you have to deal with whatever query optimizing the XPathDocument
does under the covers. If you wanted to add any custom indexing, you
would first have to walk the entire document to build your custom index.
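For illustration, a minimal sketch of the XPathDocument pattern (the file
name, element names, and XPath expression here are all made up):

using System;
using System.Xml.XPath;

class QueryDemo
{
    static void Main()
    {
        // Loads the entire document into memory; read-only, tuned for XPath.
        XPathDocument doc = new XPathDocument("records.xml");
        XPathNavigator nav = doc.CreateNavigator();

        // Select matching records and print a key attribute.
        XPathNodeIterator it = nav.Select("/records/record[@status='active']");
        while (it.MoveNext())
            Console.WriteLine(it.Current.GetAttribute("id", ""));
    }
}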
Would you be able to add a SQL Server database (MSDE might do, but
preferably SQL 2005 Express, currently in Beta 2) to your environment?
Is your XML format strongly structured, so it's easily shredded into
relational tables? If that's the case, you'd save yourself the headache
of managing memory and indexes and let SQL Server do the work for you.
With SQL 2005 you can even store the XML document as a whole in a column
and let SQL Server do the indexing.
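To make that concrete, a rough sketch of querying such an xml column from
.NET (the table, column, connection string, and XPath are all hypothetical;
this assumes the SQL 2005 xml type's exist() method):

using System;
using System.Data.SqlClient;

class XmlColumnDemo
{
    static void Main()
    {
        // Hypothetical table: CREATE TABLE Records (Id int PRIMARY KEY, Data xml),
        // with an XML index on Data so SQL Server handles the indexing.
        using (SqlConnection conn = new SqlConnection(
            "Server=.;Database=Demo;Integrated Security=SSPI"))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                "SELECT Id FROM Records " +
                "WHERE Data.exist('/record[@status=\"active\"]') = 1", conn);
            using (SqlDataReader rdr = cmd.ExecuteReader())
                while (rdr.Read())
                    Console.WriteLine(rdr.GetInt32(0));
        }
    }
}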
HTH,
Christoph Schittko
MS MVP XML http://weblogs.asp.net/cschittko
Thanks for the info Chris. I was thinking along the same lines w/ the XML
objects. Unfortunately, a database isn't really an option for us due to
the cost (or perceived cost... and databases need DBAs...). A big reason
for using XML is to avoid having to use and maintain a database. We are
phasing out an old VAX program that currently does things completely file
based, and trying to do a similar thing with XML on the .NET platform. The
data tends to be relatively simple: the general process is going from a
fixed flat file, converting to XML, and then allowing the user to build
queries for tweaking some of the data. The queries would be XPath (of
course, built with a nice UI)... Perhaps one of the biggest challenges is
eliminating duplicate records across the entire data set. I'll probably
have to come up with an interesting data structure to do it efficiently in
conjunction with XML since I won't be loading everything into the
XPathDocument at once. I would think everything else that has to be done
should be relatively doable by chunking out the files and using
XPathDocuments and XPath queries.
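For what it's worth, one way to attack the duplicate problem without loading
everything at once is a single streaming pass that only keeps the record keys
in memory; a sketch, assuming each record carries a unique id attribute
(element and file names made up):

using System;
using System.Collections;
using System.Xml;

class DedupDemo
{
    static void Main(string[] args)
    {
        // args: the chunk files, e.g. chunk01.xml chunk02.xml ...
        Hashtable seen = new Hashtable();   // keys seen across all chunks
        foreach (string file in args)
        {
            XmlTextReader reader = new XmlTextReader(file);
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.Name == "record")
                {
                    string key = reader.GetAttribute("id");
                    if (key == null) continue;
                    if (seen.ContainsKey(key))
                        Console.WriteLine("duplicate {0} in {1}", key, file);
                    else
                        seen.Add(key, null);
                }
            }
            reader.Close();
        }
    }
}

Even with 500,000 records, a hashtable of the keys alone should fit
comfortably in memory.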
"Christoph Schittko [MVP]" <IN**********@austin.rr.com> wrote in message
news:ew****************@TK2MSFTNGP14.phx.gbl... Greg,
The recommended store to query large XML documents in .NET is the XPathDocument. However, the XPathDocument, just as the XmlDocument will keep all data from the document plus all the DOM-related information in memory, i.e. you will need sufficient memory in your server. On top of that, you have to deal with whatever query optimizing the XPathDocument does under the covers. If you wanted to add any custom indexing, you would have first walk the entire document to build your custom index.
Would be able to add a SQL Server database (MSDE might do, but preferably SQL 2005 Express, currently in Beta 2) to your environment? Is your XML format strongly structured, so it's easily shredded into relational tables? If that's the case, you'd save yourself the headache of managing memory and indexes and let SQL Server do the work for you. With SQL 2005 you can even store the XML document as a whole in a column and let SQL Server do the indexing.
HTH, Christoph Schittko MS MVP XML http://weblogs.asp.net/cschittko
-----Original Message----- From: Greg [mailto:na] Posted At: Monday, December 27, 2004 1:49 PM Posted To: microsoft.public.dotnet.xml Conversation: Querying Very Large XML Subject: Querying Very Large XML
I am working on a project that will have about 500,000 records in an XML document. This document will need to be queried with XPath, and records will need to be updated. I was thinking about splitting up the XML into several XML documents (perhaps 50,000 per document) to be more efficient but this will make things a lot more complex because the searching needs to go accross all 500,000 records. Can anyone point me to some best practices / performance techniques for handling large XML documents? Obviously, the XmlDocument object is probably not a good choice...
Dare Obasanjo wrote this article about efficient ways to handle (read,
update) large XML files: http://msdn.microsoft.com/webservice...l/largexml.asp
Mujtaba.
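One of the streaming patterns along those lines is to pull the document
through an XmlTextReader and echo it to an XmlTextWriter, tweaking records as
they pass, so only a small window of the file is ever in memory. A bare-bones
sketch (element and attribute names made up; it ignores comments, CDATA, and
whitespace for brevity):

using System.Xml;

class StreamUpdateDemo
{
    static void Main()
    {
        XmlTextReader reader = new XmlTextReader("records.xml");
        XmlTextWriter writer = new XmlTextWriter("records-updated.xml", null);
        writer.Formatting = Formatting.Indented;

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    bool empty = reader.IsEmptyElement;
                    writer.WriteStartElement(reader.Name);
                    while (reader.MoveToNextAttribute())
                    {
                        string value = reader.Value;
                        // The "update": rewrite one attribute as it streams past.
                        if (reader.Name == "status" && value == "old")
                            value = "archived";
                        writer.WriteAttributeString(reader.Name, value);
                    }
                    if (empty)
                        writer.WriteEndElement();
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteEndElement();
                    break;
            }
        }
        writer.Close();
        reader.Close();
    }
}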
"Greg" <na> wrote in message news:Ou*************@TK2MSFTNGP11.phx.gbl... I am working on a project that will have about 500,000 records in an XML document. This document will need to be queried with XPath, and records will need to be updated. I was thinking about splitting up the XML into several XML documents (perhaps 50,000 per document) to be more efficient
but this will make things a lot more complex because the searching needs to go accross all 500,000 records. Can anyone point me to some best practices
/ performance techniques for handling large XML documents? Obviously, the XmlDocument object is probably not a good choice...
Greg,
I was hoping that MSDE (or SQL 2005 Express) might let you get around
the "we don't want to run a database" argument. Both versions are free
and shouldn't require much maintenance. Yet they provide the same XML
support as the full version of SQL Server. The only downside is that
they are not really built for concurrent access by a larger number of
simultaneous users.
You sound like you know what you're in for with not using a database in
terms of concurrency management, access control, indexing across the
individual chunks, transactional integrity, etc, i.e. all those reasons
why databases are popular ;).
If you've determined that it's still more economical to build that
functionality yourself, then that's hard to argue with. The trickiest
piece is figuring out which file to add new XML to, and how to perform
any updates that span multiple files, but again... you sound like
you're well aware of what you're in for.
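One simple-minded scheme for the "which file does this record live in"
question is to derive the chunk from the record's key, e.g. by hashing, so
reads and writes always target one known file. A sketch (the naming scheme
and key format are hypothetical):

using System;

class ChunkRouter
{
    // Maps a record key to one of chunkCount files, e.g. chunk03.xml.
    static string ChunkFileFor(string recordKey, int chunkCount)
    {
        int bucket = (recordKey.GetHashCode() & 0x7FFFFFFF) % chunkCount;
        return "chunk" + bucket.ToString("D2") + ".xml";
    }

    static void Main()
    {
        Console.WriteLine(ChunkFileFor("A-10042", 10));
    }
}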
HTH,
Christoph Schittko
MVP XML http://weblogs.asp.net/cschittko
Mujtaba, thanks for the link to the article. Those are some interesting
ideas!
Greg
"Mujtaba Syed" <mu*****@marlabs.com> wrote in message
news:ew**************@TK2MSFTNGP09.phx.gbl... Dare Obasanjo wrote this article about efficient ways to handle (read, update) large XML files:
http://msdn.microsoft.com/webservice...l/largexml.asp Mujtaba.
"Greg" <na> wrote in message news:Ou*************@TK2MSFTNGP11.phx.gbl... I am working on a project that will have about 500,000 records in an XML document. This document will need to be queried with XPath, and records will need to be updated. I was thinking about splitting up the XML into several XML documents (perhaps 50,000 per document) to be more efficient but this will make things a lot more complex because the searching needs to
go accross all 500,000 records. Can anyone point me to some best
practices / performance techniques for handling large XML documents? Obviously,
the XmlDocument object is probably not a good choice...
Chris, please see my responses inline.
"Christoph Schittko [MVP]" <IN**********@austin.rr.com> wrote in message
news:e9**************@TK2MSFTNGP15.phx.gbl... Greg,
I was hoping that MSDE (or SQL 2005 Express) might let you get around the "we don't want to run a database" argument. Both versions are free and shouldn't require much maintenance. Yet they provide the same XML support as the full version of SQL Server. The only downside is that they are not really built for concurrent access by a bigger number of users simultaneously.
MSDE is a good alternative, but there are definitely some costs associated
with running it (at least that is what my manager will tell me). MSDE is
vulnerable to many of the same exploits that SQL Server is, so that means it
will have to be updated periodically. With my particular application,
that's probably the only real maintenance cost that would need to be
considered since I will be reloading the entire data set frequently.
However, I will definitely have to think about it as an alternative. It
would be interesting to estimate what it would take to do an MSDE
solution vs. an XML solution. Even if it were cheaper to develop initially,
I think I could be challenged with the "what about maintenance and security"
concerns. Concurrency is definitely not an issue because it is only a
one-user application. The only technical issue would be whether there is a
limit to how much data you can store in MSDE; MSDE 2000 does cap each
database at 2 GB, so that would need checking at this scale.

> You sound like you know what you're in for with not using a database in
> terms of concurrency management, access control, indexing across the
> individual chunks, transactional integrity, etc., i.e. all those reasons
> why databases are popular ;).
Transactional integrity and indexing are other good points. With the data
spanning multiple files, I'll probably need to be able to roll back changes
if an update on one of them fails. That may mean having to create new files,
then deleting the old ones once they have all succeeded (see the sketch
after my responses below). I'm not that concerned about indexing since most
of the searching I'm doing will be on just about any field. XPath seems to
do a pretty good job since most everything is loaded in memory (at least for
the file I'm searching...).

> If you've determined that it's still more economical to build that
> functionality yourself, then that's hard to argue with. The trickiest
> piece is figuring out which file to add new XML to, and how to perform
> any updates that span multiple files, but again... you sound like you're
> well aware of what you're in for.
I won't actually need to add new XML, I'll just need to update certain
records in my particular case. That definitely simplifies things.
Regardless, I think I'm going to take a look at what it may take to do an
MSDE solution. Thanks for the suggestion.
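A sketch of that write-new-then-swap idea (file handling only; the actual
XML rewrite is stubbed out, and the final swap loop is itself still not
atomic if the process dies mid-swap):

using System.IO;

class SwapCommitDemo
{
    // Write every chunk to a temp file first; only when all writes succeed,
    // replace the originals. Any earlier failure leaves the originals intact.
    static void UpdateChunks(string[] chunkFiles)
    {
        string[] tempFiles = new string[chunkFiles.Length];
        try
        {
            for (int i = 0; i < chunkFiles.Length; i++)
            {
                tempFiles[i] = chunkFiles[i] + ".tmp";
                RewriteChunk(chunkFiles[i], tempFiles[i]);   // may throw
            }
            for (int i = 0; i < chunkFiles.Length; i++)      // commit
            {
                File.Delete(chunkFiles[i]);
                File.Move(tempFiles[i], chunkFiles[i]);
            }
        }
        catch
        {
            foreach (string tmp in tempFiles)                // roll back
                if (tmp != null && File.Exists(tmp))
                    File.Delete(tmp);
            throw;
        }
    }

    // Placeholder for the real streaming update of one chunk.
    static void RewriteChunk(string source, string dest)
    {
        File.Copy(source, dest, true);
    }

    static void Main()
    {
        // Assumes these hypothetical chunk files already exist.
        UpdateChunks(new string[] { "chunk00.xml", "chunk01.xml" });
    }
}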
Greg