Large XML file opinions

Hi,

I have several large XML files (500-700 MB) that get updated once a day. I
have an app that will need to query these files (read-only operations are all
I'll ever need). Obviously, querying a file this large will be somewhat of a
challenge because it can't be loaded into memory all at once. Also, I am
skeptical of the speed of XPath/XQuery.

I am wondering if I shouldn't just create a table in SQL Server, then write
a routine that dumps the XML file's values into the corresponding SQL Server
columns.
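A minimal sketch of such a load routine, written in Python rather than .NET, with SQLite standing in for SQL Server. It assumes a hypothetical layout in which each record is an `<item id="..."><name>...</name><price>...</price></item>` element directly under the root; the element, table and column names are illustrative only. `iterparse` streams the file, and clearing the root after each record keeps memory roughly flat however large the file is.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_items(xml_source, conn, batch_size=1000):
    """Stream <item> records from xml_source into an 'items' table.

    The <item>/<name>/<price> layout is a hypothetical example schema.
    Returns the number of records loaded.
    """
    conn.execute("DROP TABLE IF EXISTS items")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    batch, count = [], 0
    context = ET.iterparse(xml_source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "item":
            continue
        # The <item> subtree is complete at its end tag, so copy its values out.
        batch.append((int(elem.get("id")), elem.findtext("name"),
                      float(elem.findtext("price"))))
        count += 1
        root.clear()  # drop finished <item> elements so memory stays flat
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO items VALUES (?, ?, ?)", batch)
            batch.clear()
    if batch:
        conn.executemany("INSERT INTO items VALUES (?, ?, ?)", batch)
    conn.commit()
    return count


if __name__ == "__main__":
    sample = (b'<catalog><item id="1"><name>widget</name><price>2.50</price></item>'
              b'<item id="2"><name>gadget</name><price>10.00</price></item></catalog>')
    conn = sqlite3.connect(":memory:")
    print(load_items(io.BytesIO(sample), conn))  # 2
```

Since the files change only once a day, a routine like this would run once per update, and the five-minute queries would then hit the database rather than the XML.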

More background info: my app will need to query this XML data fairly
regularly, about every 5 minutes while the app is in use. So speed and
efficiency are crucial.

Any thoughts on a best implementation strategy would be much appreciated!

Thanks,
Marc.

Nov 12 '05 #1
Without running any actual benchmarks, I would say that putting the
described XML file into a database will be a more efficient solution for the
outlined problem. I'm also assuming that the fields that you will be
searching on can be indexed.
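One way to check the indexing assumption is to ask the database's query planner whether it will seek on the index instead of scanning the table. A small sketch in SQLite (standing in for SQL Server; the table, column and index names are hypothetical):

```python
import sqlite3


def build_indexed_table(rows):
    """Create an in-memory 'items' table with an index on the searched column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    # Index the column the application filters on (hypothetical: 'name').
    conn.execute("CREATE INDEX idx_items_name ON items (name)")
    conn.commit()
    return conn


def plan_for(conn, sql, params=()):
    """Return the detail strings SQLite reports for the query plan of sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = build_indexed_table([(1, "widget", 2.5), (2, "gadget", 10.0)])
query = "SELECT id, price FROM items WHERE name = ?"
print(plan_for(conn, query, ("gadget",)))  # should mention idx_items_name
print(conn.execute(query, ("gadget",)).fetchall())  # [(2, 10.0)]
```

SQL Server exposes the same information through its execution plans; the point is only that an indexed equality lookup avoids touching every row.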

Richard Rosenheim
"Marc Thompson" <(NO SPAM) my email address is marc at sycron dot com> wrote
in message news:uL**************@TK2MSFTNGP12.phx.gbl...

Nov 12 '05 #2
Marc Thompson wrote:

Alternatively, you may want to take a look at XML data type columns in SQL
Server 2005.
Otherwise you would need to load that XML into an XPathDocument (read-only,
hence simpler, faster and smaller than XmlDocument) and query it.
Optimize your queries (e.g. never use //, which searches every descendant node).
If your queries are similar to each other, you might try XPath indexing using
IndexingXPathNavigator from the Mvp.Xml library (http://mvp-xml.sf.net/common).
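The two query tips can be illustrated outside .NET. This Python sketch uses ElementTree's limited XPath support, not XPathDocument or the Mvp.Xml API: an explicit child path only examines the root's children, while `.//` walks the whole tree, and a dictionary keyed on an attribute, built once per load, plays a role similar in spirit to a key-based XPath index. The document layout and the `build_key_index` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <item id="1"><name>widget</name></item>
  <item id="2"><name>gadget</name></item>
</catalog>"""


def build_key_index(root, path="item", key_attr="id"):
    """Map key_attr values to the elements found at an explicit path.

    Built once per document load; later lookups are dictionary hits
    instead of re-evaluating a path expression over the tree.
    """
    return {elem.get(key_attr): elem for elem in root.findall(path)}


root = ET.fromstring(DOC)

# Explicit child path: only examines the root's direct children.
direct = root.findall("item")
# Descendant search (the '//' equivalent): visits every node in the tree.
anywhere = root.findall(".//item")
print(len(direct), len(anywhere))  # 2 2

index = build_key_index(root)
print(index["2"].findtext("name"))  # gadget
```

On a 500-700 MB document the difference between the two path styles grows with the number of nodes, which is why a specific path (or a prebuilt index for repeated lookups) is preferred.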

--
Oleg Tkachenko [XML MVP, MCP]
http://blog.tkachenko.com
Nov 12 '05 #3
