High Performance Xml parser

Hi,
I am looking for a component that lets me parse my XML files.
The reason I am asking is that my XML files are huge; they can
reach 1 GB, more or less.
Parsing such a file takes something like 5 hours.
Right now I am using XmlRead, XmlNode ... (I do not load the file into
memory).
Can you suggest better components to use?

** I tried SAX but I couldn't understand how it works, because there are
no examples for .NET and the documentation is very bad.
P.S.: I am writing in C#.

Regards, Rony

Nov 27 '06 #1
If parsing a 1GB file is taking 5 hours, the problem isn't the parser --
it's the fact that the data model (presumably an implementation of the
DOM?) is becoming so huge that your machine's thrashing itself to death
swapping data in and out of memory.

SAX-based processing, when appropriate, is indeed a recommended solution
for that. Or SAX feeding into a more specialized data model. Or --
perhaps -- an XML database tool, which has its own specialized models
and may be able to handle paging of data more intelligently than the
system's default swapper.

I don't use C#, so I can't advise you regarding specific tools.

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
Nov 27 '06 #2
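The point about the data model can be illustrated with a small C# sketch. The .NET base class library does not ship a SAX parser, so this example swaps in XmlReader as the streaming source. It makes one pass over the input and keeps only a counter per element name, a deliberately tiny "specialized data model", instead of building a DOM tree. The class name ElementTally, the method CountElements, and the sample XML are made up for illustration and are not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ElementTally
{
    // Streams the XML once and keeps one counter per distinct element name.
    // Memory use depends on the number of distinct names, not on file size.
    public static Dictionary<string, long> CountElements(TextReader input)
    {
        var counts = new Dictionary<string, long>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element) continue;
                long seen;
                counts.TryGetValue(reader.Name, out seen); // seen stays 0 for a new name
                counts[reader.Name] = seen + 1;
            }
        }
        return counts;
    }

    public static void Main()
    {
        // Hypothetical sample; a real run would pass a StreamReader over the large file.
        string xml = "<orders><order><item/><item/></order><order><item/></order></orders>";
        foreach (KeyValuePair<string, long> pair in CountElements(new StringReader(xml)))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Whatever the real application needs (totals, a lookup table, rows written to a database) would take the place of the dictionary; the idea is only that nothing is retained beyond what the task requires.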
XmlNode in the .NET framework is part of .NET's DOM implementation, so
if you use XmlNode your code is loading the XML into memory, or at
least part of it, depending on what exactly your code does.

With .NET you have XmlReader for fast, forward-only pull parsing; that
is the best approach the .NET framework has to offer for parsing such
large files. With XmlReader the memory/resource consumption should
not increase with the size of the XML, as the reader pulls in the XML
node by node.

I think microsoft.public.dotnet.xml is a better place to discuss
.NET-specific questions on parsing XML.

--

Martin Honnen
http://JavaScript.FAQTs.com/
Nov 27 '06 #3
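A minimal sketch of the forward-only pull parsing described above, assuming the task is to extract the text of every element with a given name. The names PullParsing and ReadElementValues and the sample XML are hypothetical. The method is an iterator, so values are produced one at a time while the reader advances and the caller never holds the whole document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullParsing
{
    // Lazily yields the text content of each element named elementName.
    // Assumes those elements contain only text; ReadElementContentAsString
    // throws an XmlException if it meets a child element.
    public static IEnumerable<string> ReadElementValues(TextReader input, string elementName)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true, IgnoreComments = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // Returns the text and leaves the reader on the node after the end tag.
                    yield return reader.ReadElementContentAsString();
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        // Hypothetical sample; a real run would pass a StreamReader over the large file.
        string xml = "<people><name>Ann</name><name>Bob</name></people>";
        foreach (string value in ReadElementValues(new StringReader(xml), "name"))
            Console.WriteLine(value);
    }
}
```

The loop calls Read only when it did not just consume an element, because ReadElementContentAsString already moves the reader forward. Calling Read or ReadToFollowing unconditionally after it would skip an adjacent element of the same name.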
Hi,
What I am doing is creating a reader with XmlTextReader
and then
while (reader.Read())
{
}
so nothing is loaded into memory.
But I still think 5 hours for a 1 GB XML file is very slow.
Are there any SAX-based components that can improve the
performance?
Nov 28 '06 #4
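For readers who want the SAX programming style in C#, here is a rough sketch of how SAX-like callbacks can be layered over the same Read() loop. It is not a SAX implementation: the interface IContentHandler is hypothetical (named after the SAX concept), as are PushAdapter and EventLog, and attributes and namespaces are ignored. Since it is only a thin layer over XmlReader, there is no reason to expect it to run faster than the plain loop.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical callback interface modelled loosely on SAX content handlers.
public interface IContentHandler
{
    void StartElement(string name);
    void Characters(string text);
    void EndElement(string name);
}

public static class PushAdapter
{
    // Drives the handler from a forward-only XmlReader pass.
    public static void Parse(TextReader input, IContentHandler handler)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        handler.StartElement(reader.Name);
                        // An empty element such as <c/> produces no EndElement node.
                        if (reader.IsEmptyElement) handler.EndElement(reader.Name);
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        handler.Characters(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        handler.EndElement(reader.Name);
                        break;
                }
            }
        }
    }
}

// Example handler that records each callback as a string.
public sealed class EventLog : IContentHandler
{
    public readonly List<string> Events = new List<string>();
    public void StartElement(string name) { Events.Add("start:" + name); }
    public void Characters(string text) { Events.Add("text:" + text); }
    public void EndElement(string name) { Events.Add("end:" + name); }
}

public static class Program
{
    public static void Main()
    {
        var log = new EventLog();
        PushAdapter.Parse(new StringReader("<a><b>hi</b><c/></a>"), log);
        foreach (string e in log.Events) Console.WriteLine(e);
    }
}
```

If the empty loop over the file already takes hours, the time is going into reading and tokenizing the input; if it finishes quickly, the cost lies in whatever the application does per node, which is where a leaner data model helps.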
