
Garbage Collection Problems: Performance and Optimization for WebService XmlDocument XPath Query


I wrote a webservice to output a report file. The fields of the report
are formatted based on information in an in-memory XmlDocument. As
the rows of a SqlDataReader are looped through, a lookup is done and
the format information retrieved.

The performance was extremely poor -- producing about 1000 rows per minute.

However, when I used tracing/logging, my results were inconclusive.
First of all, based on the size of the data and the size of the
XmlDocument, I would have expected the whole process per record to be < 1ms.

I put a statement to record the time, to the millisecond, before each
call to the XmlDocument, and in the routine, before and after each XPath
query. Then I put a statement after each line was written to the text
stream.
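
In outline, the loop and instrumentation looked roughly like this (a simplified sketch -- the XPath, table, and column names here are illustrative, not the real service code):

using System;
using System.Data.SqlClient;
using System.Xml;

public class ReportSketch
{
    static void Main()
    {
        // The format info lives in an in-memory XmlDocument, loaded once.
        XmlDocument formatDoc = new XmlDocument();
        formatDoc.Load("formats.xml");

        SqlConnection conn = new SqlConnection("...");  // connection string omitted
        conn.Open();
        SqlCommand cmd = new SqlCommand("SELECT FieldName, FieldValue FROM Report", conn);
        SqlDataReader reader = cmd.ExecuteReader();

        // Loop over the rows; for each row, time the XPath lookup.
        while (reader.Read())
        {
            DateTime before = DateTime.Now;
            XmlNode fmt = formatDoc.SelectSingleNode(
                "//field[@name='" + reader.GetString(0) + "']");
            Console.WriteLine((DateTime.Now - before).TotalMilliseconds);
            // ... apply fmt to the row and write the line to the text stream ...
        }
        reader.Close();
        conn.Close();
    }
}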

What was odd was that while I could see milliseconds being chewed up in
the code, contributing to the poor performance, the time was being
chewed up at random! Sometimes the XmlDocument lookup took 0 ms, sometimes
20-30 ms. Sometimes the clock would add milliseconds in the loop that
retrieved the record from the dataset.

Another thing that puzzled me is that as the program ran, performance
*degraded* -- the whole loop and all the individual processes ran slower
and slower!

To me, this indicates severe problems with Microsoft .NET garbage
collection and memory management.
--
http://www.texeme.com/
Nov 22 '05 #1
5 Replies


You shouldn't be using an in-memory XML document and XPath when you care
about performance. XML, by its very nature, is slow. You should load
the information from the XML file into a normal class and use a hash map
for the lookup.
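
For example, something along these lines (a minimal sketch -- the field element and its name attribute are assumptions about your XML layout):

using System;
using System.Collections;
using System.Xml;

public class FormatCache
{
    private Hashtable formats = new Hashtable();

    public FormatCache(string xmlPath)
    {
        // Walk the document once at startup and cache each field's format.
        XmlDocument doc = new XmlDocument();
        doc.Load(xmlPath);
        foreach (XmlNode node in doc.SelectNodes("//field"))
        {
            // key = the field's name attribute, value = its format text
            formats[node.Attributes["name"].Value] = node.InnerText;
        }
    }

    public string Lookup(string fieldName)
    {
        // A single hash probe per row instead of an XPath query.
        return (string) formats[fieldName];
    }
}

Load it once when the service starts, and each row's lookup becomes a hash probe instead of a walk over the document.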

Jonathan

"John Bailo" <ja*****@earthlink.net> wrote in message
news:2q*************@uni-berlin.de...

I wrote a webservice to output a report file. The fields of the report
are formatted based on information in an in-memory XmlDocument. As
each row of a SqlDataReader are looped through, a lookup is done, and
format information retrieved.

The performance was extremely poor -- producing about 1000 rows per minute.
However, when I used tracing/logging, my results were inconclusive.
First of all, based on the size of the data and the size of the
XmlDocument, I would have expected the whole process per record to be < 1ms.
I put a statement to record the time, to the millesecond, before each
call to the XmlDocument, and in the routine, before and after each XPath
query. Then I put a statement after each line was written to the text
stream.

What was odd, was that I could see milleseconds being chewed up in the
code, that contributed to the poor performance, the time where it was
chewed up was random! Sometimes the XmlDocument was 0 ms, sometimes
20-30s per lookup. Sometimes, the clock would add ms in the loop that
retrieved the record from the dataset.

Another thing that puzzled me is that as the program ran, performance
*degraded* -- the whole loop and all the individual processes ran slower
and slower!

To me, this indicates severe problems with Ms .NET garbage collection
and memory management.
--
http://www.texeme.com/

Nov 22 '05 #2

Jonathan Allen wrote:
You shouldn't be using an in-memory XML document and XPath when you care
about performance. XML, by its very nature, is slow. You should load
the information from the XML file into a normal class and use a hash map
for the lookup.
I used a string array.

But what do you mean -- "by its very nature"? That is meaningless. An
XmlDocument object should be a b-tree -- in code, essentially -- and
hence fast. And my tracing showed that it would sometimes be fast --
0 ms -- and sometimes slow -- 20-30 ms. Why would it be random, unless the
.NET memory model is severely flawed?

Why? Also, the performance of the code did not change when I moved it
from a single proc with 0.5 GB of memory and hyperthreading to a dual proc
with hyperthreading and 2 GB of memory.

The performance was /exactly/ the same! How can that be? Does .NET
have inherent limitations in terms of accessing system resources?!


--
http://www.texeme.com
Nov 22 '05 #3



Here's some sample code that shows exactly what I mean.

I've compiled this code and run it against the attached XML file.

My results, from running on a P4 workstation, are below.

I have compiled this code both with .NET's compiler and with the mono
compiler for Windows ( www.go-mono.com ). The results are exactly the same.

What you see is that the same query, executed over and over again,
sometimes takes 0 ms and then randomly 16 ms.

Why would such a thing happen?

16
This is book9999
0
This is book9999
0
This is book9999
0
This is book9999
0
This is book9999
0
This is book9999
0
This is book9999
0
This is book9999
0
This is book9999
16
This is book9999
0
This is book9999
0
This is book9999

[output continues in the same pattern for the remaining iterations: long runs of 0 broken by an isolated 15 or 16 roughly every eight to ten queries]

--
http://www.texeme.com/

using System;
using System.Xml;
using System.Xml.XPath;

namespace XMLSamps
{
    public class readwrite
    {
        static void Main(string[] args)
        {
            // Load the XML document named on the command line
            XmlDocument mydoc = new XmlDocument();
            mydoc.Load(args[0]);
            int beg = 0;

            // Use SelectSingleNode to get the book node where the attribute id='9999'
            // and write out the elapsed milliseconds and the node's text
            for (int i = 0; i < 1000; i++)
            {
                beg = DateTime.Now.Second * 1000 + DateTime.Now.Millisecond;
                XmlNode xmn = mydoc.SelectSingleNode("//book[@id='9999']");
                Console.WriteLine((DateTime.Now.Second * 1000 + DateTime.Now.Millisecond) - beg);
                Console.WriteLine(xmn.InnerText);
            }
        }

        // Unused helper: returns the directory of the executing assembly
        static string getPath()
        {
            string path;
            path = System.IO.Path.GetDirectoryName(
                System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase);
            return path;
        }
    }
}
Nov 22 '05 #4

Thanks for the sample. One note: you might want to try using "Ticks" instead
of seconds and milliseconds. (It doesn't change the result, I just find it
helpful.)
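
For instance, the timing lines from your sample, reworked with Ticks (a minimal sketch; one tick is 100 ns, so dividing by 10000 gives milliseconds, and it also avoids the roll-over when the Second counter wraps at the top of each minute):

// Same loop body as the sample, but timed with DateTime.Now.Ticks
long beg = DateTime.Now.Ticks;
XmlNode xmn = mydoc.SelectSingleNode("//book[@id='9999']");
// 1 tick = 100 ns, so 10000 ticks = 1 ms
Console.WriteLine((DateTime.Now.Ticks - beg) / 10000);
Console.WriteLine(xmn.InnerText);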

Jonathan
Nov 22 '05 #5

> But what do you mean -- "by its very nature"? That is meaningless.

This is the best explanation I've read about why you should avoid using XML
as much as possible.

http://www.joelonsoftware.com/articl...000000319.html

That said, I would like to tell you the lesson I keep forgetting: "Don't
worry about performance until it becomes an issue". If using XML internally
is "fast enough", then don't go off and start building your own classes.
Concentrate on areas where making improvements will actually be noticeable
to the user.
> And my tracing showed that it would sometimes be fast --
> 0 ms -- and sometimes slow -- 20-30 ms. Why would it be random, unless the
> .NET memory model is severely flawed?
I think it is because you are running multiple applications. That 16 ms
could be the amount of time it takes Windows to check to see if any other
programs want to run.
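
You can also see how coarse the clock itself is with a small loop like this (a minimal, self-contained sketch): successive values of DateTime.Now typically advance in jumps of roughly 10-16 ms on Windows, which lines up with the 15s and 16s in your output.

using System;

public class ClockGranularity
{
    static void Main()
    {
        // Spin until DateTime.Now.Ticks changes and print the size of each jump.
        // On Windows the jumps are typically ~10-16 ms (the system timer tick).
        long last = DateTime.Now.Ticks;
        for (int i = 0; i < 10; i++)
        {
            long now = DateTime.Now.Ticks;
            while (now == last)
                now = DateTime.Now.Ticks;
            Console.WriteLine(((now - last) / 10000) + " ms");
            last = now;
        }
    }
}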

Jonathan

"John Bailo" <ja*****@earthlink.net> wrote in message
news:mA****************@newsread3.news.pas.earthli nk.net... Jonathan Allen wrote:
You shouldn't be using an in-memory XML document and XPath when you care
about performance. XML, by its very nature, is slow. You should be loading the information from the XML file into a normal class and use a hash-map for the lookup.


I used a string array.

But what do you mean -- "by it's very nature" -- that is meaningless. An
XmlDocument object should be a b-tree -- in code essentially -- and
hence fast. And my tracing showed that it would sometimes be fast --
0ms and sometimes slow - 20-30ms. Why would it be random -- unless the
.Net memory model is severely flawed.

Why? Also, the performance of the code did not change when I moved it
from a single proc with .5G memory with hyperthreading to a dual proc
with hyperthreading and 2G memory.

The performance was /exactly/ the same! How can that be ? Does .NET
have inherent limitations in terms of accessing system resources ?!

Jonathan

"John Bailo" <ja*****@earthlink.net> wrote in message
news:2q*************@uni-berlin.de...
I wrote a webservice to output a report file. The fields of the report
are formatted based on information in an in-memory XmlDocument. As
each row of a SqlDataReader are looped through, a lookup is done, and
format information retrieved.

The performance was extremely poor -- producing about 1000 rows per


minute.
However, when I used tracing/logging, my results were inconclusive.
First of all, based on the size of the data and the size of the
XmlDocument, I would have expected the whole process per record to be <


1ms.
I put a statement to record the time, to the millesecond, before each
call to the XmlDocument, and in the routine, before and after each XPath
query. Then I put a statement after each line was written to the text
stream.

What was odd, was that I could see milleseconds being chewed up in the
code, that contributed to the poor performance, the time where it was
chewed up was random! Sometimes the XmlDocument was 0 ms, sometimes
20-30s per lookup. Sometimes, the clock would add ms in the loop that
retrieved the record from the dataset.

Another thing that puzzled me is that as the program ran, performance
*degraded* -- the whole loop and all the individual processes ran slower
and slower!

To me, this indicates severe problems with Ms .NET garbage collection
and memory management.
--
http://www.texeme.com/


--
http://www.texeme.com

Nov 22 '05 #6
