
Calls to GetElementsByTagName affect performance of XML DOM

A call to GetElementsByTagName (on XmlDocument or XmlElement) returns an
XmlNodeList that stays in sync with the XmlDocument through events fired
by the document. Once the list is created, there is no way to remove its
event handlers from the document. Calling GetElementsByTagName a second
time for the same tag name creates a new list and adds more event handlers.

As a result, these handlers accumulate over time and reach a very high
number (millions). Every modification to the DOM fires an event, and
XmlDocument calls all of these handlers. This significantly slows down all
modifications to the DOM.
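
A minimal sketch of the two behaviours described above, using only documented System.Xml calls. The class and method names (LiveListDemo and its two methods) are made up for illustration, and the sketch does not inspect the internal event handlers; it only shows that the list tracks the document and that a second call returns a separate list object.

```csharp
using System;
using System.Xml;

// Hypothetical demo class; not part of System.Xml.
public static class LiveListDemo
{
    // Returns the list's Count before and after appending one <item>.
    public static int[] CountBeforeAndAfterAppend()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><item/><item/></root>");

        XmlNodeList items = doc.GetElementsByTagName("item");
        int before = items.Count;

        // Modify the document after the list was obtained.
        doc.DocumentElement.AppendChild(doc.CreateElement("item"));
        int after = items.Count; // same list object, re-read after the change

        return new int[] { before, after };
    }

    // True when two calls with the same tag name return distinct objects.
    public static bool SecondCallReturnsNewList()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><item/></root>");

        XmlNodeList first = doc.GetElementsByTagName("item");
        XmlNodeList second = doc.GetElementsByTagName("item");
        return !object.ReferenceEquals(first, second);
    }
}
```

The first method reports a count that grows with the document, which is what "live" means here; the second shows that each call produces its own list to be kept in sync.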

To me it looks like a bug. Did I overlook something? Any feedback will
be appreciated.
Thank you
Dima

Nov 12 '05 #1
7 Replies


Dima wrote:
> A call to GetElementsByTagName (on XmlDocument or XmlElement) returns an
> XmlNodeList that stays in sync with the XmlDocument through events fired
> by the document. Once the list is created, there is no way to remove its
> event handlers from the document. Calling GetElementsByTagName a second
> time for the same tag name creates a new list and adds more event handlers.

If you already know that DOM collections returned by
GetElementsByTagName are "live collections" kept in sync with the
document, why do you then call the method with the same tag name again
in your code? Can't you simply store the result of the first call and
use that collection returned in the rest of your code?
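
A minimal sketch of that suggestion, assuming the code in question appends elements in a loop and needs the matching elements afterwards. ReuseListDemo and AppendAndCount are hypothetical names, and "item" is a placeholder tag name.

```csharp
using System.Xml;

// Hypothetical helper; not part of System.Xml.
public static class ReuseListDemo
{
    // Appends `count` <item> elements and returns how many <item> elements
    // the single stored list reports afterwards.
    public static int AppendAndCount(XmlDocument doc, int count)
    {
        // One call, one list: keep the reference instead of calling
        // GetElementsByTagName again inside the loop.
        XmlNodeList items = doc.GetElementsByTagName("item");

        for (int i = 0; i < count; i++)
        {
            doc.DocumentElement.AppendChild(doc.CreateElement("item"));
        }

        // The stored list is live, so it reflects the appended elements.
        return items.Count;
    }
}
```

Because the collection is live, the one stored list stays current, so there is no need to request another one after each modification.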

> As a result, these handlers accumulate over time and reach a very high
> number (millions).

Have you run tests that show that even for collections gone out of scope
(e.g. local variables created in a method and not returned by the
method) those event handlers are still fired?

As long as your code keeps using a collection an implementation needs to
keep it in sync.

> To me it looks like a bug. Did I overlook something? Any feedback will
> be appreciated.


It is not clear whether you have a test case where you observe the
performance loss or whether you are just speculating whether there might
be a performance loss due to the need to keep collections in sync.
Do you have code where you experience performance problems?

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 12 '05 #2

P: n/a
Dima,

You might want to read Erik Saltwell's article on GetElementsByTagName:
http://blogs.msdn.com/eriksalt/archi...ByTagName.aspx

Erik is a dev lead for the System.Xml team.

--
Stan Kitsis
Program Manager, XML Technologies
Microsoft Corporation

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Nov 12 '05 #3

Martin Honnen wrote:
> Dima wrote:
>> A call to GetElementsByTagName (on XmlDocument or XmlElement) returns an
>> XmlNodeList that stays in sync with the XmlDocument through events fired
>> by the document. Once the list is created, there is no way to remove its
>> event handlers from the document. Calling GetElementsByTagName a second
>> time for the same tag name creates a new list and adds more event handlers.
> If you already know that DOM collections returned by
> GetElementsByTagName are "live collections" kept in sync with the
> document, why do you then call the method with the same tag name again
> in your code? Can't you simply store the result of the first call and
> use that collection returned in the rest of your code?


Martin, thank you for your response!

I use it because it is faster than using XPath. I fixed the problem by
switching to XPath. I could store the result, but I believe the DOM
implementation is in a much better position to store it: if the
collection is live and, once created, cannot easily be disposed of, why
does a second call to GetElementsByTagName return a new collection?

>> As a result, these handlers accumulate over time and reach a very high
>> number (millions).
> Have you run tests that show that even for collections gone out of scope
> (e.g. local variables created in a method and not returned by the
> method) those event handlers are still fired?


All the collections I used in my code were local variables. I use C#, so
going out of scope does not free anything. I tried setting the collection
to null and, predictably, that did not unregister the handlers. XmlNodeList
is not IDisposable. What else can I do? I could probably cast it to
XmlElementList (undocumented), get its OnListChanged handler
(undocumented) and unregister it, but so far I am trying to use only
documented features of .NET 1.1.

> As long as your code keeps using a collection an implementation needs to
> keep it in sync.

I agree, but I use the collection once and would then like to dispose of
it, and I do not see a way to do that.

>> To me it looks like a bug. Did I overlook something? Any feedback will
>> be appreciated.
> It is not clear whether you have a test case where you observe the
> performance loss or whether you are just speculating whether there might
> be a performance loss due to the need to keep collections in sync.
> Do you have code where you experience performance problems?


Yes, I profiled my app with the ANTS profiler, and all AppendChild calls
are slow because of the call to XmlDocument.AfterEvent, which invokes ~1.2
million handlers. Interestingly, performance does not depend much on the
size of the document, but rather on how many times GetElementsByTagName
was called.
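
A rough way to check this without a profiler is sketched below. It is an assumption-laden sketch, not the measurement made above: AppendTimingSketch and TimeAppends are hypothetical names, Stopwatch requires .NET 2.0 or later (it is not available in .NET 1.1), and the numbers will depend on the runtime version and machine.

```csharp
using System;
using System.Diagnostics;
using System.Xml;

// Hypothetical timing helper; not part of System.Xml.
public static class AppendTimingSketch
{
    // Calls GetElementsByTagName `listCalls` times, discarding each list,
    // then times `appends` AppendChild calls on the same document.
    // Returns the elapsed time in milliseconds.
    public static long TimeAppends(int listCalls, int appends)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root/>");

        for (int i = 0; i < listCalls; i++)
        {
            // The returned list is deliberately not stored.
            doc.GetElementsByTagName("item");
        }

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < appends; i++)
        {
            doc.DocumentElement.AppendChild(doc.CreateElement("item"));
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Comparing, say, TimeAppends(0, 1000) with TimeAppends(100000, 1000) on the same small document would show whether the append cost tracks the number of earlier GetElementsByTagName calls rather than the document size.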

I would not spend my time on speculation. I hope MS will fix it.

Again, I partially solved my problem by switching to XPath (which is
slower than GetElementsByTagName, according to the ANTS profiler), but to
me the behaviour of GetElementsByTagName seems simply dangerous. The only
functions that cannot be called twice are ctors and dtors.
GetElementsByTagName does not fit into this category, yet it behaves
that way.

Nov 12 '05 #4

Stan, thank you for your response!
The posting is very interesting; however, I do not think the performance
problem is rooted in conformance to the standard. I think the real
problem with GetElementsByTagName is not the live collection and its
event handlers themselves, but the inability to remove or dispose of the
collection when it is no longer needed. XmlNodeList is not IDisposable, so
once it is created and its handlers are registered, it is there forever
(for the lifetime of the document). Maybe the GC will clean it up, but by
then it might be too late.
The second issue is that a second call to GetElementsByTagName for the
same name returns a new live collection and registers more event handlers.
In my experience these two factors hurt performance the most, not the live
property of the collection alone.
If these issues are addressed, performance will improve and the standard
will not be violated.

Nov 12 '05 #5

Dima wrote:
> I use it because it is faster than using XPath. I fixed the problem by
> switching to XPath.

How do you use XPath, simply with SelectNodes instead of
GetElementsByTagName called on a node in an XmlDocument? Doesn't that
give an XmlNodeList too?
Or have you switched to XPathDocument?

> I could store the result, but I believe the DOM
> implementation is in a much better position to store it: if the
> collection is live and, once created, cannot easily be disposed of, why
> does a second call to GetElementsByTagName return a new collection?

Well DOM with live collections has been around before .NET and is also a
W3C standard, I am not sure it would fit in with other implementations
or the standard if each call to the method on a certain node with the
same argument would return the same cached object.

For instance the W3C DOM Level 2 Core specification
<http://www.w3.org/TR/DOM-Level-2-Core/core.html#i-Document>
says about getElementsByTagName:
Return Value NodeList
A new NodeList object containing all the matched Elements.

so returning the same object is not what that standard suggests.

> All the collections I used in my code were local variables. I use C#, so
> going out of scope does not free anything. I tried setting the collection
> to null and, predictably, that did not unregister the handlers. XmlNodeList
> is not IDisposable. What else can I do? I could probably cast it to
> XmlElementList (undocumented), get its OnListChanged handler
> (undocumented) and unregister it, but so far I am trying to use only
> documented features of .NET 1.1.
>
> Yes, I profiled my app with the ANTS profiler, and all AppendChild calls
> are slow because of the call to XmlDocument.AfterEvent, which invokes ~1.2
> million handlers. Interestingly, performance does not depend much on the
> size of the document, but rather on how many times GetElementsByTagName
> was called.


Good that we know details about what you have tested. I will try to look
into this tomorrow.

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 12 '05 #6

Martin Honnen wrote:
> Dima wrote:
>> I use it because it is faster than using XPath. I fixed the problem by
>> switching to XPath.
> How do you use XPath, simply with SelectNodes instead of
> GetElementsByTagName called on a node in an XmlDocument? Doesn't that
> give an XmlNodeList too?
> Or have you switched to XPathDocument?


In my tests I used SelectNodes, which returns an XmlNodeList but does
not register event handlers, so I guess that list is not a live
collection, whatever the standard says it should be. In the real app I
use an XPathNavigator created on an XmlNode (why is another story; I now
suspect the root cause is the same), so this approach does not use
XmlNodeList at all.
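
The two XPath-based alternatives mentioned above can be sketched as follows. XPathAlternatives and its method names are hypothetical, and "//item" is a placeholder expression; the calls themselves (SelectNodes, CreateNavigator, Select, MoveNext) are documented System.Xml members.

```csharp
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper; not part of System.Xml.
public static class XPathAlternatives
{
    // Counts <item> elements with SelectNodes, which returns an XmlNodeList.
    public static int CountWithSelectNodes(XmlDocument doc)
    {
        XmlNodeList items = doc.SelectNodes("//item");
        return items.Count;
    }

    // Counts <item> elements with an XPathNavigator created on a node,
    // so no XmlNodeList is involved at all.
    public static int CountWithNavigator(XmlNode node)
    {
        XPathNavigator nav = node.CreateNavigator();
        XPathNodeIterator it = nav.Select("//item");

        int count = 0;
        while (it.MoveNext())
        {
            count++;
        }
        return count;
    }
}
```

Both methods run the same query; the navigator version walks an iterator instead of materialising a node list.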

>> I could store the result, but I believe the DOM
>> implementation is in a much better position to store it: if the
>> collection is live and, once created, cannot easily be disposed of, why
>> does a second call to GetElementsByTagName return a new collection?
> Well DOM with live collections has been around before .NET and is also a
> W3C standard, I am not sure it would fit in with other implementations
> or the standard if each call to the method on a certain node with the
> same argument would return the same cached object.
>
> For instance the W3C DOM Level 2 Core specification
> <http://www.w3.org/TR/DOM-Level-2-Core/core.html#i-Document>
> says about getElementsByTagName:
> Return Value NodeList
> A new NodeList object containing all the matched Elements.
>
> so returning the same object is not what that standard suggests.


Well, if there are two collections and they are both live, and thus
contain exactly the same objects, what makes them different? The only
new aspect of the second collection is newly wasted memory. But I will
not go deep into the standards.

Not all implementations conform to the standard (see Stan Kitsis's post
in this thread), for example Sun Java, and probably for a good reason!

>> All the collections I used in my code were local variables. I use C#, so
>> going out of scope does not free anything. I tried setting the collection
>> to null and, predictably, that did not unregister the handlers. XmlNodeList
>> is not IDisposable. What else can I do? I could probably cast it to
>> XmlElementList (undocumented), get its OnListChanged handler
>> (undocumented) and unregister it, but so far I am trying to use only
>> documented features of .NET 1.1.
>>
>> Yes, I profiled my app with the ANTS profiler, and all AppendChild calls
>> are slow because of the call to XmlDocument.AfterEvent, which invokes ~1.2
>> million handlers. Interestingly, performance does not depend much on the
>> size of the document, but rather on how many times GetElementsByTagName
>> was called.
> Good that we know details about what you have tested. I will try to look
> into this tomorrow.

Nov 12 '05 #7

Dima wrote:
> Not all implementations conform to the standard (see Stan Kitsis's post
> in this thread), for example Sun Java,


A bit off topic, but as far as I know and have tested, the DOM
implementations in Sun's Java 1.4 (org.apache.crimson.tree.XmlDocument)
and in Sun's Java 1.5 (com.sun.org.apache.xerces.internal.dom.DocumentImpl)
both give live NodeLists on getElementsByTagName calls.

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 12 '05 #8
