Calls to GetElementsByTagName affect performance of XML DOM

A call to XmlNode.GetElementsByTagName returns an XmlNodeList that stays in
sync with the XmlDocument thanks to events fired by the XmlDocument. Once this
list is created, there is no way to remove its event handlers from the
document. Calling GetElementsByTagName a second time for the same tag
name creates a new list and adds more event handlers.

As a result, over time these handlers accumulate and reach a pretty high
number (millions). Every modification to the DOM fires an event, and
XmlDocument calls all of these handlers. This significantly slows down all
modifications to the DOM.

To me it looks like a bug. Did I overlook something? Any feedback will
be appreciated.
Thank you
Dima
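
A minimal sketch of the scenario described above, for anyone who wants to measure it. The class and method names (`LiveListRepro`, `TimeAppends`) are made up for illustration, and `Stopwatch` assumes .NET 2.0 or later rather than the .NET 1.1 used in this thread. Later runtimes may manage the list's handlers differently, so the slowdown may not reproduce everywhere.

```csharp
using System;
using System.Diagnostics;
using System.Xml;

public static class LiveListRepro
{
    // Creates `lists` live node lists, discards them, then times `appends` AppendChild calls.
    public static TimeSpan TimeAppends(int lists, int appends)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><item/></root>");

        // Each call returns a new live list; the result is dropped immediately.
        for (int i = 0; i < lists; i++)
        {
            doc.GetElementsByTagName("item");
        }

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < appends; i++)
        {
            // Every insertion raises the document's node-change events.
            doc.DocumentElement.AppendChild(doc.CreateElement("item"));
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Comparing `LiveListRepro.TimeAppends(0, 1000)` with `LiveListRepro.TimeAppends(100000, 1000)` on the runtime in question should show whether discarded lists still cost anything on each modification.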

Nov 12 '05 #1

Dima wrote:
> A call to XmlNode.GetElementsByTagName returns an XmlNodeList that stays in
> sync with the XmlDocument thanks to events fired by the XmlDocument. Once this
> list is created, there is no way to remove its event handlers from the
> document. Calling GetElementsByTagName a second time for the same tag
> name creates a new list and adds more event handlers.
If you already know that DOM collections returned by
GetElementsByTagName are "live collections" kept in sync with the
document, why do you then call the method with the same tag name again
in your code? Can't you simply store the result of the first call and
use that collection returned in the rest of your code?
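
A minimal sketch of the "store the result of the first call" suggestion. `ElementListCache` is a hypothetical helper, not part of System.Xml, and it uses the generic `Dictionary`, which assumes .NET 2.0 or later (on .NET 1.1 a `Hashtable` would play the same role).

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: hands out one live list per tag name instead of a new list per call.
public sealed class ElementListCache
{
    private readonly XmlDocument doc;
    private readonly Dictionary<string, XmlNodeList> lists =
        new Dictionary<string, XmlNodeList>();

    public ElementListCache(XmlDocument doc)
    {
        this.doc = doc;
    }

    public XmlNodeList Get(string tagName)
    {
        XmlNodeList list;
        if (!lists.TryGetValue(tagName, out list))
        {
            // Only the first request for a tag name creates a live list.
            list = doc.GetElementsByTagName(tagName);
            lists.Add(tagName, list);
        }
        return list;
    }
}
```

Because the list is live, the cached instance keeps reflecting later changes to the document, so reusing it should not return stale results. The trade-off is that the cache holds every list for as long as the cache itself is alive.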
> As a result, over time these handlers accumulate and reach a pretty high
> number (millions).
Have you run tests that show that even for collections gone out of scope
(e.g. local variables created in a method and not returned by the
method) those event handlers are still fired?

As long as your code keeps using a collection an implementation needs to
keep it in sync.
> To me it looks like a bug. Did I overlook something? Any feedback will
> be appreciated.


It is not clear whether you have a test case where you observe the
performance loss or whether you are just speculating whether there might
be a performance loss due to the need to keep collections in sync.
Do you have code where you experience performance problems?

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 12 '05 #2
Dima,

You might want to read Erik Saltwell's article on GetElementsByTagName:
http://blogs.msdn.com/eriksalt/archi...ByTagName.aspx

Erik is a dev lead for the system.xml team.

--
Stan Kitsis
Program Manager, XML Technologies
Microsoft Corporation

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Nov 12 '05 #3

Martin Honnen wrote:
> Dima wrote:
>> A call to XmlNode.GetElementsByTagName returns an XmlNodeList that stays in
>> sync with the XmlDocument thanks to events fired by the XmlDocument. Once this
>> list is created, there is no way to remove its event handlers from the
>> document. Calling GetElementsByTagName a second time for the same tag
>> name creates a new list and adds more event handlers.
> If you already know that DOM collections returned by
> GetElementsByTagName are "live collections" kept in sync with the
> document, why do you then call the method with the same tag name again
> in your code? Can't you simply store the result of the first call and
> use that collection returned in the rest of your code?


Martin, thank you for your response!

I use it because it is faster than using XPath. I fixed the problem by
switching to XPath. I could store the result, but I believe the DOM
implementation is in a much better position to store this result: if the
collection is live and, once created, cannot be easily disposed of, why does a
second call to GetElementsByTagName return a new collection?
>> As a result, over time these handlers accumulate and reach a pretty high
>> number (millions).
> Have you run tests that show that even for collections gone out of scope
> (e.g. local variables created in a method and not returned by the
> method) those event handlers are still fired?


All collections I used in my code were local variables. I use C#, so
going out of scope will not free anything. I tried setting the collection to
null and, predictably, it did not unregister the handlers. XmlNodeList is
not IDisposable. What else can I do? I could probably cast it to
XmlElementList (undocumented), get its OnListChanged handler
(undocumented) and unregister it, but so far I am trying to use only
documented features of .NET 1.1.

> As long as your code keeps using a collection an implementation needs to
> keep it in sync.
I agree, but I use the collection once and would then like to dispose of it,
and I do not see a way to do that.
>> To me it looks like a bug. Did I overlook something? Any feedback will
>> be appreciated.
> It is not clear whether you have a test case where you observe the
> performance loss or whether you are just speculating whether there might
> be a performance loss due to the need to keep collections in sync.
> Do you have code where you experience performance problems?


Yes, I profiled my app with the ANTS profiler, and all AppendChild calls
are slow because of the call to XmlDocument.AfterEvent, which invokes ~1.2
million handlers. Interestingly, performance does not depend much on the
size of the document, but more on how many times GetElementsByTagName
was called.

I would not spend my time on speculation. I hope MS will fix it.

Again, I partially solved my problem by switching to XPath (which is
slower than GetElementsByTagName, according to the ANTS profiler), but to
me the behaviour of GetElementsByTagName seems simply dangerous. The only
functions that cannot be called twice are ctors and dtors.
GetElementsByTagName does not fit into this category, yet it behaves
that way.



Nov 12 '05 #4
Stan, thank you for the response!
The posting is very interesting; however, I think the performance
problem is not rooted in conformance to the standard. I think the
real problem with GetElementsByTagName is not the live collection and event
handlers themselves, but the inability to remove or dispose of the collection
when it is not needed anymore. XmlNodeList is not IDisposable, so once it is
created and its handlers are registered, it's forever (for the document
lifetime). Maybe the GC will clean it up, but it might be too late already.
The second issue is that a second call to GetElementsByTagName for the same
name returns a new live collection and registers more event handlers. In
my experience these two factors hurt performance the most, not the live
property of the collection alone.
If these issues are addressed, performance will improve and the
standard will not be violated.
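
To illustrate the kind of explicit cleanup asked for here, a minimal sketch built only on the public `XmlDocument.NodeInserted` event. `InsertCounter` is a hypothetical class, not how XmlNodeList is implemented. It only shows that a subscriber which can be disposed stops costing anything once its handler is removed.

```csharp
using System;
using System.Xml;

// Hypothetical subscriber that listens to document changes and can be detached explicitly.
public sealed class InsertCounter : IDisposable
{
    private readonly XmlDocument doc;
    private readonly XmlNodeChangedEventHandler handler;
    private int count;

    public InsertCounter(XmlDocument doc)
    {
        this.doc = doc;
        handler = new XmlNodeChangedEventHandler(OnNodeInserted);
        // From here on, the document calls the handler on every insertion.
        doc.NodeInserted += handler;
    }

    public int Count
    {
        get { return count; }
    }

    private void OnNodeInserted(object sender, XmlNodeChangedEventArgs e)
    {
        count++;
    }

    public void Dispose()
    {
        // Removing the handler ends the per-modification cost for this subscriber.
        doc.NodeInserted -= handler;
    }
}
```

Wrapped in a `using` block, such a subscriber is attached only for the duration of the block, which is the lifetime control that a plain XmlNodeList does not offer.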

Nov 12 '05 #5

Dima wrote:
> I use it because it is faster than using XPath. I fixed the problem by
> switching to XPath.
How do you use XPath, simply with SelectNodes instead of
GetElementsByTagName called on a node in an XmlDocument? Doesn't that
give an XmlNodeList too?
Or have you switched to XPathDocument?
> I could store the result, but I believe the DOM
> implementation is in a much better position to store this result: if the
> collection is live and, once created, cannot be easily disposed of, why does a
> second call to GetElementsByTagName return a new collection?
Well DOM with live collections has been around before .NET and is also a
W3C standard, I am not sure it would fit in with other implementations
or the standard if each call to the method on a certain node with the
same argument would return the same cached object.

For instance the W3C DOM Level 2 Core specification
<http://www.w3.org/TR/DOM-Level-2-Core/core.html#i-Document>
says about getElementsByTagName:
Return Value NodeList
A new NodeList object containing all the matched Elements.

so returning the same object is not what that standard suggests.
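
A small check of that point against System.Xml, as a sketch. `NewListPerCall` is a name chosen for illustration. It compares the two returned lists by reference and by element count.

```csharp
using System.Xml;

public static class NewListPerCall
{
    // True when two calls with the same tag name return different list objects
    // that nevertheless report the same number of elements.
    public static bool ReturnsDistinctLists(XmlDocument doc, string tagName)
    {
        XmlNodeList first = doc.GetElementsByTagName(tagName);
        XmlNodeList second = doc.GetElementsByTagName(tagName);
        return !object.ReferenceEquals(first, second)
            && first.Count == second.Count;
    }
}
```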
> All collections I used in my code were local variables. I use C#, so
> going out of scope will not free anything. I tried setting the collection to
> null and, predictably, it did not unregister the handlers. XmlNodeList is
> not IDisposable. What else can I do? I could probably cast it to
> XmlElementList (undocumented), get its OnListChanged handler
> (undocumented) and unregister it, but so far I am trying to use only
> documented features of .NET 1.1.
>
> Yes, I profiled my app with the ANTS profiler, and all AppendChild calls
> are slow because of the call to XmlDocument.AfterEvent, which invokes ~1.2
> million handlers. Interestingly, performance does not depend much on the
> size of the document, but more on how many times GetElementsByTagName
> was called.


Good that we know details about what you have tested. I will try to look
into this tomorrow.

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 12 '05 #6

Martin Honnen wrote:
> Dima wrote:
>> I use it because it is faster than using XPath. I fixed the problem by
>> switching to XPath.
> How do you use XPath, simply with SelectNodes instead of
> GetElementsByTagName called on a node in an XmlDocument? Doesn't that
> give an XmlNodeList too?
> Or have you switched to XPathDocument?


In my tests I used SelectNodes, which returns an XmlNodeList, but it does
not register event handlers, so I guess this list is not a live
collection, whatever the standard says it should be. In the real app I use
an XPathNavigator created on an XmlNode (why is another story; I now
suspect the root cause is the same), so this approach does not use
XmlNodeList at all.
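
A minimal sketch of the two XPath routes mentioned above. The `XPathQueries` class and its method names are made up for illustration, and the generic `List` assumes .NET 2.0 or later. Copying the SelectNodes result into a plain list is an extra precaution added here, so that the snapshot does not depend on how the returned XmlNodeList behaves once the document changes.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class XPathQueries
{
    // Evaluates the XPath with SelectNodes and copies the matches into a plain list,
    // so the result is a fixed snapshot taken at call time.
    public static List<XmlNode> Snapshot(XmlNode context, string xpath)
    {
        List<XmlNode> result = new List<XmlNode>();
        foreach (XmlNode node in context.SelectNodes(xpath))
        {
            result.Add(node);
        }
        return result;
    }

    // Counts matches through an XPathNavigator, without creating an XmlNodeList.
    public static int CountWithNavigator(XmlNode context, string xpath)
    {
        XPathNavigator navigator = context.CreateNavigator();
        XPathNodeIterator matches = navigator.Select(xpath);
        int count = 0;
        while (matches.MoveNext())
        {
            count++;
        }
        return count;
    }
}
```

With either method, a later change to the document is only seen by running the query again.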
>> I could store the result, but I believe the DOM
>> implementation is in a much better position to store this result: if the
>> collection is live and, once created, cannot be easily disposed of, why does a
>> second call to GetElementsByTagName return a new collection?
> Well DOM with live collections has been around before .NET and is also a
> W3C standard, I am not sure it would fit in with other implementations
> or the standard if each call to the method on a certain node with the
> same argument would return the same cached object.
>
> For instance the W3C DOM Level 2 Core specification
> <http://www.w3.org/TR/DOM-Level-2-Core/core.html#i-Document>
> says about getElementsByTagName:
> Return Value NodeList
> A new NodeList object containing all the matched Elements.
>
> so returning the same object is not what that standard suggests.


Well, if there are two collections and they are live, and thus they
contain exactly the same objects, what makes them different? The only
new aspect of the second collection is newly wasted memory. But I will not
go deep into the standards.

Not all implementations conform to the standard (see Stan Kitsis's post
in this thread), for example Sun Java, and probably for a good reason!



Nov 12 '05 #7

Dima wrote:
> Not all implementations conform to the standard (see Stan Kitsis's post
> in this thread), for example Sun Java,


A bit off topic, but as far as I know and have tested, the DOM implementations
in Sun's Java 1.4 (org.apache.crimson.tree.XmlDocument) and in Sun's
Java 1.5 (com.sun.org.apache.xerces.internal.dom.DocumentImpl) both give
live NodeLists on getElementsByTagName calls.

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 12 '05 #8
