Ian Collins wrote:
Duncan Booth wrote:
Interestingly the author of Sarissa (Manos Batsis) found that using
document.createElement was much slower than using innerHTML, so he
serializes the XML DOM node and feeds that to innerHTML.
I can see why: if you call document.createElement() in IE, the resulting
element has 79 attributes with a value of 'null'. Must be creating an
attribute node for each of these...
Not quite. When you create an element in IE it doesn't create all those
empty attributes until you access the 'attributes' collection. If you stick
to getAttribute/setAttribute and avoid accessing 'attributes' you can get
big performance improvements.
Kupu used to do content filtering by iterating over the attributes
collection on each node when you saved, and only keeping those attributes
which weren't banned and were non-null. I changed it to use a whitelist of
valid attributes for each tag name, and to test each of those attributes for
existence using getAttribute(). I expected a speedup of maybe 10-fold, since
the whitelist meant it was only processing about a tenth as many attributes,
but in fact it was closer to 100 times faster, I think because IE was no
longer creating the attributes object at all.
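
For anyone wanting to try the same approach, here is a rough sketch of the whitelist idea. It is not Kupu's actual code: the ALLOWED_ATTRIBUTES table and the collectAllowedAttributes name are made up for illustration, and the attribute lists are only examples. The point is that the function only ever calls getAttribute() and never reads the 'attributes' collection.

```javascript
// Hypothetical whitelist (not Kupu's real table): tag name -> attributes to keep.
var ALLOWED_ATTRIBUTES = {
  a: ['href', 'title', 'name'],
  img: ['src', 'alt', 'width', 'height'],
  td: ['colspan', 'rowspan']
};

// Return the whitelisted attributes that are set on `element` as a plain
// object. Only getAttribute() is called; `element.attributes` is never read.
function collectAllowedAttributes(element, whitelist) {
  var tag = element.tagName.toLowerCase();
  var names = Object.prototype.hasOwnProperty.call(whitelist, tag)
    ? whitelist[tag]
    : [];
  var kept = {};
  for (var i = 0; i < names.length; i++) {
    var value = element.getAttribute(names[i]);
    // Treat null and '' as "not set". Assumption: dropping empty values is
    // acceptable here, although it also drops things like alt="".
    if (value !== null && value !== '') {
      kept[names[i]] = value;
    }
  }
  return kept;
}
```

You would call it once per node while serializing, e.g. collectAllowedAttributes(node, ALLOWED_ATTRIBUTES), and write out only the returned name/value pairs. Tags missing from the table end up with no attributes at all.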
What I think it does show is that although IE exposes a DOM interface,
internally it does everything with totally different data structures. There
are plenty of other indications of this: create an HTML page with a <base>
tag followed by the <body> tag, then have a look at the base element's
firstChild and nextSibling: they will both be the body element.
Another fun one is the sequence <font><p>text</font></p>: the <p> element
is both firstChild and nextSibling of the <font> element. You might not
think it matters (after all the HTML was invalid in the first place), but
if you paste into a contentEditable area some text copied from Microsoft
Works (or some other MS applications), the rich text on the clipboard gets
converted to exactly that 'HTML' by IE, so an editor based on
contentEditable has to be able to handle (and ideally clean up) such
situations.
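
A minimal check for the overlap described in both examples might look like the following. The function name is my own, and this only detects the symptom (the same node reported as firstChild and nextSibling); what an editor should do to repair the tree afterwards is a separate question.

```javascript
// True when a node's first child is also reported as its next sibling,
// which cannot happen in a well-formed tree.
function hasChildSiblingOverlap(node) {
  return node.firstChild != null && node.firstChild === node.nextSibling;
}
```

An editor could run this on each element while walking the pasted content and flag the ones that need cleaning up.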
Both of these indicate that the internal structure used by IE is pretty
close to the unparsed HTML and the DOM nodes are simply providing a view
onto the much less structured HTML.