VK, Johannes, Thomas, Peter, RobG. Thanks for all the replies!
I think I have a much better understanding now.
What I am actually doing is: I am grabbing the XPath of an element (by
building it bottom-up on the client side) and then storing it on the
server. Later I retrieve that XPath and trace it back against the same
URL. But I am tracing it back using XPathAPI.selectSingleNode on the
server side, whereas I was building the XPath on the client side using
JavaScript.
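The thread does not show the actual client-side script, so here is a minimal sketch of what a bottom-up XPath builder typically looks like (all names are my own; the tiny mock DOM is only there so the sketch runs outside a browser):

```javascript
// Build an XPath bottom-up: walk from the element to the root, emitting one
// /tag[n] step per ancestor, where [n] is the 1-based position among
// same-tag siblings.
function buildXPath(elem) {
  var steps = [];
  for (var node = elem; node; node = node.parentNode) {
    // Count preceding siblings with the same tag name to get the index.
    var index = 1;
    var siblings = node.parentNode ? node.parentNode.childNodes : [node];
    for (var i = 0; i < siblings.length && siblings[i] !== node; i++) {
      if (siblings[i].tagName === node.tagName) index++;
    }
    steps.unshift(node.tagName.toLowerCase() + '[' + index + ']');
  }
  return '/' + steps.join('/');
}

// Tiny mock DOM so the sketch is self-contained (no browser needed):
function el(tagName, children) {
  var node = { tagName: tagName, childNodes: children || [], parentNode: null };
  node.childNodes.forEach(function (c) { c.parentNode = node; });
  return node;
}

var tr2 = el('TR');
var doc = el('HTML', [el('BODY', [el('TABLE', [el('TBODY', [el('TR'), tr2])])])]);
console.log(buildXPath(tr2)); // "/html[1]/body[1]/table[1]/tbody[1]/tr[2]"
```

Note that the mock already illustrates the problem from this thread: the browser's tree contains a TBODY step whether or not the source markup declared one.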
So there was a little bit of inconsistency in the way the two
'documents' were interpreted by two different parsers (JTidy and Xerces
on the server side, the browser's parser via JavaScript on the client
side).
I was converting the HTML to XHTML using JTidy and then parsing it
using Xerces on the server side. (So it 'was' XHTML on the server side,
hence no TBODYs.)
But I guess now I will parse it on the client side too, using Mozilla's
doc.evaluate (it's OK for me if it runs only on Mozilla).
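For reference, resolving a stored path with Mozilla's document.evaluate looks roughly like this (the helper name is my own; the XPathResult constant is inlined, FIRST_ORDERED_NODE_TYPE === 9, so the helper can also be exercised outside a browser with a stubbed document):

```javascript
// Sketch of resolving an XPath on the client with document.evaluate.
var FIRST_ORDERED_NODE_TYPE = 9; // same value as XPathResult.FIRST_ORDERED_NODE_TYPE

function selectSingleNode(doc, xpath) {
  // In a browser this is simply:
  //   doc.evaluate(xpath, doc, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null)
  var result = doc.evaluate(xpath, doc, null, FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue; // null if nothing matched
}
```

In Mozilla you would then call something like selectSingleNode(document, '/html/body/table/tbody/tr[2]') and get back the element node, or null.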
I hope it works. Will let you know.
Thanks again,
Anupam
VK wrote:
an********@gmail.com wrote:
An interesting thing is happening. My table doesn't have 'TBODY', but
elem.parentNode.tagName is returning 'TBODY', where elem refers to the
'tr' tag. Shouldn't it be returning 'table'?
The TBODY element is exposed for all tables, even if the table does not
explicitly define a TBODY element. (True for IE at least => 90% of
UAs.)
Same for HTML (=> document.documentElement). It is actually an
obligatory element for HTML documents, unlike, say, <body>.
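One pragmatic workaround, my own sketch rather than something suggested in the thread: since the browser inserts TBODY implicitly while the JTidy/Xerces XHTML tree lacks it, the stored paths can be normalized by stripping the tbody steps before comparing or storing them:

```javascript
// Remove tbody steps from a /tag[n]-style path. Only safe when the source
// markup never declares <tbody> explicitly, as in the original poster's case.
function stripTbodySteps(xpath) {
  return xpath
    .split('/')
    .filter(function (step) { return !/^tbody(\[\d+\])?$/i.test(step); })
    .join('/');
}

console.log(stripTbodySteps('/html[1]/body[1]/table[1]/tbody[1]/tr[2]'));
// "/html[1]/body[1]/table[1]/tr[2]"
```

The inverse direction (re-inserting tbody after each table step) would be needed if the normalized path must be evaluated against the browser's DOM again.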
Is there a way to tell JavaScript not to be
intelligent? (Because I am building an XPath and I want to extract the
exact XPath from the real-world HTML document.)
JavaScript has nothing to do with it. The "intelligence" is
demonstrated by the browser's DOM parser. You seem to be mixing up two
very different things here: i) the HTML source code representing a page,
and ii) the DOM tree built from that source code.
You can retrieve any HTML source by using, say, responseText from an
ajaxoid and study it line by line. Here it does not matter how many
rude mistakes are made in the layout, because it is just plain text to
you.
But with XPath and DOM methods you are dealing with the parsing
*results*, and these results can be far away from what is written in
the code. The more poorly written the code, the more effort the UA
needs to spend to build some reasonable DOM tree, and the more that
tree may differ from the one the author had in mind.
On the other hand, without a ready DOM tree you cannot work with the
document at all. So for XPath you just have to drop the idea of
studying the source and concentrate on the parsing results, bearing in
mind that those results may differ significantly from one browser to
another.
The only alternative would be to write your very own HTML parser and
feed the source into it over responseText.
P.S. It is actually strange that you are worrying about such small and
easy-to-fix issues. I would have expected you to be knocked out by the
phantom (whitespace) text nodes at tags' borders in W3C-victimized
browsers (I cannot say "W3C-compliant" in this particular case). Either
you have already solved that, or you have not noticed it yet.