XPath on web page with JavaScript

kaer wrote:

> I need to run an XPath query against a web page whose JavaScript (which
> uses XMLHttpRequest) has to be executed first. I have no idea how to do
> that.

I have no idea what you mean, and I do know my way around XPath and
XMLHttpRequest.

<http://jibbering.com/faq/>
<http://catb.org/~esr/faqs/smart-questions.html>
<http://developer.mozilla.org/en/docs/XPath>
<http://developer.mozilla.org/en/docs/AJAX>
PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f8*******************@news.demon.co.uk>

Sep 9 '08 #2

kaer

On 9 sep, 22:08, Thomas 'PointedEars' Lahn <PointedE...@web.de> wrote:

> I have no idea what you mean, and I do know my way around XPath and
> XMLHttpRequest.

I would like to be able to do something like:

import AbstractBrowser
browser=AbstractBrowser.Browser()
browser.goto('www.somesitewithajax.com', wait_until_onload_done=True)
what_i_want = browser.XPath('/html/body/center/div/div[2]/div/div/div/div[3]/div[2]/div[2]/div')

This is pseudo-code (could be python but this is not important) on
pseudo-library (on whatever language, an application could do the job
as well) just to show what I want to do and what I am looking for.

Sep 9 '08 #3

kaer wrote:

[...]

For executing an ECMAScript program that makes use of an XHR implementation,
you need an environment which supports that.

With the Gecko DOM API, you can do:

var iframe = document.body.appendChild(document.createElement("iframe"));
if (iframe)
{
  iframe.addEventListener("load",
    function() {
      var d = this.contentDocument;
      var what_you_want =
        d.evaluate('/html/body/...', d.documentElement, null, 0, null);
    }, false);

  iframe.contentWindow.location = "http://www.somesitewithajax.com/";
}

If `www.somesitewithajax.com' is not in the same domain as the URI of the
accessing document resource, or the protocols or ports differ, you will need
HTTP proxying that fetches the content, because the Same Origin Policy (SOP)
will otherwise prevent access to the iframe document.
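
As a rough illustration of the comparison described above, the following sketch checks whether two URIs share protocol, host name, and port. `isSameOrigin` is a hypothetical helper name, not a browser API; it relies on the standard `URL` constructor, which is newer than this thread, and it ignores refinements such as `document.domain`.

```javascript
// Hypothetical helper: true if both URIs share protocol, host name, and port.
function isSameOrigin(a, b) {
  var ua = new URL(a);
  var ub = new URL(b);
  // URL reports a scheme's default port as "", so an explicit :80 on an
  // http URI compares equal to no port at all.
  return ua.protocol === ub.protocol &&
         ua.hostname === ub.hostname &&
         ua.port === ub.port;
}
```

If such a check fails for the accessing document and the iframe target, script access to the iframe document should be expected to be refused, and the content has to be fetched through a proxy on the accessing document's own origin instead.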
PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee

Sep 9 '08 #4

Thomas 'PointedEars' Lahn wrote:

[...]

[...]
With the Gecko DOM API, you can do:

var iframe = document.body.appendChild(document.createElement("iframe"));
if (iframe)
{
  iframe.addEventListener("load",
    function() {
      var d = this.contentDocument;
      var what_you_want =
        d.evaluate('/html/body/...', d.documentElement, null, 0, null);
    }, false);

  iframe.contentWindow.location = "http://www.somesitewithajax.com/";
}

Reviewing this, it is not going to work this way.

1. Loading the iframe means nothing about loading the iframe document.

2. The context node must be `d', not `d.documentElement', for `/html'
to work.

3. Loading the iframe document says nothing about whether the *A*JAX code
is done modifying the document tree, for the very point of it is that
it is *asynchronous*.
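
A minimal sketch of an evaluation call that follows point 2 by passing the document itself as the context node. `selectNodes` is a hypothetical helper name. It asks for result type 7 (XPathResult.ORDERED_NODE_SNAPSHOT_TYPE) instead of 0 (ANY_TYPE), on the assumption that a snapshot is preferable here: an iterator result becomes invalid once the document is modified, which is likely on a page that keeps changing through XHR.

```javascript
// Hypothetical helper: evaluate an XPath expression against `doc` and
// return the matching nodes as a plain array.
function selectNodes(doc, expression) {
  var ORDERED_NODE_SNAPSHOT_TYPE = 7; // XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  // The document itself is passed as the context node (point 2).
  var result = doc.evaluate(expression, doc, null,
                            ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In the iframe case it would be called with the iframe's content document as `doc`, once that document is known to be complete.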

While having the evaluation code be executed through window.setTimeout()
is a possibility, the reliable but non-trivial way would be to tap into
the `onreadystatechange' listener of the XHR object. The issue then is
to find the name of the property that refers to that object.

So it would be better, though still open to improvement, if you used

var iframe = document.body.appendChild(document.createElement("iframe"));
if (iframe)
{
  var d = iframe.contentDocument;
  d.addEventListener("load",
    function() {
      var t = window.setTimeout(
        function() {
          window.clearTimeout(t);

          var what_you_want =
            d.evaluate('/html/body/...', d, null, 0, null);
        },
        1000);
    },
    false);

  iframe.contentWindow.location = "http://www.somesitewithajax.com/";
}

And then there is still the issue of frame-breaking scripts running on that
site.
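
The idea of tapping into the XHR object instead of guessing a timeout could be sketched as follows. This is one possible approach, not something tested against a real site: `watchRequests` is a hypothetical helper that wraps the `send` method on a given XMLHttpRequest constructor and reports when no request is in flight any more. It assumes same-origin access to that constructor, support for the "loadend" event (which browsers of 2008 did not provide), and that the wrapper is installed before the page's own scripts send anything. Requests made through other transports are not seen.

```javascript
// Hypothetical helper: wrap XhrCtor.prototype.send so that `onIdle` is
// called each time the number of requests in flight drops back to zero.
function watchRequests(XhrCtor, onIdle) {
  var pending = 0;
  var originalSend = XhrCtor.prototype.send;

  XhrCtor.prototype.send = function () {
    var xhr = this;

    // "loadend" fires after success, error, abort, and timeout alike,
    // and after the page's own load/error handlers have run.
    function done() {
      xhr.removeEventListener("loadend", done);
      pending--;
      if (pending === 0) {
        onIdle();
      }
    }

    pending++;
    xhr.addEventListener("loadend", done);
    return originalSend.apply(xhr, arguments);
  };

  // Calling the returned function undoes the wrapping.
  return function restore() {
    XhrCtor.prototype.send = originalSend;
  };
}
```

With an iframe, the constructor to pass would be `iframe.contentWindow.XMLHttpRequest`, and the XPath evaluation would go into the `onIdle` callback. A page that polls continuously never becomes idle for long, so a timeout may still be needed as a fallback.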
PointedEars

Sep 10 '08 #5

kaer

On 10 sep, 08:44, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:

[...]

Many thanks for that. I will try to dig deeper into this as soon as
I can. I have a lot to learn, but it is very interesting anyway.

I am surprised, though, that there are no libraries or applications
doing that. If you think about it, what I need is just a browser without
the display part BUT with the ability to call functions that return the
document tree in its current state.

Thanks again.

Sep 10 '08 #6


kaer wrote:

Thomas 'PointedEars' Lahn wrote:
>[...]
So it would be better, though still open to improvement, if you used

var iframe = document.body.appendChild(document.createElement("iframe"));
if (iframe)
{
  var d = iframe.contentDocument;
  d.addEventListener("load",
    function() {
      var t = window.setTimeout(
        function() {
          window.clearTimeout(t);

          var what_you_want =
            d.evaluate('/html/body/...', d, null, 0, null);
        },
        1000);
    },
    false);

  iframe.contentWindow.location = "http://www.somesitewithajax.com/";
}

And then there is still the issue of frame-breaking scripts running on that
site.

This can be worked around if one does not use an iframe but a popup
window/tab. It would move the issue from the target (Web site) to the
source (client), though, where there may be popup blockers.

>[...]

Many thanks for that. I will try to dig deeper into this as soon as
I can. I have a lot to learn, but it is very interesting anyway.

You are welcome.

I am surprised, though, that there are no libraries or applications
doing that.

It may be because (white-hat) hackers would not support the idea of using
content that they did not create without permission; generally, this is a
copyright/author's rights issue. (IANAL; you have been warned.)

If you think about it, what I need is just a browser without the
display part BUT with the ability to call functions that return the
document tree in its current state.

It would appear that those specifications are mutually exclusive. You can
certainly parse the `responseText' into a Document object, however ISTM you
need the "display stuff" for the tree-manipulating script code to be
executed in the context of the represented document, generally.
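
To illustrate the distinction, here is a small sketch of parsing response text into a Document object. `parseWithoutRunning` is a hypothetical helper; `parser` is assumed to behave like a browser's DOMParser instance with "text/html" support, which is newer than this thread. Script elements in a document created this way end up in the tree but are not executed, so any nodes they would have added are missing.

```javascript
// Hypothetical helper: turn response text into a Document without
// rendering it. `parser` is assumed to behave like a DOMParser instance.
function parseWithoutRunning(parser, responseText) {
  var doc = parser.parseFromString(responseText, "text/html");
  return {
    document: doc,
    // These script elements are part of the tree but were never executed.
    dormantScripts: doc.getElementsByTagName("script").length
  };
}
```

In a browser it might be called as `parseWithoutRunning(new DOMParser(), xhr.responseText)`. An XPath query against the returned document then only sees the markup as delivered, not the tree as the page's scripts would have left it.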

While I can think of a hack that evaluates all script code in the document
this way, it remains to be seen how adaptive and cross-browser such a
solution would be. For example, if the XHR code used the value of the
`offsetWidth' property to determine whether or not an element should be
created or modified in, or removed from the document tree, would that value
make sense when the element in question is not displayed?

Please trim your quotes and do not quote signatures.
PointedEars

Sep 10 '08 #7

Aaron Gray

"kaer" <ka*******@gmail.com> wrote in message
news:e3**********************************@e53g2000hsa.googlegroups.com...

I am surprised, though, that there are no libraries or applications
doing that. If you think about it, what I need is just a browser without
the display part BUT with the ability to call functions that return the
document tree in its current state.

I did not say this, but have a look at jQuery; it may help you. There is
no substitute for knowing things from the ground up, though. jQuery has an
XPath plug-in; I am not sure how good or how well documented it is.

Try Googling for "jQuery" and "jQuery XPath".

Good luck,

Aaron

Sep 10 '08 #8
