Bytes IT Community

XPath on web page with JavaScript

I have to run an XPath query on a web page that contains JavaScript
(using XMLHttpRequest), which has to be executed beforehand. I have no
idea how to do that. Any pointers are welcome.
Sep 9 '08 #1
7 Replies


kaer wrote:
> I have to run an XPath query on a web page that contains JavaScript
> (using XMLHttpRequest), which has to be executed beforehand. I have no
> idea how to do that.

I have no idea what you mean, and I do know my way around XPath and
XMLHttpRequest.

<http://jibbering.com/faq/>
<http://catb.org/~esr/faqs/smart-questions.html>
<http://developer.mozilla.org/en/docs/XPath>
<http://developer.mozilla.org/en/docs/AJAX>
PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f8*******************@news.demon.co.uk>
Sep 9 '08 #2

On 9 sep, 22:08, Thomas 'PointedEars' Lahn <PointedE...@web.de> wrote:
> I have no idea what you mean, and I do know my way around XPath and
> XMLHttpRequest.

I would like to be able to do something like:

  import AbstractBrowser
  browser = AbstractBrowser.Browser()
  browser.goto('www.somesitewithajax.com', wait_until_onload_done=True)
  what_i_want = browser.XPath(
      '/html/body/center/div/div[2]/div/div/div/div[3]/div[2]/div[2]/div')

This is pseudo-code (it could be Python, but that is not important) for a
pseudo-library (in whatever language; an application could do the job
as well), just to show what I want to do and what I am looking for.
Sep 9 '08 #3

kaer wrote:
> On 9 sep, 22:08, Thomas 'PointedEars' Lahn <PointedE...@web.de> wrote:
>> I have no idea what you mean, and I do know my way around XPath and
>> XMLHttpRequest.
>
> I would like to be able to do something like:
>
>   import AbstractBrowser
>   browser = AbstractBrowser.Browser()
>   browser.goto('www.somesitewithajax.com', wait_until_onload_done=True)
>   what_i_want = browser.XPath(
>       '/html/body/center/div/div[2]/div/div/div/div[3]/div[2]/div[2]/div')
>
> This is pseudo-code (it could be Python, but that is not important) for a
> pseudo-library (in whatever language; an application could do the job
> as well), just to show what I want to do and what I am looking for.

For executing an ECMAScript program that makes use of an XHR implementation,
you need an environment which supports that.

With the Gecko DOM API, you can do:

  var iframe = document.body.appendChild(document.createElement("iframe"));
  if (iframe)
  {
    iframe.addEventListener("load",
      function() {
        var d = this.contentDocument;
        var what_you_want =
          d.evaluate('/html/body/...', d.documentElement, null, 0, null);
      }, false);

    iframe.contentWindow.location = "http://www.somesitewithajax.com/";
  }

If `www.somesitewithajax.com' is not the same domain as the domain of the
URI of the accessing document resource, or the protocols or ports differ,
you will need HTTP proxying that fetches the content, because the SOP will
prevent access to the iframe document otherwise.
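
To make the same-origin rule above concrete, here is a minimal sketch of the
comparison it describes (same protocol, same host, same port). It assumes an
environment that provides the standard URL class; the function name
isSameOrigin is made up for this illustration and is not part of any API.

```javascript
// Hypothetical helper: returns true when both URLs share protocol, host
// and port, which is what the same-origin policy (SOP) compares.
// Assumes the standard URL class is available.
function isSameOrigin(urlA, urlB) {
  var a = new URL(urlA);
  var b = new URL(urlB);
  // URL normalizes default ports (80 for http, 443 for https) to "".
  return a.protocol === b.protocol &&
         a.hostname === b.hostname &&
         a.port === b.port;
}
```

If this returns false for the URI of the accessing document and the URI
loaded into the iframe, the iframe document is off limits to the script and
the content has to be fetched through a proxy on the accessing document's
own origin.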
PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Sep 9 '08 #4

Thomas 'PointedEars' Lahn wrote:
> kaer wrote:
>> On 9 sep, 22:08, Thomas 'PointedEars' Lahn <PointedE...@web.de> wrote:
>>> I have no idea what you mean, and I do know my way around XPath and
>>> XMLHttpRequest.
>> I would like to be able to do something like:
>>
>>   import AbstractBrowser
>>   browser = AbstractBrowser.Browser()
>>   browser.goto('www.somesitewithajax.com', wait_until_onload_done=True)
>>   what_i_want = browser.XPath(
>>       '/html/body/center/div/div[2]/div/div/div/div[3]/div[2]/div[2]/div')
>> [...]
>
> [...]
> With the Gecko DOM API, you can do:
>
>   var iframe = document.body.appendChild(document.createElement("iframe"));
>   if (iframe)
>   {
>     iframe.addEventListener("load",
>       function() {
>         var d = this.contentDocument;
>         var what_you_want =
>           d.evaluate('/html/body/...', d.documentElement, null, 0, null);
>       }, false);
>
>     iframe.contentWindow.location = "http://www.somesitewithajax.com/";
>   }

Reviewing this, it is not going to work this way.

1. Loading the iframe means nothing about loading the iframe document.

2. The context node must be `d', not `d.documentElement', for `/html'
to work.

3. Loading the iframe document means nothing about the *A*JAX code to be
done modifying the document tree, for the very point of it is that it
is *asynchronous*.

While having the evaluation code be executed through window.setTimeout()
is a possibility, the reliable but non-trivial way would be to tap into
the `onreadystatechange' listener of the XHR object. The issue then is
to find the name of the property that refers to that object.
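
One way to tap into the state changes without knowing which property refers
to the XHR object is to wrap send() on the constructor itself. The following
is only a sketch under assumptions: it presumes the XHR implementation
supports addEventListener(), and watchRequests, onIdle and restore are
hypothetical names, not an existing API.

```javascript
// Hypothetical helper: wraps send() on the given XMLHttpRequest constructor
// (for example iframe.contentWindow.XMLHttpRequest) and calls onIdle()
// each time the number of requests in flight drops back to zero.
function watchRequests(XHR, onIdle) {
  var pending = 0;
  var originalSend = XHR.prototype.send;

  XHR.prototype.send = function () {
    var xhr = this;
    pending++;

    xhr.addEventListener("readystatechange", function onChange() {
      if (xhr.readyState === 4) {  // 4 means the request is done
        xhr.removeEventListener("readystatechange", onChange, false);
        pending--;
        if (pending === 0) {
          onIdle();
        }
      }
    }, false);

    return originalSend.apply(xhr, arguments);
  };

  // The returned function puts the original send() back.
  return function restore() {
    XHR.prototype.send = originalSend;
  };
}
```

Note that onIdle() may run before the page's own handler has finished
modifying the document tree, so the XPath evaluation should probably still
be deferred (for example through window.setTimeout() with a short delay)
from within onIdle().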

So it would be better, though still open to improvement, if you used

  var iframe = document.body.appendChild(document.createElement("iframe"));
  if (iframe)
  {
    var d = iframe.contentDocument;
    d.addEventListener("load",
      function() {
        var t = window.setTimeout(
          function() {
            window.clearTimeout(t);

            var what_you_want =
              d.evaluate('/html/body/...', d, null, 0, null);
          },
          1000);
      },
      false);

    iframe.contentWindow.location = "http://www.somesitewithajax.com/";
  }
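
Instead of a single fixed delay of 1000 ms, the evaluation could be retried
until the node shows up. The following is a rough sketch only: pollFor and
its parameters are hypothetical names, and the schedule parameter exists
merely so that the timer can be replaced; by default it is setTimeout.

```javascript
// Hypothetical helper: calls check() until it returns something other than
// null/undefined or maxTries attempts are used up, waiting `interval` ms
// between attempts. Then calls done() with the value, or with null if the
// attempts ran out.
function pollFor(check, done, interval, maxTries, schedule) {
  schedule = schedule || setTimeout;
  var tries = 0;

  function attempt() {
    tries++;
    var value = check();
    if (value !== null && value !== undefined) {
      done(value);
    } else if (tries < maxTries) {
      schedule(attempt, interval);
    } else {
      done(null);  // gave up
    }
  }

  attempt();
}

// Possible use inside the load listener above (browser only), with the
// XPath expression left elided as in the original posting:
//
//   pollFor(
//     function() {
//       return d.evaluate('/html/body/...', d, null,
//         XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
//     },
//     function(node) { /* node is null if nothing was found in time */ },
//     250, 20);
```

This still guesses at timing, but a slow response no longer means a missed
node, and a fast one no longer means waiting the full second.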

And then there is still the issue of frame-breaking scripts running on that
site.
PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Sep 10 '08 #5

On 10 sep, 08:44, Thomas 'PointedEars' Lahn <PointedE...@web.de> wrote:
> [...]
> And then there is still the issue of frame-breaking scripts running on that
> site.

Many thanks for that. I will try to dig deeper into this as soon as I
can. I have a lot to learn, but it is very interesting anyway.

I am surprised, though, that there are no libraries or applications doing
that. If you think about it, what I need is just a browser without the
display stuff BUT with the ability to call functions that give back the
document tree in its actual state.

Thanks again.
Sep 10 '08 #6

kaer wrote:
> Thomas 'PointedEars' Lahn wrote:
>> [...]
>> So it would be better but yet to be improved if you used
>>
>>   var iframe = document.body.appendChild(document.createElement("iframe"));
>>   if (iframe)
>>   {
>>     var d = iframe.contentDocument;
>>     d.addEventListener("load",
>>       function() {
>>         var t = window.setTimeout(
>>           function() {
>>             window.clearTimeout(t);
>>
>>             var what_you_want =
>>               d.evaluate('/html/body/...', d, null, 0, null);
>>           },
>>           1000);
>>       },
>>       false);
>>
>>     iframe.contentWindow.location = "http://www.somesitewithajax.com/";
>>   }
>>
>> And then there is still the issue of frame-breaking scripts running on that
>> site.

This can be worked around if one does not use an iframe but a popup
window/tab. It would move the issue from the target (Web site) to the
source (client), though, where there may be popup blockers.

>> [...]
>
> Many thanks for that. I will try to dig deeper into this as soon as I
> can. I have a lot to learn, but it is very interesting anyway.

You are welcome.

> I am surprised, though, that there are no libraries or applications doing
> that.

It may be because (white-hat) hackers would not support the idea of using
content that they did not create without permission; generally, this is a
copyright/author's rights issue. (IANAL; you have been warned.)

> If you think about it, what I need is just a browser without the
> display stuff BUT with the ability to call functions that give back the
> document tree in its actual state.

It would appear that those specifications are mutually exclusive. You can
certainly parse the `responseText' into a Document object, however ISTM you
need the "display stuff" for the tree-manipulating script code to be
executed in the context of the represented document, generally.

While I can think of a hack that evaluates all script code in the document
this way, it remains to be seen how adaptive and cross-browser such a
solution would be. For example, if the XHR code used the value of the
`offsetWidth' property to determine whether or not an element should be
created or modified in, or removed from the document tree, would that value
make sense when the element in question is not displayed?

Please trim your quotes and do not quote signatures.
PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Sep 10 '08 #7

"kaer" <ka*******@gmail.com> wrote in message
news:e3**********************************@e53g2000hsa.googlegroups.com...
> I am surprised, though, that there are no libraries or applications doing
> that. If you think about it, what I need is just a browser without the
> display stuff BUT with the ability to call functions that give back the
> document tree in its actual state.

I did not say this, but have a look at jQuery; it may help you. Still, there
is no substitute for knowing things from the ground up. jQuery has an XPath
plug-in; I am not sure how good or how well documented it is.

Try Googling for "jQuery" and "jQuery XPath".

Good luck,

Aaron
Sep 10 '08 #8
