XPath on web page with JavaScript

I have to send an XPath request on a web page with JavaScript (with
XMLHttpRequest) that has to be executed before. I have no idea how
to do that. Any pointer is welcome.
Sep 9 '08 #1
kaer wrote:
> I have to send an XPath request on a web page with JavaScript (with
> XMLHttpRequest) that has to be executed before. I have no idea how
> to do that.

I have no idea what you mean, and I do know my way around XPath and
XMLHttpRequest.

<http://jibbering.com/faq/>
<http://catb.org/~esr/faqs/smart-questions.html>
<http://developer.mozilla.org/en/docs/XPath>
<http://developer.mozilla.org/en/docs/AJAX>
PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f8*******************@news.demon.co.uk>
Sep 9 '08 #2
On 9 sep, 22:08, Thomas 'PointedEars' Lahn <PointedE...@web.de> wrote:
> I have no idea what you mean, and I do know my way around XPath and
> XMLHttpRequest.

I would like to be able to do something like:

  import AbstractBrowser
  browser = AbstractBrowser.Browser()
  browser.goto('www.somesitewithajax.com', wait_until_onload_done=True)
  what_i_want = browser.XPath('/html/body/center/div/div[2]/div/div/div/div[3]/div[2]/div[2]/div')

This is pseudo-code (it could be Python, but that is not important) for a
pseudo-library (in whatever language; an application could do the job as
well), just to show what I want to do and what I am looking for.
Sep 9 '08 #3
kaer wrote:
> On 9 sep, 22:08, Thomas 'PointedEars' Lahn <PointedE...@web.de> wrote:
>> I have no idea what you mean, and I do know my way around XPath and
>> XMLHttpRequest.
>
> I would like to be able to do something like:
>
>   import AbstractBrowser
>   browser = AbstractBrowser.Browser()
>   browser.goto('www.somesitewithajax.com', wait_until_onload_done=True)
>   what_i_want = browser.XPath('/html/body/center/div/div[2]/div/div/div/div[3]/div[2]/div[2]/div')
>
> This is pseudo-code (it could be Python, but that is not important) for a
> pseudo-library (in whatever language; an application could do the job as
> well), just to show what I want to do and what I am looking for.

For executing an ECMAScript program that makes use of an XHR implementation,
you need an environment which supports that.

With the Gecko DOM API, you can do:

  var iframe = document.body.appendChild(document.createElement("iframe"));
  if (iframe)
  {
    iframe.addEventListener("load",
      function() {
        var d = this.contentDocument;
        var what_you_want =
          d.evaluate('/html/body/...', d.documentElement, null, 0, null);
      }, false);

    iframe.contentWindow.location = "http://www.somesitewithajax.com/";
  }

If `www.somesitewithajax.com' is not the same domain as the domain of the
URI of the accessing document resource, or the protocols or ports differ,
you will need HTTP proxying that fetches the content, because the SOP will
prevent access to the iframe document otherwise.
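
As a rough illustration of the comparison described above, the following sketch checks whether two absolute URIs share protocol, host and port. The name `isSameOrigin` is made up for this example, and it relies on the standard `URL` constructor, which is newer than this thread; the check a browser actually performs has more cases than this.

```javascript
// Hypothetical helper: true when two absolute URLs have the same
// protocol, host name and port, i.e. the same origin.
function isSameOrigin(a, b) {
  var ua = new URL(a);
  var ub = new URL(b);
  // URL reports a default port as "", so http://x/ and http://x:80/ match.
  return ua.protocol === ub.protocol &&
         ua.hostname === ub.hostname &&
         ua.port === ub.port;
}
```

If this returns false for the accessing document's URI and the iframe's URI, the proxying mentioned above would be needed before the iframe document can be read.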
PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Sep 9 '08 #4
Thomas 'PointedEars' Lahn wrote:
> kaer wrote:
>> On 9 sep, 22:08, Thomas 'PointedEars' Lahn <PointedE...@web.de> wrote:
>>> I have no idea what you mean, and I do know my way around XPath and
>>> XMLHttpRequest.
>> I would like to be able to do something like:
>>
>>   import AbstractBrowser
>>   browser = AbstractBrowser.Browser()
>>   browser.goto('www.somesitewithajax.com', wait_until_onload_done=True)
>>   what_i_want = browser.XPath('/html/body/center/div/div[2]/div/div/div/div[3]/div[2]/div[2]/div')
>> [...]
>
> [...]
> With the Gecko DOM API, you can do:
>
>   var iframe = document.body.appendChild(document.createElement("iframe"));
>   if (iframe)
>   {
>     iframe.addEventListener("load",
>       function() {
>         var d = this.contentDocument;
>         var what_you_want =
>           d.evaluate('/html/body/...', d.documentElement, null, 0, null);
>       }, false);
>
>     iframe.contentWindow.location = "http://www.somesitewithajax.com/";
>   }

Reviewing this, it is not going to work this way.

1. Loading the iframe means nothing about loading the iframe document.

2. The context node must be `d', not `d.documentElement', for `/html'
to work.

3. The iframe document having loaded says nothing about whether the
*A*JAX code is done modifying the document tree, for the very point of
it is that it is *asynchronous*.

While having the evaluation code be executed through window.setTimeout()
is a possibility, the reliable but non-trivial way would be to tap into
the `onreadystatechange' listener of the XHR object. The issue then is
to find the name of the property that refers to that object.
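
One way to sidestep the search for that property, sketched here under assumptions: instead of locating the page's XHR object, wrap `send()` on the `XMLHttpRequest` prototype of the target window and count the requests in flight. The names `watchRequests` and `onIdle` are made up for this example. It assumes the wrapper can be installed before the page's own scripts send anything, and it relies on the `loadend` event of current browsers.

```javascript
// Hypothetical helper: wraps send() on the prototype of the given
// XMLHttpRequest-like constructor and calls onIdle whenever the number
// of requests in flight drops back to zero.
function watchRequests(XHR, onIdle) {
  var pending = 0;
  var originalSend = XHR.prototype.send;

  XHR.prototype.send = function () {
    var xhr = this;
    pending++;

    // "loadend" fires after load, error, abort and timeout alike.
    function onEnd() {
      // Remove the listener so a reused object is not counted twice.
      xhr.removeEventListener("loadend", onEnd, false);
      pending--;
      if (pending === 0) {
        onIdle();
      }
    }

    xhr.addEventListener("loadend", onEnd, false);
    return originalSend.apply(xhr, arguments);
  };

  // Returns a function that undoes the patch.
  return function restore() {
    XHR.prototype.send = originalSend;
  };
}
```

For an iframe, the call might look like `watchRequests(iframe.contentWindow.XMLHttpRequest, function () { /* evaluate the XPath here */ })`. An idle signal only means that no request is pending at that moment, not that the page will never send another one.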

So it would be better but yet to be improved if you used

  var iframe = document.body.appendChild(document.createElement("iframe"));
  if (iframe)
  {
    var d = iframe.contentDocument;
    d.addEventListener("load",
      function() {
        var t = window.setTimeout(
          function() {
            window.clearTimeout(t);

            var what_you_want =
              d.evaluate('/html/body/...', d, null, 0, null);
          },
          1000);
      },
      false);

    iframe.contentWindow.location = "http://www.somesitewithajax.com/";
  }

And then there is still the issue of frame-breaking scripts running on that
site.
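
A variation on the snippet above, as a sketch rather than a definitive fix. In current browsers the `load` event of the iframe element fires once the framed document has loaded, and `contentDocument` refers to a new object after navigation. This version therefore listens on the element and reads `contentDocument` only inside the handler. The names `selectNodes` and `queryAfterLoad` and the `delayMs` parameter are made up for this example; the fixed delay is still a guess about when the asynchronous code is done, and the same-origin restriction still applies.

```javascript
// Hypothetical helper: evaluates an XPath expression with the document
// itself as context node and returns the matching nodes as an array.
function selectNodes(doc, expression) {
  // 7 is XPathResult.ORDERED_NODE_SNAPSHOT_TYPE; a snapshot stays usable
  // even if scripts keep changing the tree afterwards.
  var result = doc.evaluate(expression, doc, null, 7, null);
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// Hypothetical helper: loads a same-origin URL in an iframe, waits
// delayMs after its load event, then passes the matches to callback.
function queryAfterLoad(url, expression, delayMs, callback) {
  var iframe = document.createElement("iframe");
  iframe.addEventListener("load", function () {
    window.setTimeout(function () {
      // Read contentDocument only now; it is a new object after navigation.
      callback(selectNodes(iframe.contentDocument, expression));
    }, delayMs);
  }, false);
  iframe.src = url;
  document.body.appendChild(iframe);
}
```

Usage might look like `queryAfterLoad("/some/page", "/html/body/div", 1000, function (nodes) { /* use nodes */ })`, where the URL and expression are placeholders.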
PointedEars
Sep 10 '08 #5
On 10 sep, 08:44, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:
> [...]

Many thanks for that. I will dig deeper into this as soon as I can.
I have a lot to learn, but it is very interesting anyway.

I am surprised, though, that there are no libraries or applications
doing that. If you think about it, what I need is just a browser without
the display stuff BUT with the ability to call functions giving back the
document tree in its actual state.

Thanks again.
Sep 10 '08 #6
kaer wrote:
> Thomas 'PointedEars' Lahn wrote:
>> [...]
>> So it would be better but yet to be improved if you used
>>
>>   var iframe = document.body.appendChild(document.createElement("iframe"));
>>   if (iframe)
>>   {
>>     var d = iframe.contentDocument;
>>     d.addEventListener("load",
>>       function() {
>>         var t = window.setTimeout(
>>           function() {
>>             window.clearTimeout(t);
>>
>>             var what_you_want =
>>               d.evaluate('/html/body/...', d, null, 0, null);
>>           },
>>           1000);
>>       },
>>       false);
>>
>>     iframe.contentWindow.location = "http://www.somesitewithajax.com/";
>>   }
>>
>> And then there is still the issue of frame-breaking scripts running on that
>> site.

This can be worked around if one does not use an iframe but a popup
window/tab. It would move the issue from the target (Web site) to the
source (client), though, where there may be popup blockers.

>> [...]
>
> Many thanks for that. I will dig deeper into this as soon as I can.
> I have a lot to learn, but it is very interesting anyway.

You are welcome.

> I am surprised, though, that there are no libraries or applications
> doing that.

It may be because (white-hat) hackers would not support the idea of using
content that they did not create without permission; generally, this is a
copyright/author's rights issue. (IANAL; you have been warned.)

> If you think about it, what I need is just a browser without the
> display stuff BUT with the ability to call functions giving back the
> document tree in its actual state.

It would appear that those specifications are mutually exclusive. You can
certainly parse the `responseText' into a Document object, however ISTM you
need the "display stuff" for the tree-manipulating script code to be
executed in the context of the represented document, generally.
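
The parsing step mentioned above can be sketched as follows, assuming a current browser whose `DOMParser` accepts `"text/html"`. The name `parseMarkup` and its optional `parser` argument are made up for this example. A document produced this way does not run its scripts, so the tree reflects the markup as delivered and not what the page's script code would have built.

```javascript
// Hypothetical helper: parses markup into a Document without running
// any of its scripts. `parser` defaults to a new DOMParser.
function parseMarkup(responseText, parser) {
  parser = parser || new DOMParser();
  return parser.parseFromString(responseText, "text/html");
}
```

With an XHR object `xhr` whose request has completed, `parseMarkup(xhr.responseText)` would give a document on which `evaluate()` can be called.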

While I can think of a hack that evaluates all script code in the document
this way, it remains to be seen how adaptive and cross-browser such a
solution would be. For example, if the XHR code used the value of the
`offsetWidth' property to determine whether or not an element should be
created or modified in, or removed from the document tree, would that value
make sense when the element in question is not displayed?

Please trim your quotes and do not quote signatures.
PointedEars
Sep 10 '08 #7
"kaer" <ka*******@gmail.com> wrote in message
news:e3**********************************@e53g2000hsa.googlegroups.com...
> I am surprised, though, that there are no libraries or applications
> doing that. If you think about it, what I need is just a browser without
> the display stuff BUT with the ability to call functions giving back the
> document tree in its actual state.

I did not say this, but have a look at jQuery; it may help you. There is
no substitute for knowing things from the ground up, though. jQuery has an
XPath plug-in; I am not sure how good it is or how well documented it is.

Try Googling for "jQuery" and "jQuery XPath".

Good luck,

Aaron
Sep 10 '08 #8
