how to filter a page using javascript ?



Let's say I want to filter the contents of a target web page, and
present a simpler page on the screen.

For example, let's say a target web page is full of links, text, images,
forms, etc. and I want to present a simple page containing just the links.
The original page is not "mine", that is, I can't just edit it in
notepad and stick some javascript in it.
So, what I want to do is write some javascript on a new page that I'm
developing, that will somehow "access" or "read" the target page, scan
and find all the links, and present them to the user.

My question is, what's a straightforward way from javascript to access
the target page ?

Should I be somehow loading it into a DOM object and then "walking" the
tree ?

Or should I somehow read its HTML as text strings and parse it looking
for anchor links ?

What functions, methods, classes, objects in javascript achieve the goal
of accessing a remote page and "looking" at its contents?

Point me in a direction and I'll look up the appropriate objects,
methods, functions etc.

Thanks ! er*******@rcn.com

Jul 20 '05 #1

Another wording of my question:

Suppose I want my javascript in html page A to be able to view the html
or DOM content of page B. What javascript functions, classes, methods
etc. should I look at for achieving this?

Jul 20 '05 #2
Eric Osman <40**************@rcn.com> wrote:
Another wording of my question:

Suppose I want my javascript in html page A to be able to view the
html or DOM content of page B. What javascript functions, classes,
methods etc. should I look at for achieving this?


If page A and B are within the same domain then the W3C DOM and the
innerHTML extension (where supported) are possibilities. If you are only
interested in links (as your original post implies) then the
document.links collection would be the place to look for the
information, probably after loading the second page into an IFRAME.
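
A minimal sketch of the same-domain approach described above, assuming page B has been loaded into an IFRAME on page A. The function name collectLinks and the iframe id "pageB" are made up for illustration; document.links and contentWindow.document are standard browser properties.

```javascript
// Collect the href of every link in a document-like object.
// `doc` only needs a `links` collection, as browser documents have.
function collectLinks(doc) {
  const hrefs = [];
  for (let i = 0; i < doc.links.length; i++) {
    hrefs.push(doc.links[i].href);
  }
  return hrefs;
}

// Browser usage (hypothetical iframe with id "pageB", same domain only):
//   const frame = document.getElementById("pageB");
//   frame.onload = function () {
//     const links = collectLinks(frame.contentWindow.document);
//     // ...present `links` to the user...
//   };
```

If the two pages come from different domains, the browser should refuse access to the frame's document, so this only applies to the same-domain case.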

If the two pages originate in different domains then forget about it as
security restrictions will prevent such an action (except maybe from an
HTA or a local browser with significantly reduced security settings,
that is only really applicable on a personal basis).

Generally what you describe would be better achieved with a server side
script loading page B and analysing it outside of any security context
(and preferably with the permission of the owner of page B).
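
As a rough illustration of the server-side route, here is a sketch that treats page B as plain text and pulls out the anchor targets. It assumes a JavaScript runtime on the server with a global fetch (for example a recent Node.js); extractHrefs and linksFromUrl are hypothetical names, and the regular expression is deliberately naive, so a real HTML parser would be more robust.

```javascript
// Naive extraction of href values from <a> tags in an HTML string.
// Handles double-quoted, single-quoted and unquoted attribute values.
function extractHrefs(html) {
  const pattern = /<a\s(?:[^>]*?\s)?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  const hrefs = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Exactly one of the three capture groups is set for each match.
    hrefs.push(match[1] !== undefined ? match[1]
             : match[2] !== undefined ? match[2]
             : match[3]);
  }
  return hrefs;
}

// Assumption: runs on a server where a global fetch() is available.
// Not subject to the browser's cross-domain restrictions.
async function linksFromUrl(url) {
  const response = await fetch(url);
  const html = await response.text();
  return extractHrefs(html);
}
```

The server would then send only the extracted list to the browser, so page A never needs to touch page B directly.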

Richard.
Jul 20 '05 #3
Eric Osman wrote:

Another wording of my question:

Suppose I want my javascript in html page A to be able to view the html
or DOM content of page B. What javascript functions, classes, methods
etc. should I look at for achieving this?


As Richard Cornford mentioned, if these sites are not from the same
domain, this will not work. You can use document.links to read all of
the links in a page.

Also, assuming that A and B are on the same domain, you can load the
data page, using a hidden IFRAME, and use javascript to access the
IFRAME's document tree.

I have used HTAs (HTML Applications) in the past, in order to bypass
the domain security thing, but that _only_ works with IE. More info at
msdn.microsoft.com.

Brian

Jul 20 '05 #4

Then let me ask it another way please:
Just as browsers are so willing to let the user do a "view source" on a
web page, is there a way I can write javascript to obtain the same source ?
Thanks.

/Eric
Jul 20 '05 #5
Eric Osman wrote:

<top posting fixed, read the FAQ>
Brian Genisio wrote:
Eric Osman wrote:

Another wording of my question:

Suppose I want my javascript in html page A to be able to view the
html or DOM content of page B. What javascript functions, classes,
methods etc. should I look at for achieving this?


As Richard Cornford mentioned, if these sites are not from the same
domain, this will not work. You can use document.links to read all of
the links in a page.

Also, assuming that A and B are on the same domain, you can load the
data page, using a hidden IFRAME, and use javascript to access the
IFRAME's document tree.

I have used HTAs (HTML Applications) in the past, in order to
bypass the domain security thing, but that _only_ works with IE. More
info at msdn.microsoft.com.

Brian


Then let me ask it another way please:

Just as browsers are so willing to let the user do a "view source" on a
web page, is there a way I can write javascript to obtain the same source ?

You are comparing apples and oranges though. View source on a web page is
tantamount to "show me what you have", whereas you are wanting "show
me what that person over there has", and there's a world of difference.

You could try, with its still-limited security concerns, an
HTTPRequestObject (XMLHttpRequest, covered in the FAQ). If it's running
locally on your computer, you get around the security issue; with it
running from a server, you get slapped in the face trying to read a file
from elsewhere.
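
A hedged sketch of what that could look like, assuming the object meant here is the browser's XMLHttpRequest. The function name requestText and its optional Xhr parameter are made up for illustration; the parameter only exists so the function can be exercised outside a browser.

```javascript
// Fetch a URL as text and hand the result to a callback.
// `Xhr` is an optional constructor; it defaults to the browser's XMLHttpRequest.
function requestText(url, onSuccess, onError, Xhr) {
  const xhr = new (Xhr || XMLHttpRequest)();
  xhr.open("GET", url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return;          // 4 means the request is complete
    if (xhr.status >= 200 && xhr.status < 300) {
      onSuccess(xhr.responseText);             // raw source of the requested page
    } else {
      onError(xhr.status);
    }
  };
  xhr.send(null);
}

// Browser usage (hypothetical same-domain path):
//   requestText("/pageB.html",
//     function (source) { /* scan `source` for links */ },
//     function (status) { /* request failed or was refused */ });
```

The responseText is the same markup a "view source" would show, which can then be scanned as a string.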
--
Randy
Chance Favors The Prepared Mind
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 20 '05 #6
Eric Osman wrote on 26 feb 2004 in comp.lang.javascript:
Just as browsers are so willing to let the user do a "view source" on
a web page, is there a way I can write javascript to obtain the same
source ?


Yes.

Write a bookmarklet / favelet:
[InternetShortcut]
URL=javascript:void(location.href="view-source:"+location.href)
IE tested

Probably not what you want to hear?
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 20 '05 #7
JRS: In article <40**************@rcn.com>, seen in
news:comp.lang.javascript, Eric Osman <er*******@rcn.com> posted at Thu,
26 Feb 2004 11:02:23 :-

Just as browsers are so willing to let the user do a "view source" on a
web page, is there a way I can write javascript to obtain the same source ?

Brian Genisio wrote:
Eric Osman wrote:


Responses should go after trimmed quotes.

ISTM that you want to write a Web page that displays its own javascript,
or some of it.

For that, see <URL:http://www.merlyn.demon.co.uk/js-nclds.htm> and
<URL:http://www.merlyn.demon.co.uk/js-index.htm#CD>.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
<URL:http://jibbering.com/faq/> Jim Ley's FAQ for news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
Jul 20 '05 #8
Dr John Stockton wrote:
JRS: In article <40**************@rcn.com>, seen in
news:comp.lang.javascript, Eric Osman <er*******@rcn.com> posted at Thu,
26 Feb 2004 11:02:23 :-
Just as browsers are so willing to let the user do a "view source" on a
web page, is there a way I can write javascript to obtain the same source ?


Brian Genisio wrote:
Eric Osman wrote:


Responses should go after trimmed quotes.

ISTM that you want to write a Web page that displays its own javascript,
or some of it.

For that, see <URL:http://www.merlyn.demon.co.uk/js-nclds.htm> and
<URL:http://www.merlyn.demon.co.uk/js-index.htm#CD>.


I understood it to mean that he wants a page of his own to be able to
display the source/js of another domain's web pages.

--
Randy
Chance Favors The Prepared Mind
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 20 '05 #9


Thanks for all the replies.

I'm having success with HTTPRequestObject . My purpose is to parse the
data generated by a page, in order to capture just a relevant sublist.

By the way, please explain why the following defense of top-posting
doesn't make sense:

It seems to me that top-posting is convenient for someone that is
diligently following the discussion, since as they progress to each
subsequent reply, they immediately see the new material.

Compare that with bottom-posting, which forces the diligent follower,
upon arriving at each subsequent reply, to first scroll down through all
the quoted material from all the previous replies, before seeing the new
material.

er*******@rcn.com
Jul 20 '05 #10
Eric Osman wrote:
<snip>
By the way, please explain why the following defense of top-posting
doesn't make sense:

It seems to me that top-posting is convenient for someone that is
diligently following the discussion, since as they progress to each
subsequent reply, they immediately see the new material.

Assuming that Usenet should only pander to the diligent follower of
discussions, even they may not be able to accurately guess which
specific points are being replied to, whereas responding below the
appropriately trimmed quote of the point (or points) being responded to
makes the context of the response evident.

Compare that with bottom-posting, which forces the diligent follower,
upon arriving at each subsequent reply, to first scroll down through
all the quoted material from all the previous replies, before seeing
the new material.

<snip>

Any significant need to scroll down would be indicative of insufficient
material having been trimmed. It is only necessary to provide sufficient
quoted material to place a response in context.

Experienced users of Usenet, used to the operating conventions, may see
any material appearing at the top of a post as a preamble to possibly
more detailed responses to specific points raised in quoted material
below. They will not discover that they are mistaken until they scroll
to the end of a post and discover that there were no further responses.
Meaning that top posting may result in more scrolling, and for exactly
those people who tend to also be the people best qualified to answer any
questions raised, wasting their time and potentially generating
resentment.

Generally, as the point in posting to a newsgroup is to elicit some sort
of response, any action that may alienate the people whose responses
would be of most value would be misguided. Following the established and
documented conventions of the medium is one way of minimising the risk
of causing offence.

The guidelines on posting style that can be found in the FAQ of this
group are subject to public review at intervals, and to date there have
not even been any suggestions that they be altered to remove the
request for users of the group to follow the established Usenet
conventions.

Richard.
Jul 20 '05 #11
Thomas 'PointedEars' Lahn wrote:
Richard Cornford wrote:
Thomas 'PointedEars' Lahn wrote:
Richard Cornford wrote:
Thomas 'PointedEars' Lahn wrote:
> s/domains/second-level domains/ <snip>
That was the only meaningful content in your post (the preceding
character sequence having failed to achieve the status of sentence).
The "preceding character sequence" is common Usenet jargon, especially
in technical groups like this,


I read technical groups like this on a regular basis and have never seen
it used before, making "common" a questionable categorisation.
and specifies a substitution operation
possible with sed and Perl, among others.[1]
So maybe it is common on Unix and Perl groups, it doesn't appear to
feature on web development groups.

<snip> Third- and other sub-level domains are also domains which simply
makes your statement false and thus required correction.

<snip>

Sub-domains may be domains but it would not be reasonable to call
them different domains,


It would be reasonable and thus it is done, as you could have read.
which is probably why people don't.


Who the heck is "people"? "Others also do it" has never been a good
argument. Was it not you who recently pointed out that in informal
speech many technical terms are used incorrectly or inexactly?


Which is exactly my point. The informal meaning of domain is derived
from how it is used by people in normal conversation, to communicate a
concept. It may be a shorthand and there may be a more technically
correct term for that concept (that has additional qualification) but my
intention was not to write a discourse on cross-domain security only to
suggest that the concept might have significance in the situation.
Without additional information from the OP there was no reason to
attempt to convey anything beyond the fact that two situations may apply
and that one excluded the other.
foo.foobar.com and bar.foobar.com *are* different domains, like it
or not.


And foobar.com and foobar.com are not different domains, while
example.com is. The distinctness of parts is less significant than the
distinctness of wholes in normal language, and the (informal) unit of
"domain" is example.com or foobar.com. The specific technical
qualification is superfluous until the relevance of the (informal)
concept has been established.
But there was no "correction" in your post. There were not enough
actual statements in it.


There was and there were. ...

<snip>

OK, there was, but only for people who recognise your "common"
substitution syntax. It remains a correction that was unnecessary in
context and turned out to be irrelevant to the question.
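
For readers unfamiliar with either point, a small sketch may help. It assumes a runtime with the standard URL class; the function names sameOrigin and applySubstitution are made up for illustration. Browsers compare the whole origin (scheme, host and port), so two hosts under the same second-level domain are still treated as different, and the s/domains/second-level domains/ notation is simply a find-and-replace on the earlier text.

```javascript
// True when both URLs share scheme, host and port.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// JavaScript equivalent of the sed-style s/domains/second-level domains/
// (replaces the first occurrence only, as sed does without the g flag).
function applySubstitution(text) {
  return text.replace(/domains/, "second-level domains");
}

// sameOrigin("http://foo.foobar.com/a", "http://bar.foobar.com/b") is false,
// while two pages on http://foobar.com compare as the same origin.
```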

Richard.
Jul 20 '05 #12
