Extract links from Javascript (not using Javascript)?

I am looking for a method to extract the links embedded within the
Javascript in a web page: an ActiveX component, or example code in
C++/Pascal/etc. I am looking for a general solution, not one tailored
to a particular page/script.

Hopefully, the problem can be solved without recreating a complete
Javascript interpreter. Any ideas?

May 26 '06 #1

<ch************@yahoo.com> wrote in message
news:11**********************@j55g2000cwa.googlegroups.com...
> I am looking for a method to extract the links embedded within the
> Javascript in a web page: an ActiveX component, or example code in
> C++/Pascal/etc. I am looking for a general solution, not one tailored
> to a particular page/script.
>
> Hopefully, the problem can be solved without recreating a complete
> Javascript interpreter. Any ideas?

If you expect to have any chance of getting at links that are anything
other than coded directly in a string literal, you will need at least a full
JavaScript parser.
See http://www.semanticdesigns.com/Produ...nds/index.html
for a JavaScript front end that is designed to be used in custom tasks
like this.
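
As a rough illustration of that limit, here is a minimal sketch of the simplest approach: scanning the script text for string literals that look like links. It is not the Semantic Designs front end; the function name extractLiteralLinks and the "looks like a link" pattern are assumptions made up for this example.

```javascript
// Naive scan: collect string literals that look like URLs or file paths.
// It only sees links written out whole inside one literal; anything
// assembled at run time is invisible to it.
function extractLiteralLinks(source) {
  // Single- or double-quoted literals, allowing backslash escapes.
  const literal = /"((?:[^"\\\n]|\\.)*)"|'((?:[^'\\\n]|\\.)*)'/g;
  // Hypothetical heuristic for "looks like a link" (an assumption, tune freely).
  const linkLike =
    /^(?:https?:\/\/|\/|\.\.?\/)\S+$|^\S+\.(?:html?|php|gif|jpe?g|png|css|js)$/i;
  const found = [];
  for (const match of source.matchAll(literal)) {
    const text = match[1] !== undefined ? match[1] : match[2];
    if (linkLike.test(text) && !found.includes(text)) {
      found.push(text);
    }
  }
  return found;
}

const sample =
  'var home = "http://example.com/index.html"; ' +
  'var part = "images"; location = part + "/icon.gif";';
console.log(extractLiteralLinks(sample));
```

On the sample it reports the complete URL and the fragment "/icon.gif", but it cannot tell that the fragment is really part of "images/icon.gif". That is the case that needs a parser, or more.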

--
Ira Baxter, CTO
www.semanticdesigns.com

May 26 '06 #2
ch************@yahoo.com said the following on 5/26/2006 3:03 PM:
> I am looking for a method to extract the links embedded within the
> Javascript in a web page: an ActiveX component, or example code in
> C++/Pascal/etc. I am looking for a general solution, not one tailored
> to a particular page/script.

There are too many possibilities to deal with for a solution to that
question to be simple and/or general. Just too many ways that a URL can
be put together in script.

Can you give a general example of what you are trying to do though?
--
Randy
comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
May 26 '06 #3
Ira Baxter said the following on 5/26/2006 3:44 PM:
> <ch************@yahoo.com> wrote in message
> news:11**********************@j55g2000cwa.googlegroups.com...
>> I am looking for a method to extract the links embedded within the
>> Javascript in a web page: an ActiveX component, or example code in
>> C++/Pascal/etc. I am looking for a general solution, not one tailored
>> to a particular page/script.
>>
>> Hopefully, the problem can be solved without recreating a complete
>> Javascript interpreter. Any ideas?
>
> If you expect to have any chance of getting at links that are anything
> other than coded directly in a string literal, you will need at least a full
> JavaScript parser.

And even that is not a guarantee of success.

> See http://www.semanticdesigns.com/Produ...nds/index.html
> for a JavaScript front end that is designed to be used in custom tasks
> like this.

It is designed to parse out any and all URLs that a document possesses?

I find that a dubious claim.

--
Randy
May 26 '06 #4
Randy Webb wrote:
> ch************@yahoo.com said the following on 5/26/2006 3:03 PM:
>> I am looking for a method to extract the links embedded within the
>> Javascript in a web page: an ActiveX component, or example code in
>> C++/Pascal/etc. I am looking for a general solution, not one tailored
>> to a particular page/script.
>
> There are too many possibilities to deal with for a solution to that
> question to be simple and/or general. Just too many ways that a URL can
> be put together in script.
>
> Can you give a general example of what you are trying to do though?


I would like to transform web pages "in the wild" into tables of links
for a site map, regardless of whether those links are encoded in HTML,
CSS, Flash, Javascript, etc. Sounds like this is not possible,
particularly for event-driven aspects of the script like rollover image
menus?

May 27 '06 #5
ch************@yahoo.com said the following on 5/26/2006 8:44 PM:
> Randy Webb wrote:
>> ch************@yahoo.com said the following on 5/26/2006 3:03 PM:
>>> I am looking for a method to extract the links embedded within the
>>> Javascript in a web page: an ActiveX component, or example code in
>>> C++/Pascal/etc. I am looking for a general solution, not one tailored
>>> to a particular page/script.
>>
>> There are too many possibilities to deal with for a solution to that
>> question to be simple and/or general. Just too many ways that a URL can
>> be put together in script.
>>
>> Can you give a general example of what you are trying to do though?
>
> I would like to transform web pages "in the wild" into tables of links
> for a site map, regardless of whether those links are encoded in HTML,
> CSS, Flash, Javascript, etc. Sounds like this is not possible,
> particularly for event-driven aspects of the script like rollover image
> menus?


It could be done with regard to the CSS, HTML, and JS aspects, but it
wouldn't be a pretty task to try to accomplish. Just trying to resolve
relative paths would be a major headache.
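
For the relative-path part at least, present-day JavaScript runtimes ship a URL class (the WHATWG URL API) that applies the standard resolution rules. A minimal sketch, assuming the base URL of the page is already known; the function name resolveLinks and the example.com addresses are placeholders:

```javascript
// Resolve each reference against the page's base URL, the way a browser
// would. The URL class is built into modern browsers and Node.js.
function resolveLinks(baseUrl, hrefs) {
  const resolved = [];
  for (const href of hrefs) {
    try {
      resolved.push(new URL(href, baseUrl).href);
    } catch (err) {
      // The reference could not be parsed as a URL at all; skip it.
    }
  }
  return resolved;
}

const base = "http://example.com/docs/guide/index.html";
console.log(
  resolveLinks(base, [
    "intro.html",              // same directory as the page
    "../img/logo.gif",         // one directory up
    "/top.html",               // relative to the site root
    "http://other.example/x",  // already absolute, left alone
  ])
);
```

A page that declares a <base href="..."> element changes the base, so a real crawler would need to read that first and pass it in as baseUrl.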

--
Randy
May 27 '06 #6
ch************@yahoo.com wrote:
> Randy Webb wrote:
>> ch************@yahoo.com said the following on 5/26/2006 3:03 PM:
>>> I am looking for a method to extract the links embedded within the
>>> Javascript in a web page: an ActiveX component, or example code in
>>> C++/Pascal/etc.

Obviously you are not yet sure what to use, so a newsgroup dedicated to a
certain (group of) language(s), like this one, is not the place to start.
Try comp.infosystems.www.authoring.misc, or comp.lang.misc.

>>> I am looking for a general solution, not one tailored
>>> to a particular page/script.
>>
>> There are too many possibilities to deal with for a solution to that
>> question to be simple and/or general. Just too many ways that a URL can
>> be put together in script.
>>
>> Can you give a general example of what you are trying to do though?
>
> I would like to transform web pages "in the wild" into tables of links
> for a site map,

A site map is best implemented using lists (in [X]HTML: `ul' and `ol'
elements), not tables. A table is a table is a table. [psf 3.8]

> regardless of whether those links are encoded in HTML, CSS, Flash,
> Javascript, etc. Sounds like this is not possible,

It is possible to a certain point (I don't think decompiling Flash is
possible easily). There is software for that already (Web spiders),
and you could use its output.

> particularly for event-driven aspects of the script like rollover image
> menus?


The rollover effect has to take place on existing markup, so it does not
matter here. You will have difficulty recognizing client-side generated
menus that do not degrade gracefully, and those that use pseudo-links
like (<a href="javascript:somefunction()">...</a>), though.
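
A crawler can at least flag such pseudo-links so they are not mistaken for real targets. A minimal sketch; the function name classifyHref and its three labels are assumptions for illustration, not part of any existing tool:

```javascript
// Sort an href attribute value into one of three rough buckets.
function classifyHref(href) {
  const value = href.trim();
  // Empty or "#..." values point back into the current page.
  if (value === "" || value.startsWith("#")) return "fragment-only";
  // "javascript:" URLs run script instead of naming a resource.
  if (/^javascript:/i.test(value)) return "pseudo-link";
  return "link";
}

for (const href of ["blurb.html", "#", "javascript:somefunction()"]) {
  console.log(href, "->", classifyHref(href));
}
```

Anything labelled "pseudo-link" or "fragment-only" still needs its script inspected to find out where, if anywhere, it leads.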

Which also tells you that unless you are using server-side J(ava)Script,
J(ava)Script is not the appropriate language for generating the site map.
It can, however, help with letting the user expand/collapse it later.
PointedEars
--
This is Usenet. It is a discussion group, not a helpdesk. You post
something, we discuss it. If you have a question and that happens to get
answered in the course of the discussion, then great. If not, you can
have a full refund of your membership fees. -- Mark Parnell in alt.html
May 29 '06 #7
Thomas 'PointedEars' Lahn wrote:
> ch************@yahoo.com wrote:
>> Randy Webb wrote:
>>> ch************@yahoo.com said the following on 5/26/2006 3:03 PM:
>>>> I am looking for a method to extract the links embedded within the
>>>> Javascript in a web page: an ActiveX component, or example code in
>>>> C++/Pascal/etc.
>
> Obviously you are not yet sure what to use, so a newsgroup dedicated to a
> certain (group of) language(s), like this one, is not the place to start.
> Try comp.infosystems.www.authoring.misc, or comp.lang.misc.

I am not *unsure* what language to use to solve this problem; actually
I don't care. My question is about algorithms for parsing and
interpreting Javascript.

>>>> I am looking for a general solution, not one tailored
>>>> to a particular page/script.
>>>
>>> There are too many possibilities to deal with for a solution to that
>>> question to be simple and/or general. Just too many ways that a URL can
>>> be put together in script.
>>>
>>> Can you give a general example of what you are trying to do though?
>>
>> I would like to transform web pages "in the wild" into tables of links
>> for a site map,
>
> A site map is best implemented using lists (in [X]HTML: `ul' and `ol'
> elements), not tables. A table is a table is a table. [psf 3.8]

I do not mean "table" as in HTML table, but "table" as in raw data set.

>> regardless of whether those links are encoded in HTML, CSS, Flash,
>> Javascript, etc. Sounds like this is not possible,
>
> It is possible to a certain point (I don't think decompiling Flash is
> possible easily). There is software for that already (Web spiders),
> and you could use its output.

Have you used any that actually extract links from Javascript? I have
not, though I know some claim to do so.

>> particularly for event-driven aspects of the script like rollover image
>> menus?
>
> The rollover effect has to take place on existing markup, so it does not
> matter here. You will have difficulties to recognize not gracefully
> degrading client-side generated menus, and those that use pseudo-links
> like (<a href="javascript:somefunction()">...</a>), though.
>
> Which also tells you that unless you are using server-side J(ava)Script,
> J(ava)Script is not the appropriate language for generating the site map.
> However, e.g. it can help with letting the user expand/collapse it later.


Again, I am not looking to write a solution *in* Javascript
(necessarily), I am looking to read links *from* Javascript using
whatever tools are available.

May 29 '06 #8
chrisspencer02 said:
> I am looking for a method to extract the links embedded within the
> Javascript in a web page: an ActiveX component, or example code in
> C++/Pascal/etc. I am looking for a general solution, not one tailored
> to a particular page/script.

How general do you want this to be? A completely general solution is
probably impossible. I'm not being arsey about this; I'm just interested
in the problem.

E.g. sometimes people are going to write code which is something like this:

var siteName="http://lofty.dyndns.info";
....
var paths=Array("images","js");
....
var filename="icon.gif";
....
var url=siteName+"/"+paths[0]+"/"+filename;

So if you come at it from the side of parsing the code to see if there are
any valid links embedded in it, you won't get them all without (in
the worst case) writing some AI that is on a par with a human javascript
programmer...
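
For the simplest cases, string constants joined with +, a very small constant-folding pass goes some way. The sketch below is only an illustration under heavy assumptions: the name foldStringConstants is made up, and it uses regular expressions rather than a real parser, so it understands nothing beyond "var name = term + term;" where each term is a quoted literal or a name it has already resolved.

```javascript
// Fold "var x = a + 'lit' + b;" declarations into known string values.
// Deliberately tiny: literals containing "+" or ";" will confuse it,
// and arrays, function calls and reassignments are simply left unresolved.
function foldStringConstants(source) {
  const known = new Map();
  const declaration = /var\s+([A-Za-z_$][\w$]*)\s*=\s*([^;]+);/g;
  for (const [, name, expr] of source.matchAll(declaration)) {
    let value = "";
    let resolvable = true;
    for (const term of expr.split("+").map((t) => t.trim())) {
      const literal = /^"([^"\\]*)"$|^'([^'\\]*)'$/.exec(term);
      if (literal) {
        value += literal[1] !== undefined ? literal[1] : literal[2];
      } else if (known.has(term)) {
        value += known.get(term);
      } else {
        resolvable = false; // unknown name, array, call, ...
        break;
      }
    }
    if (resolvable) known.set(name, value);
  }
  return known;
}

const script = [
  'var siteName = "http://lofty.dyndns.info";',
  'var paths = Array("images", "js");',
  'var dir = "images";',
  'var filename = "icon.gif";',
  'var url = siteName + "/" + dir + "/" + filename;',
  'var other = siteName + "/" + paths + "/" + filename;',
].join("\n");

const folded = foldStringConstants(script);
console.log(folded.get("url"));   // folded from three known constants
console.log(folded.has("other")); // false: "paths" is an array, not folded
```

Every construct the pass does not understand just drops out of the result, which is why this style of analysis finds some links but can never promise all of them.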

If you come at it from the side of running the code in a javascript
interpreter to see what links it generates, it could be just as bad. E.g.
someone might have a puzzle page that links you to another page when
you've solved the problem. To get at the url this way, you would have to
write some AI that could firstly work out that it /was/ a puzzle page, and
then solve the puzzle, which is even worse.
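
One way to experiment with the run-it-and-watch approach is to execute the script against stub objects that record what it assigns. A minimal sketch, assuming Node.js and its built-in vm module; the stub Image class, the stub location object and the name collectRuntimeLinks are assumptions for illustration, and a real page would need far more of the browser environment than this.

```javascript
const vm = require("vm");

// Run a script in a fresh context whose only globals are two stubs
// that log every URL the script assigns to them.
function collectRuntimeLinks(scriptSource) {
  const links = [];

  // Stub for the browser's Image: setting .src records the value.
  class Image {
    set src(value) {
      links.push(String(value));
    }
  }

  // Stub for window.location: only location.href = "..." is recorded.
  const location = {
    set href(value) {
      links.push(String(value));
    },
  };

  try {
    vm.runInNewContext(scriptSource, { Image, location }, { timeout: 1000 });
  } catch (err) {
    // The script touched something the stubs lack (or ran too long);
    // keep whatever was recorded before it stopped.
  }
  return links;
}

const page = [
  "var img = new Image();",
  'img.src = "foo";',
  "var obj = new Object();",
  'obj.src = "ignored";',
  'location.href = "next.html";',
].join("\n");
console.log(collectRuntimeLinks(page));
```

Node's documentation states that vm is not a security mechanism, so this is unsuitable for untrusted pages as written. Event handlers are never fired here either, so links that only appear after user interaction (the puzzle page being the extreme case) stay hidden.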

In practice it's probably not that bad, so you're probably better off
spending some time reading people's javascript, looking for common ways
people do stuff (e.g. rollover buttons), and then writing code tailored to
those.

--
http://www.niftybits.ukfsn.org/

remove 'n-u-l-l' to email me. html mail or attachments will go in the spam
bin unless notified with [html] or [attachment] in the subject line.

May 30 '06 #9
ch************@yahoo.com wrote:
> Thomas 'PointedEars' Lahn wrote:
>> ch************@yahoo.com wrote:
>>> Randy Webb wrote:
>>>> ch************@yahoo.com said the following on 5/26/2006 3:03 PM:
>>>>> I am looking for a method to extract the links embedded within the
>>>>> Javascript in a web page: an ActiveX component, or example code in
>>>>> C++/Pascal/etc.
>>
>> Obviously you are not yet sure what to use, so a newsgroup dedicated to a
>> certain (group of) language(s), like this one, is not the place to start.
>> Try comp.infosystems.www.authoring.misc, or comp.lang.misc.
>
> I am not *unsure* what language to use to solve this problem; actually
> I don't care. My question is about algorithms for parsing and
> interpreting Javascript.


Interpretation of "Javascript" would first include the recognition that
there are different implementations of ECMAScript: JavaScript, JScript,
Opera-ECMAScript, KJS; just to name the most widely distributed ones.

Whether script code executes or not, i.e. whether there is a "link" or
not, would depend entirely on how tightly something is coded to a specific
implementation, let alone a specific execution environment or object
model.

Second, even if you stick to strictly ECMAScript-conforming code, as
should be expected of an interoperable Web site that is to be parsed,
the matter of interpretation includes how you want to recognize what
is a "link" and what is not. Because

var img = new Image();
img.src = "foo";

could be considered a link (to an image resource named `foo').

var img = new Object();
img.src = "foo";

could not.

As for recognizing links and pseudo-links such as

function updateFrame(o)
{
  var f = window.parent.frames['foo'];
  if (f && f.document)
  {
    f.document.URL = "bar/" + o.href;
    return false;
  }

  return true;
}

<a href="blurb.html" onclick="return updateFrame(this);">...</a>

or the ill-conceived

<a href="#" onclick="location = foo + 'bar'">...</a>

<a href="javascript:someFunction()">...</a>

or even something dynamically scripted like

<script type="text/javascript">
  var a = document.createElement("a");
  if (a && isMethod(a.appendChild, a.addEventListener,
                    document.createTextNode, document.body.appendChild))
  {
    a.appendChild(document.createTextNode("foo"));
    a.addEventListener('click',
      function(e)
      {
        if (!e) e = window.event;
        if (e)
        {
          (dhtml.getElem("id", "bar") || {onclick: function(){}}).onclick();
          if (isMethod(e.stopPropagation)) e.stopPropagation();
          if (isMethod(e.preventDefault)) e.preventDefault();
          if (typeof e.cancelBubble != "undefined") e.cancelBubble = true;
        }
      },
      false);

    document.body.appendChild(a);
  }
</script>

how would you even /know/ that there is a "link" and where it points to
without implementing the script engine along with its execution environment
itself? I think there are far too many variables here to make even an
educated guess.

>>> regardless of whether those links are encoded in HTML, CSS, Flash,
>>> Javascript, etc. Sounds like this is not possible,
>>
>> It is possible to a certain point (I don't think decompiling Flash is
>> possible easily). There is software for that already (Web spiders),
>> and you could use its output.
>
> Have you used any that actually extract links from Javascript?

No. Probably for good reason.

> Again, I am not looking to write a solution *in* Javascript
> (necessarily), I am looking to read links *from* Javascript
> using whatever tools are available.


I don't think this is very much on topic here.
PointedEars
May 30 '06 #10
