How to extract all links/URLs from a web page?

A web crawler needs to extract every link from a web page. Links in
normal HTML anchor tags, or in any src or href attribute on a tag,
can be extracted easily using IHTMLDocument.
But what about links inside a JavaScript function, like the ones below?

<HEAD>
<SCRIPT language="JavaScript">
<!--hide

function newwindow()
{
window.open('jex5.htm','jav','width=300,height=200,resizable=yes');
}
//-->
</SCRIPT>

<A HREF="javascript:newwindow()">Click Here!</A>

or a JavaScript function like the following:

function newwindow()
{
......
window.location('http://www.google.com')
}

<input type=button onclick="javascript:newwindow()">Click Here!

How can I extract the links from these JavaScript functions?

Any help would be much appreciated. Can a crawler extract such links,
and if so, how?
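
For reference, the "easy" part of the question (plain href and src attributes) can be sketched without IHTMLDocument. This is only an illustration in Python using the standard html.parser module, not the IHTMLDocument approach mentioned above; the class name LinkCollector and the sample page are made up for the example. It also sets aside javascript: URLs and on* handlers, which are the hard cases asked about.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect plain href/src values and, separately, inline script snippets."""

    def __init__(self):
        super().__init__()
        self.links = []    # ordinary URLs found in href/src attributes
        self.scripts = []  # javascript: URLs and on* handlers, kept for later analysis

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        for name, value in attrs:
            if value is None:
                continue
            if name in ("href", "src"):
                if value.strip().lower().startswith("javascript:"):
                    self.scripts.append(value)
                else:
                    self.links.append(value)
            elif name.startswith("on"):  # onclick, onload, ...
                self.scripts.append(value)


# Hypothetical sample page, modelled on the snippets in the question.
page = """
<a href="page2.htm">Next</a>
<img src="logo.gif">
<A HREF="javascript:newwindow()">Click Here!</A>
<input type=button onclick="javascript:newwindow()">
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
print(collector.scripts)
```

The entries in scripts only name the function being called; the actual URL still has to be dug out of the script body.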

May 5 '07 #1
On May 5, 8:59 am, learnyourabc <learnyour...@yahoo.com> wrote:
(full quote of post #1 snipped)

Regular expressions are the best way to go. Store the entire HTML
contents in a string and search it for pattern matches. You can find
a ton of regex tutorials online.
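
A minimal sketch of that idea, in Python purely for illustration (the thread does not say which language the crawler is written in). The pattern and the helper name find_js_links are assumptions made for this example; it only catches URLs written as string literals directly in window.open(...), window.location(...) or a window.location assignment.

```python
import re

# Matches window.open('...'), window.location('...') and
# window.location = '...' (also window.location.href = '...').
# Only quoted string literals are caught.
JS_URL = re.compile(
    r"""window\.(?:open|location(?:\.href)?)   # the call or property
        \s*(?:\(|=)\s*                         # an opening paren or an equals sign
        (['"])(.*?)\1                          # quoted URL, captured in group 2
    """,
    re.VERBOSE,
)


def find_js_links(html):
    """Return every literal URL found in the supported window.* patterns."""
    return [m.group(2) for m in JS_URL.finditer(html)]


# The two functions from the question, held in one string.
page = """
<script>
function newwindow() {
  window.open('jex5.htm','jav','width=300,height=200,resizable=yes');
}
function go() {
  window.location('http://www.google.com')
}
</script>
"""

found = find_js_links(page)
print(found)
```

Relative results such as jex5.htm still need to be resolved against the page's own URL before the crawler can fetch them.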

May 5 '07 #2
Regular expressions can only extract a link if it appears in clear
text inside the JavaScript. How can I automatically extract every
link that is built inside JavaScript, say by combining several
variables to form the link? Do I have to execute the script for the
onclick button etc. to get the link? Does anyone have any
suggestions?
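
To make the problem concrete, here is a rough Python sketch of how far static analysis gets without running the script. It is a deliberately limited illustration, not a general answer: it only handles var assignments of string literals and a first window.open argument made of names and literals joined with +. The function name resolve_open_urls and the example.com URL are invented for the example. Anything beyond this (function calls, loops, DOM lookups) would need the script to actually be executed in a JavaScript engine or browser.

```python
import re

# var name = 'literal';   (string-literal assignments only)
ASSIGN = re.compile(r"var\s+(\w+)\s*=\s*(['\"])(.*?)\2\s*;")
# First argument of window.open(...), up to the first comma or closing paren.
OPEN_ARG = re.compile(r"window\.open\s*\(\s*([^,)]+)")
# One operand of a + expression: a quoted literal or a bare identifier.
TOKEN = re.compile(r"\s*(?:(['\"])(.*?)\1|(\w+))\s*")


def resolve_open_urls(script):
    """Resolve window.open URLs built from known variables and literals."""
    variables = {m.group(1): m.group(3) for m in ASSIGN.finditer(script)}
    urls = []
    for call in OPEN_ARG.finditer(script):
        parts = []
        # Limitation: a literal that itself contains '+' or ',' breaks this.
        for piece in call.group(1).split("+"):
            tok = TOKEN.fullmatch(piece)
            if tok is None:
                break
            if tok.group(1):                  # quoted string literal
                parts.append(tok.group(2))
            elif tok.group(3) in variables:   # variable with a known value
                parts.append(variables[tok.group(3)])
            else:
                break                         # unknown name: give up on this call
        else:
            urls.append("".join(parts))
    return urls


# Hypothetical script in which the link is assembled from variables.
script = """
var base = 'http://example.com/';
var page = 'news';
function go() { window.open(base + page + '.htm', 'w'); }
"""

resolved = resolve_open_urls(script)
print(resolved)
```

A call whose argument uses a name the sketch cannot resolve is skipped rather than guessed at.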

May 6 '07 #3
