Bytes | Software Development & Data Engineering Community
Finding Broken Link in WebSite

3 New Member
Hi everyone,
I want .NET (VB or C#) code for finding broken links in a website.
The requirement is that the user types a
URL into a text box and, once the button is clicked, the page shows
whether there are any broken links on that particular page.
Please help me out with this.


Thanks
Sridhar.S
Feb 11 '08 #1
wimpos
19 New Member
You can download the page at the URL entered in the text box.
Try using the WebClient class.
Now you have the HTML source of the web page.
Then find all the links in the page, <a href="**"></a> (WebClient itself has no method for this, so use a regex).
Extract the ** and try to download that URL the same way as before. If it succeeds, the link is alive; otherwise it is dead.

This is a guideline you can follow. Try it, and if you run into trouble, don't hesitate to ask for more detailed info.

regards
W.
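
One step the outline above leaves implicit: many href values are relative ("page2.html") or not web addresses at all ("mailto:..."), so they cannot be downloaded as they stand. A minimal sketch of resolving them against the page URL first, assuming a hypothetical helper named LinkResolver (not part of .NET) built on System.Uri:

```csharp
using System;

public static class LinkResolver
{
    // Hypothetical helper: returns the absolute URL for an href found on
    // pageUrl, or null when the href cannot be checked over HTTP
    // (mailto:, javascript:, malformed values, ...).
    public static string Resolve(string pageUrl, string href)
    {
        Uri baseUri;
        if (!Uri.TryCreate(pageUrl, UriKind.Absolute, out baseUri))
            return null;

        // Combines the base address with a relative or absolute href.
        Uri resolved;
        if (!Uri.TryCreate(baseUri, href, out resolved))
            return null;

        bool isHttp = resolved.Scheme == Uri.UriSchemeHttp
                   || resolved.Scheme == Uri.UriSchemeHttps;
        return isHttp ? resolved.AbsoluteUri : null;
    }
}
```

Each non-null result can then be downloaded or requested like the original page; null results are probably best reported as "not checked" rather than as dead.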
Feb 11 '08 #2
kenobewan
4,871 Recognized Expert Specialist
I find this question interesting because a link isn't really broken until you click it and get the 404 error. Also, a link that works locally may be broken on the web. Having said that, there are third-party tools that may help, but my assumption is that it may be a case of "physician, heal thyself".

A site map may also help, but you would need to customize it to cover all dynamic links. Here is a great resource on TDD, applied mainly to software, but if web programming were done this way...
My Uber-Test-Driven Development ("TDD") Links Listing
Feb 11 '08 #3
Plater
7,872 Recognized Expert Expert
W3.org offers testing for broken links.
All they do is check to see if a good status is returned when they attempt to navigate to the address inside an anchor tag.
Feb 11 '08 #4
sristhrashguy
3 New Member
Hi all,

I'm storing a set of "<a href" tags in a list. Now I want to check whether all these links are valid, i.e. whether the list contains any broken links.

This is the code:

The code lets the user enter a URL, downloads the HTML of that page,
and displays only the href tags. Now I want to check the validity of all these links
and display the result, i.e. whether any of the links are dead.

    // Requires: using System; using System.Collections; using System.Net; using System.Text;

    // Download the raw bytes of the page whose URL was typed into the text box.
    byte[] aRequestHTML;
    using (WebClient objWebClient = new WebClient())
    {
        aRequestHTML = objWebClient.DownloadData(TextBox1.Text);
    }
    // Decode the bytes as UTF-8 (assumes the page is UTF-8 encoded).
    string myString = Encoding.UTF8.GetString(aRequestHTML);

    // Collect every "<a href=..." opening tag, skipping javascript links.
    ArrayList list = new ArrayList();
    int curindex = 0;
    while (true)
    {
        // Case-insensitive search, so "<A HREF=" is found as well.
        int index = myString.IndexOf("<a href=", curindex, StringComparison.OrdinalIgnoreCase);
        if (index == -1) { break; }
        curindex = myString.IndexOf('>', index);
        if (curindex == -1) { break; }   // unterminated tag: stop instead of throwing
        string ancordata = myString.Substring(index, curindex - index);
        if (ancordata.IndexOf("javascript", StringComparison.OrdinalIgnoreCase) < 0)
        {
            list.Add(ancordata);
        }
    }
    // Bind the collected tags to the grid.
    GridView1.DataSource = list;
    GridView1.DataBind();
Feb 13 '08 #5
sristhrashguy
3 New Member
Hi, thanks for your reply.

I am halfway across the ocean.
Now I have the full list of "<a href" tags in an ArrayList.
I'm totally stuck here. Please help me.
Feb 13 '08 #6
Plater
7,872 Recognized Expert Expert
Try using an HttpWebRequest to see if they return a good status or not?
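
A minimal sketch of that check, assuming a hypothetical helper named LinkChecker; the HEAD method and the 10 second timeout are illustrative choices, not requirements. Note that 4xx and 5xx answers surface as a WebException whose Response property still carries the status code:

```csharp
using System;
using System.Net;

public static class LinkChecker
{
    // Hypothetical helper: returns the HTTP status code for url, or null when
    // the URL is not an absolute http/https address or no HTTP response was
    // received (DNS failure, timeout, connection refused, ...).
    public static int? GetStatusCode(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri)
            || (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
            return null;

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "HEAD";   // headers only; some servers reject HEAD
        request.Timeout = 10000;   // milliseconds, an arbitrary choice

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                return (int)response.StatusCode;
            }
        }
        catch (WebException ex)
        {
            // Error statuses arrive as exceptions that still carry a response.
            using (HttpWebResponse response = ex.Response as HttpWebResponse)
            {
                return response != null ? (int?)(int)response.StatusCode : null;
            }
        }
    }
}
```

If a server answers HEAD with an error such as 405, retrying with GET is a reasonable fallback. On current .NET versions WebRequest is marked obsolete in favour of HttpClient, but the idea is the same.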
Feb 14 '08 #7
jallred
6 New Member
Spot on. For each URL, use the code at http://msdn2.microsoft.com/en-us/library/system.net.httpwebrequest.getresponse.aspx and check the status code of the response. 200 is OK, while 404 is the typical broken link. Other status codes will require some judgement.

John
http://blogs.msdn.com/usisvde/
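
The judgement call can be made explicit in one small function. The sketch below is only one possible policy, with hypothetical names (LinkState, StatusClassifier): any 2xx counts as OK, 404 and 410 count as broken, a missing status (null, meaning no HTTP response was received) is assumed to be broken, and everything else is flagged for a human to review:

```csharp
public enum LinkState { Ok, Broken, NeedsReview }

public static class StatusClassifier
{
    // Hypothetical policy; adjust the rules to taste.
    public static LinkState Classify(int? statusCode)
    {
        if (statusCode == null) return LinkState.Broken;        // no response at all (assumption)
        int code = statusCode.Value;
        if (code >= 200 && code < 300) return LinkState.Ok;      // 200 OK and other successes
        if (code == 404 || code == 410) return LinkState.Broken; // Not Found, Gone
        return LinkState.NeedsReview;                            // 401, 403, 500, 503, ...
    }
}
```

HttpWebRequest follows redirects by default, so 3xx codes are rarely seen unless AllowAutoRedirect is switched off.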
Feb 18 '08 #8
wimpos
19 New Member
Tip:
It might not be that important, but I still recommend using a regular expression to search for the <a href="">.

It's cleaner, faster, and more reliable.

regards
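
A minimal sketch of the regular-expression approach, assuming a hypothetical helper named LinkExtractor. The pattern is an illustration rather than a complete HTML parser: it accepts double-quoted, single-quoted and unquoted href values and ignores case, but unusual markup can still slip past it:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Matches <a ... href="value">, <a ... href='value'> or <a ... href=value>.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the href values of all anchors in html, skipping empty values
    // and javascript: pseudo-links.
    public static List<string> ExtractLinks(string html)
    {
        List<string> links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            string url = m.Groups["url"].Value.Trim();
            if (url.Length > 0 &&
                !url.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase))
            {
                links.Add(url);
            }
        }
        return links;
    }
}
```

Unlike a search for the literal text "<a href=", this also finds anchors where href is not the first attribute, and it yields the URL itself rather than the whole tag.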
Feb 20 '08 #9
