Ferreting out broken links

Is it difficult to write a program that, given an array of URLs, will probe
each one, and return a status of Found or Not Found? How would you approach
it?

While Googling, I found utility after utility that will do something like
that for you, but I would like to write a custom program to do this. It
doesn't have to be VB. C#, javascript, etc. - whatever will run on .NET.

Thanks in advance,

Dave
Nov 21 '05 #1
Try something like this. It is not the most intelligent nor elegant
solution, but it will get you what you want.

' mylist is assumed to be a user-defined Structure, e.g.:
'   Structure mylist
'       Public SiteIndex As Integer
'       Public SiteURL As String
'       Public SiteValidFlagBoolean As Boolean
'   End Structure
Dim aList As New ArrayList
Dim qXML As New Xml.XmlDocument

Dim oml As mylist
With oml
    .SiteIndex = 1
    .SiteURL = "http://www.microsoft.com"
    .SiteValidFlagBoolean = False
End With

aList.Add(oml)

With oml
    .SiteIndex = 2
    .SiteURL = "http://www.yourdomain.com"
    .SiteValidFlagBoolean = False
End With

aList.Add(oml)

' Loop by index so the result can be stored back in the list
' (For Each would only update a copy of each Structure).
For i As Integer = 0 To aList.Count - 1
    oml = CType(aList(i), mylist)

    Try
        qXML.Load(oml.SiteURL)
        oml.SiteValidFlagBoolean = True
    Catch exxml As System.Xml.XmlException
        'Page loaded, but was not parsable as XML
        oml.SiteValidFlagBoolean = True
    Catch exweb As System.Net.WebException
        If exweb.ToString.IndexOf("404") >= 0 Then
            'Page Not Found
            oml.SiteValidFlagBoolean = False
        Else
            'Some other network error, probably domain not found.
            MsgBox(exweb.ToString)
        End If
    Catch ex As Exception
        MsgBox(ex.ToString)
    End Try

    aList(i) = oml

Next
Nov 21 '05 #2
Dave,

Not exactly sure what you are wanting, but it might be similar to a function
I use in one of my apps. You can call this function within a loop, and if
you don't receive a response it will catch the exception. It uses the
HttpWebRequest class. It might be a little more than you need, but it might
be what you are looking for.

' Needs Imports System.IO and Imports System.Net at the top of the file.
' HTTPMethod (presumably an Enum with HTTP_GET and HTTP_POST members) and
' HldCookCon (a class-level CookieContainer) are defined elsewhere in the app.
Public Function Send(ByVal URL As String, _
        Optional ByVal PostData As String = "", _
        Optional ByVal Method As HTTPMethod = HTTPMethod.HTTP_GET, _
        Optional ByVal ContentType As String = "") As String
    Dim Request As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
    Dim Response As HttpWebResponse
    Dim SW As StreamWriter = Nothing
    Dim SR As StreamReader = Nothing
    Dim ResponseData As String
    Dim I As Integer
    Dim RcookCon As New CookieContainer

    ' Prepare Request Object
    Request.Method = Method.ToString().Substring(5)   ' "HTTP_GET" -> "GET"
    Request.KeepAlive = True
    Request.AllowAutoRedirect = True
    If HldCookCon.Count > 0 Then
        RcookCon = HldCookCon
    End If
    Request.CookieContainer = RcookCon

    ' Set form/post content-type if necessary
    If (Method = HTTPMethod.HTTP_POST AndAlso PostData <> "" AndAlso _
            ContentType = "") Then
        ContentType = "application/x-www-form-urlencoded"
    End If

    ' Set Content-Type
    If (ContentType <> "") Then
        Request.ContentType = ContentType
        Request.ContentLength = PostData.Length
    End If

    ' Send Request, If Request
    If (Method = HTTPMethod.HTTP_POST) Then
        Try
            SW = New StreamWriter(Request.GetRequestStream())
            SW.Write(PostData)
        Catch Err As WebException
            MsgBox(Err.Message, MsgBoxStyle.Information, "Error")
        Finally
            Try
                SW.Close()
            Catch
                'Don't process an error from SW not closing
            End Try
        End Try
    End If

    'Get Response
    Try
        Response = CType(Request.GetResponse(), HttpWebResponse)
        SR = New StreamReader(Response.GetResponseStream())
        ResponseData = SR.ReadToEnd()
        'Keep the response cookies for later requests
        For I = 0 To Response.Cookies.Count - 1
            HldCookCon.Add(Response.Cookies.Item(I))
        Next
    Catch Err As WebException
        'Request failed (e.g. 404 or host not found); the function returns
        'a String, so signal failure with Nothing rather than False
        Return Nothing
    Finally
        Try
            SR.Close()
        Catch
            'Don't process an error from SR not closing
        End Try
    End Try
    Return ResponseData
End Function

Curtis
Nov 21 '05 #3
hmmm...

Should be fairly straightforward using the System.Net.WebClient class.
Or, probably better yet, the System.Net.HttpWebRequest class...

Something like:

Dim request As HttpWebRequest
Dim response As HttpWebResponse

For Each url As String In urls
    request = WebRequest.Create(url)
    response = request.GetResponse()

    If Response.StatusCode = 404 Then
        Console.WriteLine("Not Found")
    Else
        Console.WriteLine("Found")
    End If
Next

Actually, you might want to take a closer look at the
StatusCode :)
--
Tom Shelton [MVP]
Nov 21 '05 #4
Amdrit, Curtis, Tom,

I haven't worked all these details out yet, but I didn't want too much time
to pass before I said thanks.

I have only had time to try Curtis & Tom's solutions. The
Request.GetResponse() line is giving me a problem when I try to use
https:// sites because they cannot establish a trust relationship. This may
not be fatal - it may be acceptable to just inform my users that they will
have to manually check those.

I'm getting a problem when I hit a 404; Tom's line "If Response.StatusCode
= 404 Then" just doesn't work.

I get the crash screen with the message "The remote server returned an
error: (404) Not Found".
So, I'm going to try to work with those methods, then I'll try Amdrit's
method if I can't get them good to go.

I really do appreciate the responses. I had no clue how to approach this
problem, and now I have plenty to work with.

Thanks again.

Dave
Nov 21 '05 #5
Dave,

Sounds like you might need to refine what I wrote some :) I did that
off the top of my head, and didn't test any of it - so it may not be
exactly right. Another method that's a bit more work, but not too bad, is
to simply open a socket connection to the server and make the HTTP request
yourself. This would avoid the "non-trust" issues and exceptions :)
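
A refinement of the earlier loop might look like the sketch below. GetResponse() throws a WebException for a 404 (and other 4xx/5xx replies) instead of returning a response, so the status has to be read from the exception's Response property rather than tested after the call. The helper name ProbeUrl, the HEAD method, the 10-second timeout and the sample URLs are illustrative assumptions, not anything from the posts above, and this has not been run against your sites.

```vb
Imports System
Imports System.Net

Module LinkCheckSketch

    ' Hypothetical helper: returns a short status string for one URL.
    Function ProbeUrl(ByVal url As String) As String
        Dim response As HttpWebResponse = Nothing
        Try
            Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
            request.Method = "HEAD"    ' headers only; some servers reject HEAD, so GET may be needed
            request.Timeout = 10000    ' milliseconds; an arbitrary choice
            response = CType(request.GetResponse(), HttpWebResponse)
            Return "Found (" & CInt(response.StatusCode).ToString() & ")"
        Catch ex As WebException
            ' 4xx/5xx replies land here; the server's reply, if any, is in ex.Response
            Dim errorResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If errorResponse IsNot Nothing Then
                Dim code As HttpStatusCode = errorResponse.StatusCode
                errorResponse.Close()
                If code = HttpStatusCode.NotFound Then Return "Not Found"
                Return "Error (" & CInt(code).ToString() & ")"
            End If
            ' No HTTP reply at all: DNS failure, timeout, trust failure, ...
            Return "Unreachable (" & ex.Status.ToString() & ")"
        Catch ex As UriFormatException
            Return "Bad URL"
        Finally
            If response IsNot Nothing Then response.Close()
        End Try
    End Function

    Sub Main()
        ' Sample URLs are placeholders.
        Dim urls() As String = {"http://www.microsoft.com", "http://www.example.com/no-such-page"}
        For Each url As String In urls
            Console.WriteLine(url & " -> " & ProbeUrl(url))
        Next
    End Sub

End Module
```

Reporting the WebException status separately keeps "the page is gone" apart from "the server could not be reached", which are usually handled differently in a link report.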

--
Tom Shelton [MVP]
Nov 21 '05 #6
Dave,

This should fix the trusted relationship problem. It's kind of a workaround
that I found. Here is a link to an explanation of it:
http://gotdotnet.com/Community/Messa....aspx?id=40795. I have
implemented it in VB.NET by creating a new class with the following code.

Imports System.Net
Imports System.Security.Cryptography.X509Certificates

Public Class myCertificatePolicy
    Implements ICertificatePolicy

    Public Function CheckValidationResult(ByVal srvPoint As ServicePoint, _
            ByVal cert As X509Certificate, _
            ByVal request As WebRequest, _
            ByVal certificateProblem As Integer) _
            As Boolean Implements ICertificatePolicy.CheckValidationResult
        'Return True to force the certificate to be accepted.
        Return True
    End Function
End Class

You would then call the class with this line in your application:

'force the certificate to be accepted
System.Net.ServicePointManager.CertificatePolicy = New myCertificatePolicy

This basically overrides a "non-trusted connection" by making your
application always accept the certificates.
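
On .NET Framework 2.0 and later, ServicePointManager.CertificatePolicy is marked obsolete in favour of ServicePointManager.ServerCertificateValidationCallback. A rough equivalent is sketched below. The callback name AcceptAndReport and the console message are assumptions made for illustration. Like the class above, it accepts every certificate for the whole process, which is reasonable for a link checker but not for code that sends anything sensitive.

```vb
Imports System
Imports System.Net
Imports System.Net.Security
Imports System.Security.Cryptography.X509Certificates

Module CertificateCallbackSketch

    ' Hypothetical callback: accepts every certificate, as the class above
    ' does, but reports what was wrong so the link can be flagged for review.
    Function AcceptAndReport(ByVal sender As Object, _
            ByVal certificate As X509Certificate, _
            ByVal chain As X509Chain, _
            ByVal errors As SslPolicyErrors) As Boolean
        If errors <> SslPolicyErrors.None Then
            Console.WriteLine("Certificate problem: " & errors.ToString())
        End If
        Return True
    End Function

    Sub Main()
        ' Install the callback once, before any https:// requests are made.
        ServicePointManager.ServerCertificateValidationCallback = AddressOf AcceptAndReport
        ' ... run the link checks here ...
    End Sub

End Module
```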

Curtis
Nov 21 '05 #7
Tom & Curtis,

You guys are too much. Thanks.

Dave
Nov 21 '05 #8
