Ferreting out broken links

Is it difficult to write a program that, given an array of URLs, will probe
each one, and return a status of Found or Not Found? How would you approach
it?

While Googling, I found utility after utility that will do something like
that for you, but I would like to write a custom program to do this. It
doesn't have to be VB. C#, javascript, etc. - whatever will run on .NET.

Thanks in advance,

Dave
Nov 21 '05 #1
7 Replies


Try something like this. It is not the most intelligent nor elegant
solution, but it will get you what you want.

' Assumption: mylist is not shown in the original post. It is declared
' here as a Class (a reference type) so that the flag set inside the
' loop is stored on the object held by the list.
Class mylist
    Public SiteIndex As Integer
    Public SiteURL As String
    Public SiteValidFlagBoolean As Boolean
End Class

' The statements below belong inside a Sub or Function.
Dim aList As New ArrayList
Dim qXML As New Xml.XmlDocument

Dim oml As New mylist
With oml
    .SiteIndex = 1
    .SiteURL = "http://www.microsoft.com"
    .SiteValidFlagBoolean = False
End With
aList.Add(oml)

oml = New mylist
With oml
    .SiteIndex = 2
    .SiteURL = "http://www.yourdomain.com"
    .SiteValidFlagBoolean = False
End With
aList.Add(oml)

For Each oml In aList
    Try
        qXML.Load(oml.SiteURL)
        oml.SiteValidFlagBoolean = True
    Catch exxml As System.Xml.XmlException
        'Page loaded, but was not parsable as XML
        oml.SiteValidFlagBoolean = True
    Catch exweb As System.Net.WebException
        If exweb.ToString.IndexOf("404") >= 0 Then
            'Page Not Found
            oml.SiteValidFlagBoolean = False
        Else
            'Some other network error, probably domain not found.
            MsgBox(exweb.ToString)
        End If
    Catch ex As Exception
        MsgBox(ex.ToString)
    End Try
Next

Nov 21 '05 #2

Dave,

I'm not exactly sure what you want, but it might be similar to a function
I use in one of my apps. You can call this function within a loop, and if
you don't receive a response it will catch the exception. It uses the
HttpWebRequest class. It might be a little more than you need, but it might
be what you are looking for.

' Requires Imports System.IO and Imports System.Net.
' Not shown here: HTTPMethod (used as an Enum with HTTP_GET and HTTP_POST
' members) and HldCookCon (used as a class-level CookieContainer).
Public Function Send(ByVal URL As String, _
        Optional ByVal PostData As String = "", _
        Optional ByVal Method As HTTPMethod = HTTPMethod.HTTP_GET, _
        Optional ByVal ContentType As String = "") As String
    Dim Request As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
    Dim Response As HttpWebResponse
    Dim SW As StreamWriter = Nothing
    Dim SR As StreamReader = Nothing
    Dim ResponseData As String
    Dim I As Integer
    Dim RcookCon As New CookieContainer

    ' Prepare Request Object ("HTTP_GET" becomes "GET")
    Request.Method = Method.ToString().Substring(5)
    Request.KeepAlive = True
    Request.AllowAutoRedirect = True
    If HldCookCon.Count > 0 Then
        RcookCon = HldCookCon
    End If
    Request.CookieContainer = RcookCon

    ' Set form/post content-type if necessary
    If (Method = HTTPMethod.HTTP_POST AndAlso PostData <> "" AndAlso _
            ContentType = "") Then
        ContentType = "application/x-www-form-urlencoded"
    End If

    ' Set Content-Type
    If (ContentType <> "") Then
        Request.ContentType = ContentType
        Request.ContentLength = PostData.Length
    End If

    ' Send the request body for a POST
    If (Method = HTTPMethod.HTTP_POST) Then
        Try
            SW = New StreamWriter(Request.GetRequestStream())
            SW.Write(PostData)
        Catch Err As WebException
            MsgBox(Err.Message, MsgBoxStyle.Information, "Error")
        Finally
            Try
                If Not SW Is Nothing Then SW.Close()
            Catch
                'Don't process an error from SW not closing
            End Try
        End Try
    End If

    'Get Response
    Try
        Response = CType(Request.GetResponse(), HttpWebResponse)
        SR = New StreamReader(Response.GetResponseStream())
        ResponseData = SR.ReadToEnd()
        'Keep the response cookies for later requests
        For I = 0 To Response.Cookies.Count - 1
            HldCookCon.Add(Response.Cookies.Item(I))
        Next
    Catch Err As WebException
        'No usable response (this includes a 404): return Nothing
        Return Nothing
    Finally
        Try
            If Not SR Is Nothing Then SR.Close()
        Catch
            'Don't process an error from SR not closing
        End Try
    End Try
    Return ResponseData
End Function

Curtis

Nov 21 '05 #3

hmmm...

It should be fairly straightforward using the System.Net.WebClient class.
Or, probably better, the System.Net.HttpWebRequest class...

Something like:

Dim request As HttpWebRequest
Dim response As HttpWebResponse

For Each url As String In urls
    request = WebRequest.Create(url)
    response = request.GetResponse()

    If Response.StatusCode = 404 Then
        Console.WriteLine("Not Found")
    Else
        Console.WriteLine("Found")
    End If
Next

Actually, you might want to take a closer look at the
StatusCode :)
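
For anyone trying this: GetResponse() raises a WebException for 4xx and 5xx answers, so the status code has to be read from the exception's Response property in those cases. The following is a minimal, untested sketch along those lines. The names LinkProbe, Classify and ProbeStatus, the use of HEAD, the 10-second timeout and the status buckets are all assumptions made for illustration, not something from the posts above.

```vbnet
Imports System
Imports System.Net

Module LinkProbe
    ' Coarse label for a numeric HTTP status; 0 means "no HTTP response".
    ' The buckets are an illustrative choice, not something from the thread.
    Function Classify(ByVal status As Integer) As String
        If status = 0 Then Return "No response"
        If status >= 200 AndAlso status < 400 Then Return "Found"
        If status = 404 OrElse status = 410 Then Return "Not Found"
        Return "Error " & status.ToString()
    End Function

    ' Returns the HTTP status code for url, or 0 if no HTTP response
    ' arrived at all (DNS failure, timeout, refused connection).
    Function ProbeStatus(ByVal url As String) As Integer
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Method = "HEAD"   ' assumption: the server answers HEAD requests
        request.Timeout = 10000   ' milliseconds; arbitrary value
        Try
            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Return CInt(response.StatusCode)
            End Using
        Catch ex As WebException
            ' For 4xx/5xx answers the response travels inside the exception.
            Dim failed As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If failed Is Nothing Then Return 0
            Dim code As Integer = CInt(failed.StatusCode)
            failed.Close()
            Return code
        End Try
    End Function

    Sub Main()
        Dim urls() As String = {"http://www.microsoft.com", "http://www.yourdomain.com"}
        For Each url As String In urls
            Console.WriteLine(url & ": " & Classify(ProbeStatus(url)))
        Next
    End Sub
End Module
```

Some servers reject HEAD; switching request.Method to "GET" is the obvious fallback, at the cost of downloading the page body.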
--
Tom Shelton [MVP]
Nov 21 '05 #4

Amdrit, Curtis, Tom,

I haven't worked all these details out yet, but I didn't want too much time
to pass before I said thanks.

I have only had time to try Curtis's and Tom's solutions. The
Request.GetResponse() line is giving me a problem when I try to use
https:// sites because a trust relationship cannot be established. This may
not be fatal - it may be acceptable to just inform my users that they will
have to manually check those.

I'm getting a problem when I hit a 404; Tom's line "If Response.StatusCode
= 404 Then" just doesn't work.

I get the crash screen with the message "The remote server returned an
error: (404) Not Found"

So, I'm going to try to work with those methods, then I'll try Amdrit's
method if I can't get them good to go.

I really do appreciate the responses. I had no clue how to approach this
problem, and now I have plenty to work with.

Thanks again.

Dave
Nov 21 '05 #5

Dave,

Sounds like you might need to refine what I wrote some :) I did that
off the top of my head, and didn't test any of it - so it may not be exactly
right. Another method that's a bit more work, but not too bad, is to
simply open a socket connection to the server and make the HTTP request
yourself. This would avoid the "non-trust" issues and exceptions :)
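
A rough, untested sketch of the socket idea, assuming a plain http:// URL: open a TcpClient, send a HEAD request by hand, and read the numeric code off the first line of the reply. The names RawHttpProbe, BuildHeadRequest, ParseStatusCode and ProbeStatus are made up for this example, and the 10-second receive timeout is arbitrary. An https:// URL would additionally need a TLS layer (for example SslStream), which is not shown.

```vbnet
Imports System
Imports System.IO
Imports System.Net.Sockets
Imports System.Text
Imports Microsoft.VisualBasic

Module RawHttpProbe
    ' Builds a minimal HTTP/1.1 HEAD request for the given URI.
    Function BuildHeadRequest(ByVal target As Uri) As String
        Return "HEAD " & target.PathAndQuery & " HTTP/1.1" & vbCrLf & _
               "Host: " & target.Authority & vbCrLf & _
               "Connection: close" & vbCrLf & vbCrLf
    End Function

    ' Extracts the numeric code from a status line such as
    ' "HTTP/1.1 404 Not Found"; returns 0 if the line is missing or malformed.
    Function ParseStatusCode(ByVal statusLine As String) As Integer
        If statusLine Is Nothing Then Return 0
        Dim parts() As String = statusLine.Split(" "c)
        Dim code As Integer
        If parts.Length >= 2 AndAlso Integer.TryParse(parts(1), code) Then
            Return code
        End If
        Return 0
    End Function

    ' Connects to the host, sends the HEAD request and returns the status code.
    ' Connection failures surface as SocketException and are left to the caller.
    Function ProbeStatus(ByVal url As String) As Integer
        Dim target As New Uri(url)
        Using client As New TcpClient(target.Host, target.Port)
            client.ReceiveTimeout = 10000   ' milliseconds; arbitrary value
            Dim stream As NetworkStream = client.GetStream()
            Dim requestBytes() As Byte = Encoding.ASCII.GetBytes(BuildHeadRequest(target))
            stream.Write(requestBytes, 0, requestBytes.Length)
            Dim reader As New StreamReader(stream, Encoding.ASCII)
            Return ParseStatusCode(reader.ReadLine())
        End Using
    End Function
End Module
```

Unlike HttpWebRequest, this does not follow redirects, so a 301 or 302 comes back as-is and the caller decides what to do with it.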

--
Tom Shelton [MVP]
Nov 21 '05 #6

Dave,

This should fix the trusted relationship problem. It's kind of a workaround
that I found. Here is a link to an explanation of it:
http://gotdotnet.com/Community/Messa....aspx?id=40795. I have
implemented it in VB.NET by creating a new class with the following code.

Imports System.Net
Imports System.Security.Cryptography.X509Certificates

Public Class myCertificatePolicy
    Implements ICertificatePolicy

    Public Function CheckValidationResult(ByVal srvPoint As ServicePoint, _
            ByVal cert As X509Certificate, ByVal request As WebRequest, _
            ByVal certificateProblem As Integer) _
            As Boolean Implements ICertificatePolicy.CheckValidationResult
        'Return True to force the certificate to be accepted.
        Return True
    End Function
End Class

You would then call the class with this line in your application:

'force the certificate to be accepted
System.Net.ServicePointManager.CertificatePolicy = New myCertificatePolicy

This basically overrides a "non-trusted connection" by making your
application always accept the certificates.
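
On .NET Framework 2.0 and later, ServicePointManager.CertificatePolicy is marked obsolete in favour of ServicePointManager.ServerCertificateValidationCallback. A sketch of the same accept-everything behaviour using that callback might look like the following. The names CertificateOverride, AcceptForLinkCheck and Install are invented for this example, and logging the problem to the console is just one possible choice.

```vbnet
Imports System
Imports System.Net
Imports System.Net.Security
Imports System.Security.Cryptography.X509Certificates

Module CertificateOverride
    ' Matches the RemoteCertificateValidationCallback delegate.
    ' Always returns True, so every certificate is accepted; any problem
    ' reported by the framework is only written to the console.
    Function AcceptForLinkCheck(ByVal sender As Object, _
                                ByVal certificate As X509Certificate, _
                                ByVal chain As X509Chain, _
                                ByVal policyErrors As SslPolicyErrors) As Boolean
        If policyErrors <> SslPolicyErrors.None Then
            Console.WriteLine("Certificate problem ignored: " & policyErrors.ToString())
        End If
        Return True
    End Function

    ' Call once at startup, before the first https:// request is made.
    Sub Install()
        ServicePointManager.ServerCertificateValidationCallback = AddressOf AcceptForLinkCheck
    End Sub
End Module
```

The callback applies to the whole process, so it fits a dedicated link checker better than an application that also sends sensitive data over https.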

Curtis
Nov 21 '05 #7

Tom & Curtis,

You guys are too much. Thanks.

Dave
Nov 21 '05 #8
