Ferreting out broken links

Is it difficult to write a program that, given an array of URLs, will probe
each one, and return a status of Found or Not Found? How would you approach
it?

While Googling, I found utility after utility that will do something like
that for you, but I would like to write a custom program to do this. It
doesn't have to be VB. C#, javascript, etc. - whatever will run on .NET.

Thanks in advance,

Dave
Nov 21 '05 #1
Try something like this. It is not the most intelligent nor elegant
solution, but it will get you what you want.

' mylist was not shown in the original post; a small class like this is
' assumed. It is a Class rather than a Structure so that changes made
' inside the For Each loop are kept on the items in the list.
Public Class mylist
    Public SiteIndex As Integer
    Public SiteURL As String
    Public SiteValidFlagBoolean As Boolean
End Class

Dim aList As New ArrayList
Dim qXML As New Xml.XmlDocument
Dim oml As mylist

oml = New mylist
With oml
    .SiteIndex = 1
    .SiteURL = "http://www.microsoft.com"
    .SiteValidFlagBoolean = False
End With
aList.Add(oml)

' A new instance per site; reusing the first one would overwrite it.
oml = New mylist
With oml
    .SiteIndex = 2
    .SiteURL = "http://www.yourdomain.com"
    .SiteValidFlagBoolean = False
End With
aList.Add(oml)

For Each site As mylist In aList
    Try
        qXML.Load(site.SiteURL)
        site.SiteValidFlagBoolean = True
    Catch exxml As System.Xml.XmlException
        'Page loaded, but was not parsable as XML
        site.SiteValidFlagBoolean = True
    Catch exweb As System.Net.WebException
        'Read the status from the response instead of searching the message text
        Dim resp As System.Net.HttpWebResponse = TryCast(exweb.Response, System.Net.HttpWebResponse)
        If resp IsNot Nothing AndAlso resp.StatusCode = System.Net.HttpStatusCode.NotFound Then
            'Page Not Found
            site.SiteValidFlagBoolean = False
        Else
            'Some other net message, probably domain not found.
            MsgBox(exweb.ToString)
        End If
    Catch ex As Exception
        MsgBox(ex.ToString)
    End Try
Next
Nov 21 '05 #2
Dave,

Not exactly sure what you are wanting, but it might be similar to a function
I use in one of my apps. You can call this function within a loop, and if
you don't receive a response it will catch the exception. It uses the
System.Net.HttpWebRequest class. It might be a little more than you need,
but it might be what you are looking for.

' Needs Imports System.IO, System.Net and System.Text at the top of the file.
' Two members were not shown in the original post and are assumed here:
'   Public Enum HTTPMethod : HTTP_GET : HTTP_POST : End Enum
'   Private HldCookCon As New CookieContainer   ' cookies kept between calls
Public Function Send(ByVal URL As String, _
        Optional ByVal PostData As String = "", _
        Optional ByVal Method As HTTPMethod = HTTPMethod.HTTP_GET, _
        Optional ByVal ContentType As String = "") As String
    Dim Request As HttpWebRequest = DirectCast(WebRequest.Create(URL), HttpWebRequest)
    Dim Response As HttpWebResponse = Nothing
    Dim SW As StreamWriter = Nothing
    Dim SR As StreamReader = Nothing
    Dim ResponseData As String = Nothing
    Dim RcookCon As New CookieContainer

    ' Prepare Request Object ("HTTP_GET" becomes "GET", "HTTP_POST" becomes "POST")
    Request.Method = Method.ToString().Substring(5)
    Request.KeepAlive = True
    Request.AllowAutoRedirect = True
    If HldCookCon.Count > 0 Then
        RcookCon = HldCookCon
    End If
    Request.CookieContainer = RcookCon

    ' Set form/post content-type if necessary
    If Method = HTTPMethod.HTTP_POST AndAlso PostData <> "" AndAlso ContentType = "" Then
        ContentType = "application/x-www-form-urlencoded"
    End If

    ' Set Content-Type
    If ContentType <> "" Then
        Request.ContentType = ContentType
    End If

    ' Send the request body (POST only)
    If Method = HTTPMethod.HTTP_POST Then
        Try
            ' Content-Length counts bytes, not characters
            Request.ContentLength = Encoding.UTF8.GetByteCount(PostData)
            SW = New StreamWriter(Request.GetRequestStream())
            SW.Write(PostData)
        Catch Err As WebException
            MsgBox(Err.Message, MsgBoxStyle.Information, "Error")
        Finally
            Try
                If SW IsNot Nothing Then SW.Close()
            Catch
                'Don't process an error from SW not closing
            End Try
        End Try
    End If

    ' Get Response
    Try
        Response = DirectCast(Request.GetResponse(), HttpWebResponse)
        SR = New StreamReader(Response.GetResponseStream())
        ResponseData = SR.ReadToEnd()
        ' Keep the cookies for the next call
        HldCookCon.Add(Response.Cookies)
    Catch Err As WebException
        ' No usable response (404, host not found, timeout, ...).
        ' Nothing signals failure; the function returns a String, not a Boolean.
        Return Nothing
    Finally
        Try
            If SR IsNot Nothing Then SR.Close()
            If Response IsNot Nothing Then Response.Close()
        Catch
            'Don't process an error from SR not closing
        End Try
    End Try
    Return ResponseData
End Function

Curtis

Nov 21 '05 #3
hmmm...

Should be fairly straightforward using the System.Net.WebClient class.
Or, probably better yet, the System.Net.HttpWebRequest class...

Something like:

Dim request As HttpWebRequest
Dim response As HttpWebResponse

For Each url As String In urls
    request = WebRequest.Create (url)
    response = request.GetResponse ()

    If Response.StatusCode = 404 Then
        Console.WriteLine ("Not Found")
    Else
        Console.WriteLine ("Found")
    End If
Next

Actually, you might want to take a closer look at the
StatusCode :)
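
A sketch of what a closer look at the status could involve. GetResponse does not return a response for error statuses such as 404; it throws a WebException, and the status then has to be read from the exception's Response property. The module name, the ProbeStatus helper, the 10 second timeout, the use of HEAD, and the second test URL are all assumptions made for illustration, and the code expects http:// or https:// URLs only.

```vbnet
Imports System
Imports System.Net

Module LinkChecker
    ' Returns the HTTP status code for a URL, or 0 when no HTTP response
    ' arrived at all (host not found, timeout, certificate trust failure, ...).
    Function ProbeStatus(ByVal url As String) As Integer
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = "HEAD"   ' headers only; switch to GET if a server rejects HEAD
        request.Timeout = 10000   ' milliseconds, an arbitrary choice
        Try
            Dim response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Dim code As Integer = CInt(response.StatusCode)
            response.Close()
            Return code
        Catch ex As WebException
            ' 4xx and 5xx statuses end up here; the response, if there is one,
            ' is attached to the exception.
            Dim errorResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If errorResponse Is Nothing Then Return 0
            Dim errorCode As Integer = CInt(errorResponse.StatusCode)
            errorResponse.Close()
            Return errorCode
        End Try
    End Function

    Sub Main()
        ' Placeholder URLs; the second path is made up.
        Dim urls() As String = {"http://www.microsoft.com", "http://www.yourdomain.com/missing.htm"}
        For Each url As String In urls
            Dim status As Integer = ProbeStatus(url)
            If status >= 200 AndAlso status < 400 Then
                Console.WriteLine("{0}: Found ({1})", url, status)
            ElseIf status = 0 Then
                Console.WriteLine("{0}: No response", url)
            Else
                Console.WriteLine("{0}: Not Found ({1})", url, status)
            End If
        Next
    End Sub
End Module
```

Returning the numeric code rather than a Boolean leaves it to the caller to decide whether, say, a 403 or a 500 counts as a broken link.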
--
Tom Shelton [MVP]
Nov 21 '05 #4
Amdrit, Curtis, Tom,

I haven't worked all these details out yet, but I didn't want too much time
to pass before I said thanks.

I have only had time to try Curtis & Tom's solution. The
request.GetResponse() line is giving me a problem when I try to use
https:// sites because they cannot establish a trust relationship. This may
not be fatal - it may be acceptable to just inform my users that they will
have to manually check those.

I'm getting a problem when I hit a 404; Tom's line "If Response.StatusCode
= 404 Then" just doesn't work.

I get the crash screen with the message "The remote server returned an
error: (404) Not Found"
So, I'm going to try to work with those methods, then I'll try Amdrit's
method if I can't get them good to go.

I really do appreciate the responses. I had no clue how to approach this
problem, and now I have plenty to work with.

Thanks again.

Dave

Nov 21 '05 #5
Dave,

Sounds like you might need to refine what I wrote some :) I did that
off the top, and didn't test any of that - so it may not be exactly
right. Another method that's a bit more work, but not too bad, is to
simply open a socket connection to the server and make the HTTP request
yourself. This would avoid the "non-trust" issues and exceptions :)
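
A rough sketch of the socket approach, assuming plain http on port 80. It sends a bare HEAD request and reads back only the status line, so a 404 shows up as text rather than as an exception. The module and function names are made up for illustration. For https:// sites the stream would have to be wrapped in an SslStream on port 443, at which point certificate validation comes back into play.

```vbnet
Imports System
Imports System.IO
Imports System.Net.Sockets
Imports System.Text

Module RawProbe
    ' Sends a minimal HTTP/1.1 HEAD request over a TCP socket and returns the
    ' first line of the reply, for example "HTTP/1.1 404 Not Found".
    ' Returns Nothing if the server closes the connection without answering.
    Function ReadStatusLine(ByVal host As String, ByVal path As String) As String
        Using client As New TcpClient(host, 80)
            Dim stream As NetworkStream = client.GetStream()
            Dim requestText As String = "HEAD " & path & " HTTP/1.1" & vbCrLf & _
                                        "Host: " & host & vbCrLf & _
                                        "Connection: close" & vbCrLf & vbCrLf
            Dim requestBytes() As Byte = Encoding.ASCII.GetBytes(requestText)
            stream.Write(requestBytes, 0, requestBytes.Length)

            ' Only the status line is needed, so the headers are left unread.
            Dim reader As New StreamReader(stream, Encoding.ASCII)
            Return reader.ReadLine()
        End Using
    End Function

    Sub Main()
        Console.WriteLine(ReadStatusLine("www.microsoft.com", "/"))
    End Sub
End Module
```

Connection failures (host not found, refused connection) still surface as a SocketException from the TcpClient constructor, so the caller needs a Try block around the call.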

--
Tom Shelton [MVP]
Nov 21 '05 #6
Dave,

This should fix the trusted relationship problem. It's kind of a workaround
that I found. Here is a link to an explanation of it:
http://gotdotnet.com/Community/Messa....aspx?id=40795. I have
implemented it in VB.NET by creating a new class with the following code.

Imports System.Net
Imports System.Security.Cryptography.X509Certificates

Public Class myCertificatePolicy
    Implements ICertificatePolicy

    Public Function CheckValidationResult(ByVal srvPoint As ServicePoint, _
            ByVal cert As X509Certificate, ByVal request As WebRequest, _
            ByVal certificateProblem As Integer) _
            As Boolean Implements ICertificatePolicy.CheckValidationResult
        'Return True to force the certificate to be accepted.
        Return True
    End Function
End Class

You would then call the class with this line in your application:

'force the certificate to be accepted
System.Net.ServicePointManager.CertificatePolicy = New myCertificatePolicy

This basically overrides a "non-trusted connection" by making your
application always accept the certificates.
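
On .NET 2.0 and later, ServicePointManager.CertificatePolicy is marked obsolete in favour of ServicePointManager.ServerCertificateValidationCallback. A sketch of the same accept-everything behaviour using that callback follows; the module and function names are invented for illustration. As with the class above, this turns certificate checking off for every request the application makes, which is probably only acceptable for a tool that just tests whether a link answers.

```vbnet
Imports System.Net
Imports System.Net.Security
Imports System.Security.Cryptography.X509Certificates

Module TrustAll
    ' Matches the RemoteCertificateValidationCallback delegate.
    ' Returning True accepts the server certificate whatever errors were found.
    Function AcceptAnyCertificate(ByVal sender As Object, _
                                  ByVal certificate As X509Certificate, _
                                  ByVal chain As X509Chain, _
                                  ByVal errors As SslPolicyErrors) As Boolean
        Return True
    End Function

    Sub Main()
        ' Install once at startup, before the first https request is made.
        ServicePointManager.ServerCertificateValidationCallback = AddressOf AcceptAnyCertificate
    End Sub
End Module
```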

Curtis

Nov 21 '05 #7
Tom & Curtis,

You guys are too much. Thanks.

Dave
Nov 21 '05 #8
