A broken link preventer

P: n/a
I have a tool which tells me the number of times that visitors attempt
to access a link from my site to an external site and what the response
code received was. In the event of the remote site returning an error
code, they are not sent to the remote site - why bother, it wouldn't
work!

Since I have over 1000 external links, this allows me to locate the
broken links that people see the most often and fix those first.
Conventional link checkers offer a complementary service: they detect
which links are broken rather than how often each broken link is seen.
The output from the program can generate reports based on time, link
accessed, page on my site where the link occurred and so on.

This means that on my site, I now have much better control over what
happens if the visitor would see a 404 on an external link and I can
offer them more options.
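
The idea described above can be sketched roughly as follows. This is only an illustration of the approach (check the target before redirecting, and count what visitors ran into), not the actual tool: the names `fetch_status`, `LinkGate`, `visit` and `worst_links` are hypothetical, and the choice to treat any status of 400 or above as broken is an assumption.

```python
from collections import Counter
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_status(url, timeout=5.0):
    """Return the HTTP status for a HEAD request, or None if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:       # 4xx/5xx responses arrive as exceptions
        return err.code
    except (URLError, OSError):    # DNS failure, refused connection, timeout
        return None


class LinkGate:
    """Check an outbound link before redirecting and count what was seen."""

    def __init__(self, fetch=fetch_status):
        self.fetch = fetch         # injectable so the logic can be tested offline
        self.seen = Counter()      # (source page, target URL, status) -> hits

    def visit(self, source_page, target_url):
        status = self.fetch(target_url)
        self.seen[(source_page, target_url, status)] += 1
        ok = status is not None and status < 400
        return ("redirect", target_url) if ok else ("show_options", status)

    def worst_links(self, n=10):
        """Broken targets ordered by how often visitors hit them."""
        broken = Counter()
        for (_, target, status), hits in self.seen.items():
            if status is None or status >= 400:
                broken[target] += hits
        return broken.most_common(n)
```

Keying the counter on the source page as well as the target is what would allow reports by page; a timestamp could be added to the key in the same way for reports by time.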

Try it out here
http://www.siliconglen.com/Scotland/2_2.html

Whilst accepting that broken links are a generally bad thing, this tool
at least helps me to manage them more effectively.

Comments and feedback welcome. This is an early release, so there may be
bugs, but I hope not :-)

--
Craig Cockburn ("coburn"). http://www.SiliconGlen.com/
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, weddings, website design, stop spam and more!
Dec 6 '05 #1
28 Replies


P: n/a
Craig Cockburn wrote:
I have a tool which tells me the number of times that visitors attempt
to access a link from my site to an external site and what the response
code received was. In the event of the remote site returning an error
code, they are not sent to the remote site - why bother, it wouldn't work!

Since I have over 1000 external links, this allows me to locate the
broken links that people see the most often and fix those first.
Conventional link checkers offer a complementary service: they detect
which links are broken rather than how often each broken link is seen.
The output from the program can generate reports based on time, link
accessed, page on my site where the link occurred and so on.

This means that on my site, I now have much better control over what
happens if the visitor would see a 404 on an external link and I can
offer them more options.

Try it out here
http://www.siliconglen.com/Scotland/2_2.html

Whilst accepting that broken links are a generally bad thing, this tool
at least helps me to manage them more effectively.

comments, feedback welcome. This is an early release so there may be
bugs but I hope not :-)

Craig,

Seems to work here and the suggestions provided to the user are helpful.

1. Rather than depend on UCSD, I'd suggest you provide your own
explanation of the error, showing only the one appropriate to the
immediate situation.

2. I used Netscape 7.1. When I see a list of links like those in your
example, I tend to keep the page with the list open in one tab, then
right click on each link I'm interested in and select "Open in new tab"
from the resulting popup menu. But something in your code prevents that
option (and several others) from appearing in the popup.

Chris Beall

Dec 6 '05 #2

P: n/a
Krustov wrote:
TMK if a website uses custom 404 pages then it won't show up as a broken
link.


Sometimes yes. But well configured web servers return 404 headers even
when displaying a custom 404 page. There are, of course, many badly
configured web servers out there.

Steve

Dec 6 '05 #3

P: n/a
"Steve Pugh" wrote:
Krustov wrote:
TMK if a website uses custom 404 pages then it won't show up as a broken
link.


Sometimes yes. But well configured web servers return 404 headers even
when displaying a custom 404 page. There are, of course, many badly
configured web servers out there.


I think the most common mistake is to use a fully qualified URL in the
ErrorDocument directive. For example:

ErrorDocument 404 http://example.com/error-docs/not_found.html

will cause the server to issue a 302 redirect header to the error page when
it can't find the requested document. The error page will then be served with
a "200 OK" header.

It's all explained in the Apache documentation.
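
For anyone wanting to check their own server, here is a minimal sketch of a test for this misconfiguration. It assumes you request a URL you know does not exist and look only at the first response; the function names and the category labels are hypothetical. With a local path in the directive (for example /error-docs/not_found.html instead of the full URL) the client should see the 404 itself.

```python
import http.client
from urllib.parse import urlsplit


def classify_missing_page(status):
    """Interpret the first response to a URL that is known not to exist."""
    if status in (404, 410):
        return "correct: error status reaches the client"
    if 300 <= status < 400:
        return "redirected: original error status is lost"
    if 200 <= status < 300:
        return "soft 404: error page served as success"
    return "other"


def first_status(url, timeout=5.0):
    """Status of the first response only; http.client does not follow redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()
```

Usage would be something like classify_missing_page(first_status("http://example.com/no-such-page")), where the path is any one you are sure is missing.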

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Dec 6 '05 #4

P: n/a
On Tue, 06 Dec 2005 14:15:50 GMT, Philip Ronan
<in*****@invalid.invalid> wrote:
"Steve Pugh" wrote:
Krustov wrote:
TMK if a website uses custom 404 pages then it won't show up as a broken
link.


Sometimes yes. But well configured web servers return 404 headers even
when displaying a custom 404 page. There are, of course, many badly
configured web servers out there.


I think the most common mistake is to use a fully qualified URL in the
ErrorDocument directive. For example:

ErrorDocument 404 http://example.com/error-docs/not_found.html

will cause the server to issue a 302 redirect header to the error page when
it can't find the requested document. The error page will then be served with
a "200 OK" header.

It's all explained in the Apache documentation.


Explained? That's an interesting term to use with regard to the Apache
documentation! I find the Apache documentation to be slightly less
intelligible than if it were written in Ancient Greek.

And as this is being widely cross-posted, perhaps a challenge could go
out for another technical author - one who can decipher the Apache
documentation - to produce a version which can be widely understood.

Matt
--
The Probert Encyclopaedia - Beyond Britannica
http://www.probertencyclopaedia.com
Dec 6 '05 #5

P: n/a
"Matt Probert" wrote:
Explained? That's an interesting term to use with regard to the Apache
documentation! I find the Apache documentation to be slightly less
intelligible than if it were written in Ancient Greek.


This seems perfectly clear to me:
Note that when you specify an ErrorDocument that points to a remote URL
(ie. anything with a method such as "http" in front of it), Apache will
send a redirect to the client to tell it where to find the document,
even if the document ends up being on the same server. This has several
implications, the most important being that the client will not receive
the original error status code, but instead will receive a redirect
status code. This in turn can confuse web robots and other clients
which try to determine if a URL is valid using the status code.


<http://httpd.apache.org/docs/1.3/mod/core.html#errordocument>

Where's the problem?

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Dec 6 '05 #6

P: n/a
Matt Probert wrote:
Explained? That's an interesting term to use with regard to the Apache
documentation! I find the Apache documentation to be slightly less
intelligible than if it were written in Ancient Greek.


OK, here's a simple challenge. Find another complex product with
documentation that's more readable than Apache's, while not being
misleading or downright wrong.

--
Nick Kew
Dec 6 '05 #7

P: n/a
Nick Kew wrote:
Matt Probert wrote:
Explained? That's an interesting term to use with regard to the Apache
documentation! I find the Apache documentation to be slightly less
intelligible than if it were written in Ancient Greek.

OK, here's a simple challenge. Find another complex product with
documentation that's more readable than Apache's, while not being
misleading or downright wrong.


MySQL
Microsoft's Visual Studio products
AutoCad
Websphere
Exim

To start.

Apache's documentation is some of the worst I've ever seen.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Dec 7 '05 #8

P: n/a
Jerry Stuckle wrote:
Nick Kew wrote:
Matt Probert wrote:
Explained? That's an interesting term to use with regard to the Apache
documentation! I find the Apache documentation to be slightly less
intelligible than if it were written in Ancient Greek.
OK, here's a simple challenge. Find another complex product with
documentation that's more readable than Apache's, while not being
misleading or downright wrong.


MySQL


Hmmm, that's a very readable manual, too.

Microsoft's Visual Studio products

You must be joking! Where do you find anything that isn't just a
longwinded explanation of how to use GUI menus? It certainly
never told me anything that wasn't bleedin' obvious.

Unlike back in the 1980s, when a microsoft manual was somewhat
helpful in learning C.

AutoCad

never used it.

Websphere

Put off even looking by the webpages and ambiguous license
(not sure if that's changed since IBM started to get more
serious about opensource).

Exim

Well, I chose postfix in preference when I last changed MTA,
and find postfix's documentation much harder than Apache's -
though nevertheless adequately workable.

To start.

Apache's documentation is some of the worst I've ever seen.


How so? Instead of whinging, how about some constructive criticism
that might offer some ideas for improving it?

--
Nick Kew
Dec 7 '05 #9

P: n/a
Nick Kew wrote:
Jerry Stuckle wrote:
Nick Kew wrote:
Matt Probert wrote:

Explained? That's an interesting term to use with regard to the Apache
documentation! I find the Apache documentation to be slightly less
intelligible than if it were written in Ancient Greek.


OK, here's a simple challenge. Find another complex product with
documentation that's more readable than Apache's, while not being
misleading or downright wrong.


MySQL

Hmmm, that's a very readable manual, too.
Microsoft's Visual Studio products

You must be joking! Where do you find anything that isn't just a
longwinded explanation of how to use GUI menus? It certainly
never told me anything that wasn't bleedin' obvious.

Unlike back in the 1980s, when a microsoft manual was somewhat
helpful in learning C.
AutoCad

never used it.
Websphere

Put off even looking by the webpages and ambiguous license
(not sure if that's changed since IBM started to get more
serious about opensource).
Exim

Well, I chose postfix in preference when I last changed MTA,
and find postfix's documentation much harder than Apache's -
though nevertheless adequately workable.

To start.

Apache's documentation is some of the worst I've ever seen.


How so? Instead of whinging, how about some constructive criticism
that might offer some ideas for improving it?


Let's see...

More examples on how to do things. More information on how different
commands interrelate. How to effectively use .htaccess (or place those
commands in your httpd.conf file if you have access to it).

And how about some developer documentation? There isn't anything other
than an old Apache 1.x book mainly written for Perl with C as a second
thought.

If the documentation is so good, why are there so many messages on
usenet by people trying to figure out how to do things?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Dec 7 '05 #10

P: n/a
Nick Kew wrote:
Jerry Stuckle wrote:
Nick Kew wrote:
Matt Probert wrote:

Explained? That's an interesting term to use with regard to the Apache
documentation! I find the Apache documentation to be slightly less
intelligible than if it were written in Ancient Greek.


OK, here's a simple challenge. Find another complex product with
documentation that's more readable than Apache's, while not being
misleading or downright wrong.


MySQL

Hmmm, that's a very readable manual, too.
Microsoft's Visual Studio products

You must be joking! Where do you find anything that isn't just a
longwinded explanation of how to use GUI menus? It certainly
never told me anything that wasn't bleedin' obvious.

Unlike back in the 1980s, when a microsoft manual was somewhat
helpful in learning C.
AutoCad

never used it.
Websphere

Put off even looking by the webpages and ambiguous license
(not sure if that's changed since IBM started to get more
serious about opensource).
Exim

Well, I chose postfix in preference when I last changed MTA,
and find postfix's documentation much harder than Apache's -
though nevertheless adequately workable.

To start.

Apache's documentation is some of the worst I've ever seen.


How so? Instead of whinging, how about some constructive criticism
that might offer some ideas for improving it?


Oh, and yes, I've found the Visual C++ documentation to be much better
than Apache's. I've taught a lot of classes in it, and by the end of
the week (they're one-week corporate classes) the students know enough
to get good information from the help files.

Sure, there's a lot on how to use the IDE. But there's a huge amount on
the Microsoft Foundation Classes, also - and it's very well organized.

Not to say I'm fond of MFC - I don't think it's a great OO
implementation. But it's workable and well documented (if you load the
correct help files).
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Dec 7 '05 #11

P: n/a
In message <m3************@asgard.webthing.com>, Nick Kew
<ni**@asgard.webthing.com> writes
Matt Probert wrote:
Explained? That's an interesting term to use with regard to the Apache
documentation! I find the Apache documentation to be slightly less
intelligible than if it were written in Ancient Greek.


OK, here's a simple challenge. Find another complex product with
documentation that's more readable than Apache's, while not being
misleading or downright wrong.

This is all very well but back at the base article, how about some
feedback on my broken link preventer?

--
Craig Cockburn ("coburn"). http://www.SiliconGlen.com/
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, weddings, website design, stop spam and more!
Dec 7 '05 #12

P: n/a
Writing in
news:alt.www.webmaster,alt.html,comp.i...ftware.testing
From the safety of the Silicon Glen - Scotland's Internet cafeteria
Craig Cockburn <cr***@siliconglen.com> said:
...
This is all very well but back at the base article, how about some
feedback on my broken link preventer?


Welcome to usenet :)

--
William Tasso

Save the drama
for your Mama.
Dec 7 '05 #13

P: n/a
Jerry Stuckle wrote:

And how about some developer documentation? There isn't anything other
than an old Apache 1.x book mainly written for Perl with C as a second
thought.

Yep, that's a gap. Wait for the new book in the new year :-)
Meanwhile, some people find www.apachetutor.org helpful.

If the documentation is so good, why are there so many messages on
usenet by people trying to figure out how to do things?


For every person asking on usenet, there are a million just getting
on with it. Bear in mind, Apache is a product with three times
Microsoft's market share, and an altogether more helpful community
standing behind it. Of course there are all kinds of users, from
the expert, through the newbie capable of reading TFM, to the no-hoper.

--
Nick Kew
Dec 8 '05 #14

P: n/a
In article <g1************@asgard.webthing.com>,
Nick Kew <ni**@asgard.webthing.com> wrote:
For every person asking on usenet, there are a million just getting
on with it. Bear in mind, Apache is a product with three times
Microsoft's market share, and an altogether more helpful community
standing behind it. Of course there are all kinds of users, from
the expert, through the newbie capable of reading TFM, to the no-hoper.


Plus, I've never found a more helpful config file than httpd.conf. A
whole lot is explained there directly.

leo

--
<http://web0.greatbasin.net/~leo/>
Dec 8 '05 #15

P: n/a
Writing in
news:alt.www.webmaster,alt.html,comp.i...ftware.testing
From the safety of the Studio H cafeteria
Leonard Blaisdell <le*@greatbasin.com> said:
In article <g1************@asgard.webthing.com>,
Nick Kew <ni**@asgard.webthing.com> wrote:


[apache web server]
Of course there are all kinds of users, from
the expert, through the newbie capable of reading TFM, to the no-hoper.


Plus, I've never found a more helpful config file than httpd.conf. A
whole lot is explained there directly.


ok chaps, not quite sure what category of user you want to put me in - I
can take the knocks <g>, but while we have a collection of apache gurus
clustered around a hot steaming monitor, please indulge me while I repeat
an earlier question.

: Greetings One and All
:
: I have a domain hosted on linux/apache: http://example.com
:
: I use ProxyPass / http://192.168.1.111/
: and ProxyPassReverse / http://192.168.1.111/
:
: to deliver http://site2.example.com from a secondary server that is
: otherwise not connected to the internet.
:
: Is there a similar thingie I can use to deliver
: http://example.com/resource3 from that same secondary server?
:
: Thanks for reading - please let me know if I haven't made myself clear.

--
William Tasso

Save the drama
for your Mama.
Dec 8 '05 #16

P: n/a
In article <op*******************@tbdata.com>,
"William Tasso" <Sp*********@tbdata.com> wrote:
ok chaps, not quite sure what category of user you want to put me in - I
can take the knocks <g>, but while we have a collection of apache gurus
clustered around a hot steaming monitor, please indulge me while I repeat
an earlier question.

: Greetings One and All
:
: I have a domain hosted on linux/apache: http://example.com
:
: I use ProxyPass / http://192.168.1.111/
: and ProxyPassReverse / http://192.168.1.111/
:
: to deliver http://site2.example.com from a secondary server that is
: otherwise not connected to the internet.
:
: Is there a similar thingie I can use to deliver
: http://example.com/resource3 from that same secondary server?
:
: Thanks for reading - please let me know if I haven't made myself clear.


I know far less than you do. I hope Mr. Kew replies. If not, consider
posting the question in <news:comp.infosystems.www.servers.unix>.

leo

--
<http://web0.greatbasin.net/~leo/>
Dec 8 '05 #17

P: n/a
William Tasso wrote:
I repeat an earlier question.


http://www.apacheweek.com/features/reverseproxies
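
As a rough illustration of the question being asked: mod_proxy's ProxyPass takes a path prefix as its first argument, so a rule scoped to /resource3 rather than / is the usual shape of the answer. The sketch below only models how such prefix rules pick a backend, assuming rules are tried in order and the first match wins; `resolve_backend` and `RULES` are hypothetical names, and the backend address is the one from the question.

```python
def resolve_backend(path, rules):
    """Return the backend URL for `path`, using the first matching prefix rule."""
    for prefix, backend in rules:
        if path.startswith(prefix):
            # Keep whatever follows the prefix and append it to the backend URL.
            return backend + path[len(prefix):]
    return None  # no rule matched: the front-end server handles it itself


# Assumed rule set: only /resource3 is handed to the secondary server.
RULES = [
    ("/resource3", "http://192.168.1.111/resource3"),
]
```

With these rules, a request for /resource3/page.html would map to the secondary server, while /index.html would stay on the main one.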

--
Nick Kew
Dec 8 '05 #18

P: n/a
Thanks to all for the feedback here and via email on the broken link
preventer.

There is now a product page here for more information:
http://www.siliconglen.com/software/links.html

Having scratched my head for a bit to come up with a name for a program
that prevents broken links, I've called it The Broken Link Preventer :-)

There is also a news release out today:
http://www.prweb.com/releases/2005/12/prweb321865.htm

thanks for all the support, I have received a lot of praise for the tool
via email.

Craig

--
Craig Cockburn ("coburn"). http://www.SiliconGlen.com/
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, weddings, website design, stop spam and more!
Dec 14 '05 #19

P: n/a
Does anyone know if there is a definitive list of 6xx http return codes
anywhere? Some of my links with the link preventer are returning 6xx
codes and although I have a list of what these mean I am wondering how
much of a standard this is.

I'd like to sort out this issue before moving onto my next project, the
Spam Petition,
http://www.siliconglen.com/spampetition/

thanks
--
Craig Cockburn ("coburn"). http://www.SiliconGlen.com/
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, weddings, website design, stop spam and more!
Dec 20 '05 #20

P: n/a
Craig Cockburn wrote
I'd like to sort out this issue before moving onto my next project, the
Spam Petition,
http://www.siliconglen.com/spampetition/


Fucking typical, the anti-spammers spam like the best of them. What do
http return codes have to do with soc.culture.scottish???

Your spam worked though, I looked at your "petition". Who are you going to
send it to, you don't say? Who are you petitioning?

You'll be lucky to get a dozen signatures.

Complete waste of time, but then it's got nothing to do with stopping spam.

--
Charles Sweeney
http://CharlesSweeney.com
Dec 20 '05 #21

P: n/a
Craig Cockburn wrote
Scottish FAQ, weddings, website design, stop spam and more!


Oh the irony.

--
Charles Sweeney
http://CharlesSweeney.com
Dec 20 '05 #22

P: n/a
Craig Cockburn wrote
I'd like to sort out this issue before moving onto my next project, the
Spam Petition,
http://www.siliconglen.com/spampetition/


Why is the first thing on your "spam petition" page, a prominent link to
your "Silicon Glen Homepage"?

--
Charles Sweeney
http://CharlesSweeney.com
Dec 20 '05 #23

P: n/a
On Tue, 20 Dec 2005 07:45:46 +0000, Craig Cockburn
<cr***@siliconglen.com> posted something that included:
Does anyone know if there is a definitive list of 6xx http return codes
anywhere?


To answer your question, yes. I know, and many others know as well.

Although you didn't ask for it, I might mention that the definitive
list of HTTP return codes is, as you might expect, in the RFC for the
HTTP protocol. The latest version of HTTP is 1.1 and the document is
RFC 2616.

And although you *really* didn't ask it, here's a list of the 6xx
return codes in HTTP:


And just in case you didn't catch on, here is a definitive list of all
http error codes in all ranges:

1xx return codes are informational
100 Continue
101 Switching Protocols

2xx return codes indicate success
200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content

3xx return codes require redirection
300 Multiple Choices
301 Moved Permanently
302 Found
303 See Other
304 Not Modified
305 Use Proxy
306 (Unused)
307 Temporary Redirect

4xx return codes indicate client error
400 Bad Request
401 Unauthorized
402 Payment Required
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Long
415 Unsupported Media Type
416 Requested Range Not Satisfiable
417 Expectation Failed

5xx error codes indicate server error
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
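
For checking codes programmatically, here is a small sketch using Python's standard http.HTTPStatus enum, which covers the registered codes and rejects anything else. The helper name `describe_status` and the `CLASSES` table are hypothetical; the class names follow the five ranges listed above.

```python
from http import HTTPStatus

# The five status classes; there is no entry for a leading digit of 6.
CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def describe_status(code):
    """Return (class, reason phrase); either is None when undefined."""
    status_class = CLASSES.get(code // 100)
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:             # not a registered status code
        phrase = None
    return status_class, phrase
```

A tool that logs response codes could use something like this to flag a 6xx value as undefined rather than guessing at its meaning.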

--
If we're losing 40-130 species a day,
How come nobody can itemize them?
And why can't fruitflies be one of them?
Dec 20 '05 #24

P: n/a
In message <tf********************************@4ax.com>, Paul Ding
<la*******@paulding.net> writes
On Tue, 20 Dec 2005 07:45:46 +0000, Craig Cockburn
<cr***@siliconglen.com> posted something that included:
Does anyone know if there is a definitive list of 6xx http return codes
anywhere?


To answer your question, yes. I know, and many others know as well.

So there is no definitive list and a 6xx code's meaning is entirely up
to the website?

Craig
--
Craig Cockburn ("coburn"). http://www.SiliconGlen.com/
Please sign the Spam Petition: http://www.siliconglen.com/spampetition/
Home to the first online guide to Scotland, founded 1994.
Scottish FAQ, weddings, website design, stop spam and more!
Dec 20 '05 #25

P: n/a
Writing in
news:alt.www.webmaster,alt.html,comp.i...ftware.testing
From the safety of the Silicon Glen - Scotland's Internet cafeteria
Craig Cockburn <cr***@siliconglen.com> said:
In message <tf********************************@4ax.com>, Paul Ding
<la*******@paulding.net> writes
On Tue, 20 Dec 2005 07:45:46 +0000, Craig Cockburn
<cr***@siliconglen.com> posted something that included:
Does anyone know if there is a definitive list of 6xx http return codes
anywhere?


To answer your question, yes. I know, and many others know as well.

So there is no definitive list and a 6xx code's meaning is entirely up
to the website?


well - their meaning is not defined and therefore regardless of the
intentions of the webmaster/admin, UA behaviour is completely
unpredictable.

--
William Tasso

Save the drama
for your Mama.
Dec 20 '05 #26

P: n/a
On Tue, 20 Dec 2005 20:05:08 +0000, Craig Cockburn
<cr***@siliconglen.com> posted something that included:
In message <tf********************************@4ax.com>, Paul Ding
<la*******@paulding.net> writes
On Tue, 20 Dec 2005 07:45:46 +0000, Craig Cockburn
<cr***@siliconglen.com> posted something that included:
Does anyone know if there is a definitive list of 6xx http return codes
anywhere?
To answer your question, yes. I know, and many others know as well.
So there is no definitive list and a 6xx code's meaning is entirely up
to the website?


No. There *is* a definitive list. It's a null set.

A 5xx return code would indicate that it's a server trying to do its
best. A 6xx return code would indicate that it's not failing through
error, but deliberately failing and being noncompliant about it.

Obviously a 6xx return code would indicate that the server was
programmed by an incompetent, or by Microsoft. Not that there's much
difference in practice.

--
If we're losing 40-130 species a day,
How come nobody can itemize them?
And why can't fruitflies be one of them?
Dec 20 '05 #27

P: n/a
Writing in
news:alt.www.webmaster,alt.html,comp.i...ftware.testing
From the safety of the cafeteria
Paul Ding <la*******@paulding.net> said:

f'ups rearranged
...
A 6xx return code would indicate that it's not failing through
error, but deliberately failing and being noncompliant about it.

Obviously a 6xx return code would indicate that the server was
programmed by an incompetent,

it would appear so.

or by Microsoft.


did I miss something?

http://support.microsoft.com/?id=318380

--
William Tasso

Save the drama
for your Mama.
Dec 20 '05 #28

P: n/a
Chris Beall wrote:
Craig Cockburn wrote:
I have a tool which tells me the number of times that visitors attempt
to access a link from my site to an external site and what the response
code received was. In the event of the remote site returning an error
code, they are not sent to the remote site - why bother, it wouldn't work!

.....

Try it out here
http://www.siliconglen.com/Scotland/2_2.html

Craig,

Seems to work here and the suggestions provided to the user are helpful.

....

2. I used Netscape 7.1. When I see a list of links like those in your
example, I tend to keep the page with the list open in one tab, then
right click on each link I'm interested in and select "Open in new tab"
from the resulting popup menu. But something in your code prevents that
option (and several others) from appearing in the popup.


This is now fixed, thanks for the feedback.

Craig

--
Craig Cockburn ("coburn"). http://www.SiliconGlen.com/
Please sign the Spam Petition: http://www.siliconglen.com/spampetition/

Dec 29 '05 #29
