
Link Checking

Hi Guys;

You have been giving me a lot of useful information in the other two
threads. Thanks! Very interesting.

Here is my situation. My friend is writing a book. He has 3100
citations, 1225 of which have URLs.
He wants to Q/C the URLs. He removed the http:// from the URLs when he
prepared his materials. I wrote a script that extracted the citation
number and the URLs from his citation list, prepended "http://", and
printed each one on an HTML page as a hyperlink. I used a Firefox extension
( http://www.kevinfreitas.net/extensions/linkchecker/ ) to check the
links; it reports on bad links, links skipped by the checker, and
forwarded/forbidden links, and it color-codes the links it checks.
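
For readers who want to reproduce that step, here is a minimal sketch of such a script. It is not the original poster's code. It assumes a hypothetical input format: one citation per line, starting with its number, with the scheme-less URL as the last token on the line. The names `extract_links` and `to_html` are made up for illustration.

```python
import html
import re

# Assumed (hypothetical) format: "12. Author, Title. www.example.org/page"
CITATION_RE = re.compile(r"^\s*(\d+)[.)]?\s+(.*)$")
# A scheme-less URL: host labels, an alphabetic TLD, and an optional path.
HOST_RE = re.compile(r"^[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}(?:/\S*)?$", re.IGNORECASE)


def extract_links(lines):
    """Yield (citation_number, url) for each citation whose last token is a URL."""
    for line in lines:
        match = CITATION_RE.match(line)
        if not match:
            continue  # not a numbered citation line
        number, rest = match.groups()
        tokens = rest.split()
        if not tokens:
            continue
        candidate = tokens[-1].rstrip(".,;")  # drop trailing punctuation
        if HOST_RE.match(candidate):
            yield number, candidate


def to_html(links):
    """Build a simple HTML page with one hyperlink per citation."""
    rows = []
    for number, url in links:
        href = html.escape("http://" + url, quote=True)  # restore the scheme
        rows.append(f'<li>[{number}] <a href="{href}">{href}</a></li>')
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>"
```

A page built this way can be opened in the browser and handed to any link-checking extension. Citations without a URL are simply skipped.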

At this point I am wondering if there is anything better I can do as
far as Q/Cing links, short of doing manual checks.
From the other threads I know that the link checkers are not perfect
and that I should tell my friend that. Is there a more accurate
description I can give him (90% accurate, mostly good, coin toss, etc.)?

Thanks

Aug 21 '06 #1
ken
Hi Steve,

You could use LinkScan/QuickCheck which is a free web service to check
for broken links. It is located at

http://www.elsop.com/linkscan/quickcheck.html

Or download a free trial of LinkScan good for 15 days that is more
comprehensive and very powerful. That can be found at

http://www.elsop.com/linkscan/dleval.cgi

Ken

Aug 21 '06 #2
Steve <st**********@yahoo.com> scripsit:
Here is my situation. My friend is writing a book. He has 3100
citations, 1225 of which have URLs.
In a printed book? Even if you check all the links just before the book
starts printing, many of them will have stopped working by the time
customers get the book. So what is the purpose of the URLs? If the book is
scientific, citations and URLs might be needed. In that case, remember to
include both the date of checking and the title or main heading or some
other content identification for the page. That way, people will have a
sporting chance of visiting, now or after 10 years, the page as it was at
the time of citation (assuming they know how to use www.archive.org and it
will remain available).
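
A rough sketch of how that bookkeeping could be automated: record the page title and the date of checking next to each URL, and build a Wayback Machine address for that date. The helper names are hypothetical. The `https://web.archive.org/web/<timestamp>/<url>` pattern is the archive's usual address form; with a date-only timestamp the archive normally redirects to the nearest capture, if one exists.

```python
from datetime import date
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html_text):
    """Return the page title with whitespace collapsed ('' if there is none)."""
    parser = TitleParser()
    parser.feed(html_text)
    return " ".join(parser.title.split())


def wayback_url(url, checked_on):
    """Address of the archived copy nearest to the given date."""
    return f"https://web.archive.org/web/{checked_on:%Y%m%d}/{url}"


def citation_note(url, html_text, checked_on):
    """Text to print beside the URL in the citation."""
    return f'{url} ("{page_title(html_text)}", checked {checked_on.isoformat()})'
```

Nothing here guarantees that the archive actually holds a copy for that date; that still has to be looked up.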
From the other threads I know that the link checkers are not perfect
and that I should tell my friend that. Is there a more accurate
description I can give him (90% accurate, mostly good, coin toss etc)
Throwing percentages isn't useful; 97.5 of all percentages have just been
made up, and the remaining 3.5 % have been miscalculated.

The point is that no link checker can check what the link really points to.
The checkers won't notice anything if the server sends a normal OK response.
Yet the page may contain porn instead of the expected content or, more
typically, a page that tells that the domain is for sale and/or contains
lots of links to pages that someone wants to advertize.
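
To make the limitation concrete, here is a minimal sketch of what a status-based check can and cannot see, with a crude content heuristic added on top. The phrase list and the `expected_title` parameter are assumptions made for illustration. A match only marks a link for manual review; it proves nothing about the page.

```python
import urllib.error
import urllib.request

# Phrases that often appear on parked or for-sale domains. This list is an
# illustrative assumption, not a reliable detector.
SUSPECT_PHRASES = ("domain is for sale", "buy this domain", "domain parking")


def fetch(url, timeout=10):
    """Return (status, body_text); status is None when no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as error:
        return error.code, ""  # the server answered with 4xx/5xx
    except (urllib.error.URLError, OSError):
        return None, ""  # DNS failure, timeout, refused connection, ...


def classify(status, body, expected_title=None):
    """Sort a response into rough buckets for a human reviewer."""
    if status is None:
        return "unreachable"
    if status >= 400:
        return "broken"
    lowered = body.lower()
    # From here on the server said OK, which is all a plain checker sees.
    if any(phrase in lowered for phrase in SUSPECT_PHRASES):
        return "suspect: looks parked"
    if expected_title and expected_title.lower() not in lowered:
        return "suspect: expected title missing"
    return "no problem detected"
```

Calling `classify(*fetch(url), expected_title=...)` gives one bucket per URL. "No problem detected" still does not mean the page is the one that was cited.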

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Aug 21 '06 #3

Jukka K. Korpela wrote:
The point is that no link checker can check what the link really points to.
The checkers won't notice anything if the server sends a normal OK response.
Yet the page may contain porn instead of the expected content or, more
typically, a page that tells that the domain is for sale and/or contains
lots of links to pages that someone wants to advertize.
That is what I thought and what I have been advising my friend. I
asked what I did as one last chance of there being something more I can
do for him.

He has volunteers standing by to manually check the URLs. The link
checker reports and HTML pages I generated should give them a good
start.

He is in good shape. The book will be printed but will also be online.
He has real citations, the urls are redundant so he is in a good
spot.

Aug 22 '06 #4
ken
Steve wrote:
Jukka K. Korpela wrote:
The point is that no link checker can check what the link really points to.
The checkers won't notice anything if the server sends a normal OK response.
Yet the page may contain porn instead of the expected content or, more
typically, a page that tells that the domain is for sale and/or contains
lots of links to pages that someone wants to advertize.
Not true. LinkScan Profiler™ detects adult content links or any other
type of content the user profiles. It enables webmasters and content
managers to identify pages with "inappropriate" (e.g. adult) content
linked to from their site so they can determine if they wish to
continue to link to a specific site. Furthermore, the product has the
capability of e-mailing an alarm to content managers when such a page
is found. This capability was developed when users discovered that a
link from a medical/education site to a page on breast cancer was
hijacked and replaced with a porno page. More information on this
capability can be found at:

http://www.elsop.com/linkscan/overview.html

Ken

Aug 22 '06 #5
Xenu has lots of useful options to make link checking easier:
http://home.snafu.de/tilman/xenulink.html#Download

Alexander.
--
Alexander Huber
mailto:al*****@gmx.net
Aug 24 '06 #6
ke*@elsop.com <ke*@elsop.com> scripsit:
LinkScan Profiler™ detects adult content links or any other
type of content the user profiles.
And you are selling it, right? It would be fair to say that explicitly,
wouldn't it?

What you are saying is seriously a half-truth, or quarter-truth, or something
like that.

Of course you can set up some purported content recognition for the linked
resources, or use filters based on some previous recognition or settings.

Attempts at recognizing porn, which you prefer calling "adult content" using
the common euphemistic and misleading phrase, have been failures. They can
easily be defeated, and they easily filter out non-porn content. But of
course people who sell software for it will hardly admit that.
It enables webmasters and content
managers to identify pages with "inappropriate" (e.g. adult) content
linked to from their site so they can determine if they wish to
continue to link to a specific site.
Even if that worked, and it doesn't, it would matter rather little.
Actually, the _only_ situation where any of my links (and I have lots of
them) became a porn link was about six or seven years ago, I guess, when a
reputable validator's domain was lost and bought by someone who wanted to
sell porn. I have seen hundreds if not thousands of my links turn to dust
when site administrators redesign a site breaking all links (sometimes
making old URLs work but point to completely different resources), just mess
things up, replace some content by something else, etc., or change domain so
that the old address keeps "working" but now points e.g. to a domain name
salesman's page.
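
One cheap signal for these cases is where a request ends up after redirects. The sketch below compares the requested URL with the final URL (for instance the value returned by `response.geturl()` after `urllib.request.urlopen`). The function name and the category labels are made up for illustration.

```python
from urllib.parse import urlsplit


def _host(url):
    """Hostname in lower case, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def redirect_kind(requested, final):
    """Describe how the final URL differs from the one that was requested."""
    if requested == final:
        return "unchanged"
    if _host(requested) != _host(final):
        return "different host"  # e.g. the domain changed hands or moved
    a, b = urlsplit(requested), urlsplit(final)
    if a.path != b.path or a.query != b.query:
        return "different path"  # e.g. a deep link bounced to the home page
    return "same resource"  # only the scheme or a 'www.' prefix changed
```

A "different host" or "different path" result is a reason to look at the page by hand. A site that reuses an old URL for new content produces no redirect at all, so this catches only part of the problem.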

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Aug 24 '06 #7
Jukka K. Korpela wrote:
ke*@elsop.com <ke*@elsop.com> scripsit:
>LinkScan Profiler™ detects adult content links or any other
type of content the user profiles.

And you are selling it, right? It would be fair to say that explicitly,
wouldn't it?
ISTR ken@elsop was a regular spammer around here some years ago,
and I was pleased to note this looked distinctly less spammy.
What you are saying is seriously a half-truth, or quarter-truth, or something
like that.
It looked to me more like marketing department spin. And less
harmful than many precisely because ...
Attempts at recognizing porn, which you prefer calling "adult content"
using the common euphemistic and misleading phrase, have been failures.
They can easily be defeated, and they easily filter out non-porn
content. But of course people who sell software for it will hardly admit
that.
... the claim was sufficiently outlandish that buyers are
unlikely to take it as anything more than marketing crap.
Actually, the _only_ situation where any of my links (and I have lots of
them) became a porn link was about six or seven years ago, I guess, when
a reputable validator's domain was lost and bought by someone who wanted
to sell porn.
Yeah, I got caught by that one.

I have seen hundreds if not thousands of my links turn to
dust when site administrators redesign a site breaking all links
(sometimes making old URLs work but point to completely different
resources),
That's why detecting changes in contents is what really matters.
And using metrics to detect when change is *significant*.
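
As one possible reading of "metrics", here is a small sketch that compares a stored copy of a page with a fresh copy and reports a similarity ratio. It uses `difflib.SequenceMatcher` from the standard library as a stand-in metric; the 0.5 threshold and the crude tag stripping are assumptions, not recommendations from this thread.

```python
import difflib
import re


def visible_words(html_text):
    """Very rough text extraction: drop tags, lower-case, split into words."""
    text = re.sub(r"<[^>]+>", " ", html_text)
    return text.lower().split()


def similarity(old_html, new_html):
    """Ratio in [0, 1]; 1.0 means the word sequences are identical."""
    matcher = difflib.SequenceMatcher(
        None, visible_words(old_html), visible_words(new_html), autojunk=False
    )
    return matcher.ratio()


def changed_significantly(old_html, new_html, threshold=0.5):
    """True when the pages share less than `threshold` of their wording."""
    return similarity(old_html, new_html) < threshold
```

This needs a snapshot of each page taken when the citation was made, and the threshold has to be tuned on real pages: a site redesign can change the wording around an article a great deal without touching the cited content.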

--
Nick Kew
Aug 25 '06 #8

Nick Kew wrote:
... the claim was sufficiently outlandish that buyers are
unlikely to take it as anything more than marketing crap.
The claim might be outlandish to you and me, but I'm more concerned
about the less technically savvy who might fall for it, such as those
with managerial oversight of public bodies such as schools or
libraries. "Pr0n filtering" is a tempting technology to be selling
them, and it's easy to set up demos that do indeed show it "working".
Apart from the fact it simply doesn't work that well, I'm also not keen
on My Tax Dollars (tm) being wasted on snake oil from spammers.

Aug 29 '06 #9
