Bytes | Software Development & Data Engineering Community

Link Checking

Hi Guys;

You have been giving me a lot of useful information in the other two
threads. Thanks! Very interesting.

Here is my situation. My friend is writing a book. He has 3100
citations, 1225 of which have URLs.
He wants to Q/C the URLs. He removed the http:// from the URLs when he
prepared his materials. I wrote a script that extracted the citation
number and the URL from each entry in his citation list, prepended
"http://", and printed the result on an HTML page as a hyperlink. I used
a Firefox extension
( http://www.kevinfreitas.net/extensions/linkchecker/ ) to check the
links. It reports on bad links, links skipped by the checker, and
forwarded/forbidden links, and it color-codes the links it checks.
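
For anyone wanting to reproduce the script described above, here is a minimal sketch in Python. The citation format ("<number>. <text> <url>", with the URL as the last token) is only an assumption, since the real list is not shown, and the names `extract_links` and `render_html` are made up for illustration.

```python
import html
import re

# Assumed (hypothetical) citation format: "<number>. <citation text> <url>",
# where the URL is the last token and has had its "http://" stripped.
URL_TOKEN = re.compile(r"^[\w-]+(\.[\w-]+)*\.[a-z]{2,}(/\S*)?$", re.IGNORECASE)


def extract_links(lines):
    """Yield (citation_number, full_url) for each citation that ends in a URL."""
    for line in lines:
        number, sep, rest = line.strip().partition(". ")
        tokens = rest.split()
        if not sep or not number.isdigit() or not tokens:
            continue  # not a numbered citation line
        candidate = tokens[-1]
        if URL_TOKEN.match(candidate):
            # Restore the scheme that was removed from the source material.
            yield int(number), "http://" + candidate


def render_html(links):
    """Return a bare HTML page with one hyperlink per citation."""
    items = [
        '<li>[{0}] <a href="{1}">{1}</a></li>'.format(n, html.escape(url, quote=True))
        for n, url in links
    ]
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

Citations without a trailing URL are simply skipped, so the page lists only the entries a link checker can work on. The URL pattern is deliberately crude and would need adjusting to the real data.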

At this point I am wondering if there is anything better I can do as
far as Q/Cing links, short of doing manual checks.
From the other threads I know that the link checkers are not perfect
and that I should tell my friend that. Is there a more accurate
description I can give him (90% accurate, mostly good, coin toss, etc.)?

Thanks

Aug 21 '06 #1
ken
Hi Steve,

You could use LinkScan/QuickCheck which is a free web service to check
for broken links. It is located at

http://www.elsop.com/linkscan/quickcheck.html

Or download a free 15-day trial of LinkScan, which is more
comprehensive and very powerful. It can be found at

http://www.elsop.com/linkscan/dleval.cgi

Ken

Aug 21 '06 #2
Steve <st**********@yahoo.com> scripsit:
> Here is my situation. My friend is writing a book. He has 3100
> citations, 1225 of which have URLs.
In a printed book? Even if you check all the links just before the book
starts printing, many of them will have stopped working by the time
customers get the book. So what is the purpose of the URLs? If the book is
scientific, citations and URLs might be needed. In that case, remember to
include both the date of checking and the title or main heading or some
other content identification for the page. That way, people will have a
sporting chance of visiting, now or after 10 years, the page as it was at
the time of citation (assuming they know how to use www.archive.org and it
will remain available).
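
The suggestion above (record the check date and the page title with each URL) could be sketched like this in Python. The function names are hypothetical, and the address pattern `https://web.archive.org/web/<YYYYMMDD>/<url>` is the commonly used Wayback Machine form; as far as I know it resolves to the snapshot nearest that date, if one exists.

```python
from datetime import date


def citation_note(url, title, checked_on):
    """Format a URL with the page title and the date it was last verified."""
    return '{} ("{}", checked {})'.format(url, title, checked_on.isoformat())


def wayback_url(url, checked_on):
    """Build an archive.org address pointing near the date of checking."""
    stamp = checked_on.strftime("%Y%m%d")
    return "https://web.archive.org/web/{}/{}".format(stamp, url)
```

For example, `wayback_url("http://example.com/page", date(2006, 8, 21))` gives `https://web.archive.org/web/20060821/http://example.com/page`. Whether a snapshot actually exists for a given page is something only archive.org can tell you.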
> From the other threads I know that the link checkers are not perfect
> and that I should tell my friend that. Is there a more accurate
> description I can give him (90% accurate, mostly good, coin toss, etc.)?
Throwing percentages around isn't useful; 97.5 % of all percentages have
just been made up, and the remaining 3.5 % have been miscalculated.

The point is that no link checker can check what the link really points to.
The checkers won't notice anything if the server sends a normal OK response.
Yet the page may contain porn instead of the expected content or, more
typically, a page that tells that the domain is for sale and/or contains
lots of links to pages that someone wants to advertize.
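
To illustrate the point, here is a rough Python sketch of a check that goes one small step beyond the status code: it compares the page's title with the title recorded in the citation and looks for a couple of "parked domain" phrases. The phrase list and function names are made up for illustration, and a heuristic like this is easily fooled; it can only flag pages for a human to look at, not decide what a page really contains.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html_text):
    parser = TitleParser()
    parser.feed(html_text)
    return " ".join(parser.title.split())  # normalize whitespace


# Illustrative phrases only; real parked pages vary widely.
PARKED_HINTS = ("domain is for sale", "buy this domain")


def needs_manual_check(status, html_text, expected_title):
    """Return True if a volunteer should look at the page by hand."""
    if status != 200:
        return True
    lowered = html_text.lower()
    if any(hint in lowered for hint in PARKED_HINTS):
        return True
    # A "200 OK" page whose title no longer matches the citation is suspect.
    return expected_title.lower() not in page_title(html_text).lower()
```

Fetching the page and obtaining the status code is left out on purpose; any HTTP client would do. A False result here does not prove the link is good, only that nothing obvious was spotted.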

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Aug 21 '06 #3

Jukka K. Korpela wrote:
> The point is that no link checker can check what the link really points to.
> The checkers won't notice anything if the server sends a normal OK response.
> Yet the page may contain porn instead of the expected content or, more
> typically, a page that tells that the domain is for sale and/or contains
> lots of links to pages that someone wants to advertize.
That is what I thought and what I have been advising my friend. I
asked my question as one last chance of there being something more I can
do for him.

He has volunteers standing by to manually check the URLs. The link
checker reports and HTML pages I generated should give them a good
start.

He is in good shape. The book will be printed but will also be online.
He has real citations and the URLs are redundant, so he is in a good
spot.

Aug 22 '06 #4
ken
Steve wrote:
> Jukka K. Korpela wrote:
>> The point is that no link checker can check what the link really points to.
>> The checkers won't notice anything if the server sends a normal OK response.
>> Yet the page may contain porn instead of the expected content or, more
>> typically, a page that tells that the domain is for sale and/or contains
>> lots of links to pages that someone wants to advertize.
Not true. LinkScan Profiler™ detects adult content links or any other
type of content the user profiles. It enables webmasters and content
managers to identify pages with "inappropriate" (e.g. adult) content
linked to from their site so they can determine if they wish to
continue to link to a specific site. Furthermore, the product has the
capability of e-mailing an alarm to content managers when such a page
is found. This capability was developed when users discovered that a
link from a medical/education site to a page on breast cancer was
hijacked and replaced with a porno page. More information on this
capability can be found at:

http://www.elsop.com/linkscan/overview.html

Ken

Aug 22 '06 #5
Xenu has lots of useful options to make link checking easier:
http://home.snafu.de/tilman/xenulink.html#Download

Alexander.
--
Alexander Huber
mailto:al*****@gmx.net
Aug 24 '06 #6
ke*@elsop.com <ke*@elsop.com> scripsit:
> LinkScan Profiler™ detects adult content links or any other
> type of content the user profiles.
And you are selling it, right? It would be fair to say that explicitly,
wouldn't it?

What you are saying is seriously a half-truth, or quarter-truth, or
something like that.

Of course you can set up some purported content recognition for the linked
resources, or use filters based on some previous recognition or settings.

Attempts at recognizing porn, which you prefer calling "adult content" using
the common euphemistic and misleading phrase, have been failures. They can
easily be defeated, and they easily filter out non-porn content. But of
course people who sell software for it will hardly admit that.
> It enables webmasters and content
> managers to identify pages with "inappropriate" (e.g. adult) content
> linked to from their site so they can determine if they wish to
> continue to link to a specific site.
Even if that worked, and it doesn't, it would matter rather little.
Actually, the _only_ situation where any of my links (and I have lots of
them) became a porn link was about six or seven years ago, I guess, when a
reputable validator's domain was lost and bought by someone who wanted to
sell porn. I have seen hundreds if not thousands of my links turn to dust
when site administrators redesign a site breaking all links (sometimes
making old URLs work but point to completely different resources), just mess
things up, replace some content by something else, etc., or change domain so
that the old address keeps "working" but now points e.g. to a domain name
salesman's page.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Aug 24 '06 #7
Jukka K. Korpela wrote:
> ke*@elsop.com <ke*@elsop.com> scripsit:
>> LinkScan Profiler™ detects adult content links or any other
>> type of content the user profiles.
>
> And you are selling it, right? It would be fair to say that explicitly,
> wouldn't it?
ISTR ken@elsop was a regular spammer around here some years ago,
and I was pleased to note this looked distinctly less spammy.
> What you are saying is seriously a half-truth, or quarter-truth, or
> something like that.
It looked to me more like marketing department spin. And less
harmful than many precisely because ...
> Attempts at recognizing porn, which you prefer calling "adult content"
> using the common euphemistic and misleading phrase, have been failures.
> They can easily be defeated, and they easily filter out non-porn
> content. But of course people who sell software for it will hardly admit
> that.
... the claim was sufficiently outlandish that buyers are
unlikely to take it as anything more than marketing crap.
> Actually, the _only_ situation where any of my links (and I have lots of
> them) became a porn link was about six or seven years ago, I guess, when
> a reputable validator's domain was lost and bought by someone who wanted
> to sell porn.
Yeah, I got caught by that one.

> I have seen hundreds if not thousands of my links turn to
> dust when site administrators redesign a site breaking all links
> (sometimes making old URLs work but point to completely different
> resources),
That's why detecting changes in contents is what really matters.
And using metrics to detect when change is *significant*.
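
One possible metric, sketched in Python under the assumption that you keep a text snapshot of each cited page from the time of citation: compare the old and new word sequences with the standard library's `difflib` and flag the link when the difference exceeds a threshold. The 0.5 threshold and the function names are arbitrary choices for illustration, not a recommendation.

```python
import difflib


def change_ratio(old_text, new_text):
    """Return 0.0 for identical word sequences, approaching 1.0 for no overlap."""
    matcher = difflib.SequenceMatcher(
        None, old_text.split(), new_text.split(), autojunk=False
    )
    return 1.0 - matcher.ratio()


def changed_significantly(old_text, new_text, threshold=0.5):
    """Flag a page whose text differs from the snapshot by more than threshold."""
    return change_ratio(old_text, new_text) > threshold
```

A small edit to a page gives a low ratio and passes; a page replaced by a domain seller's ad shares almost no words with the snapshot and gets flagged. Where to put the threshold is a judgment call that depends on how much the cited pages normally change.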

--
Nick Kew
Aug 25 '06 #8

Nick Kew wrote:
> ... the claim was sufficiently outlandish that buyers are
> unlikely to take it as anything more than marketing crap.
The claim might be outlandish to you and me, but I'm more concerned
about the less technically savvy who might fall for it, such as those
with managerial oversight of public bodies such as schools or
libraries. "Pr0n filtering" is a tempting technology to be selling
them, and it's easy to set up demos that do indeed show it "working".
Apart from the fact it simply doesn't work that well, I'm also not keen
on My Tax Dollars (tm) being wasted on snake oil from spammers.

Aug 29 '06 #9
