Checking if referrer is web crawler

StevePBurgess@gmail.com
#1: Jun 28 '06
Hi. I have a book affiliate website. Whenever a visitor clicks on one
of the books, a script adds one to a field in a MySQL database and then
takes the visitor to the shopping basket on the book website.
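For readers following along, here is a minimal sketch of the kind of click-through script described above, written as a Python WSGI app. It assumes a hypothetical `clicks` table with `book_id` and `hits` columns, uses SQLite's in-memory database as a stand-in for MySQL, and redirects to a placeholder shop URL. None of these names come from the original post.

```python
import sqlite3
from urllib.parse import parse_qs, quote

# Stand-in for the MySQL database: a hypothetical table holding one
# running click count per book.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clicks (book_id TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0)"
)

# Placeholder for the book site's shopping-basket URL (hypothetical).
SHOP_BASKET_URL = "https://books.example.com/basket?add="


def click_app(environ, start_response):
    """Count one click for ?book=<id>, then redirect the visitor to the shop."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    book_id = query.get("book", [""])[0]
    if not book_id:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing book id"]

    # Create the row on the first click, then add one to the count.
    conn.execute("INSERT OR IGNORE INTO clicks (book_id, hits) VALUES (?, 0)", (book_id,))
    conn.execute("UPDATE clicks SET hits = hits + 1 WHERE book_id = ?", (book_id,))
    conn.commit()

    # Send whoever made the request on to the shopping basket.
    start_response("302 Found", [("Location", SHOP_BASKET_URL + quote(book_id))])
    return [b""]
```

Served with `wsgiref.simple_server.make_server("", 8000, click_app)`, every request increments the count no matter who sends it, which is why crawlers that follow the link inflate the figure.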

I have noticed that the book links are getting lots of hits. At first, I
was pleased about the potential income this might mean, but then it
occurred to me that many of these hits are web crawlers (this was
confirmed by Webalizer).

Any suggestions for ways of checking whether the link is being "clicked" by a
web crawler, so that I don't increment the field in the SQL database?

I've checked HTTP_REFERER but it seems to be empty for what I assume
are crawled clicks.

Cheers

Steve


Kevin Wells
#2: Jun 28 '06

In message <1151495324.852580.282450@75g2000cwc.googlegroups.com>
StevePBurgess@gmail.com wrote:
[snip]
>Any suggestions of ways of checking if the link is being "clicked" by a
>webcrawler so that I can not increment the field in the sql database?

In the User-Agent header they generally identify themselves with one of:

bot
crawler
spider
slurp
crawling
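The check described here amounts to a case-insensitive substring test on the User-Agent header. A minimal Python sketch follows; the function name `looks_like_crawler` is hypothetical. It only catches crawlers that announce themselves, and a short marker such as "bot" can occasionally match an ordinary browser string.

```python
# Markers taken from the list above. Matching is case-insensitive because
# real User-Agent strings mix case (e.g. "Googlebot", "Yahoo! Slurp").
CRAWLER_MARKERS = ("bot", "crawler", "crawling", "spider", "slurp")


def looks_like_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains any known crawler marker."""
    ua = (user_agent or "").lower()  # treat a missing header as an empty string
    return any(marker in ua for marker in CRAWLER_MARKERS)


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
    ]
    for ua in samples:
        print(looks_like_crawler(ua), ua)
```

In PHP the header is available as `$_SERVER['HTTP_USER_AGENT']`; in a Python WSGI app it arrives as `environ.get("HTTP_USER_AGENT", "")`.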

--
Kev Wells http://kevsoft.topcities.com
http://kevsoft.co.uk/
ICQ 238580561
Useless Fact 04 The number of islands around mainland Britain is 6289.
ctclibby
#3: Jun 29 '06

StevePBurgess@gmail.com wrote:
[snip]
> I have noticed that the book links are getting lots of hit. At first, I
> was pleased about the potential income this might mean - but then it
> occurred to me that many of these hits are web crawlers (this was
> confirmed by webaliser).
>
> Any suggestions of ways of checking if the link is being "clicked" by a
> webcrawler so that I can not increment the field in the sql database?

Steve

You should be able to take care of this in your robots.txt file (in the root
of your web directory). I had a problem with a login link: the bots were
getting to it, and I would see a request for the login page but never a
POST. Scratched my head for a bit... Anyway, Google it and you will find
a wealth of info.

todh

StevePBurgess@gmail.com
#4: Jun 29 '06

Hi, thanks, but I don't want to prevent the search engines from
crawling my site. I just want a (relatively) accurate figure for how
many click-throughs to the book site I've had.

I've written a script that checks whether the user agent contains "bot",
"crawling", "crawler", "slurp", etc., as suggested by a previous poster,
and skips the increment if it does. This seems to work.
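A sketch of the approach described above: skip the increment when the User-Agent looks like a crawler, but still let the request through so the link keeps working. `record_click`, the `clicks` table and the marker list are illustrative assumptions, and SQLite stands in for MySQL.

```python
import sqlite3

# Markers suggested earlier in the thread (illustrative, not exhaustive).
CRAWLER_MARKERS = ("bot", "crawler", "crawling", "spider", "slurp")


def record_click(conn: sqlite3.Connection, book_id: str, user_agent: str) -> bool:
    """Add one to the book's count unless the visitor looks like a crawler.

    Returns True if the click was counted. The caller should redirect to the
    shop either way, so crawlers still see a working link.
    """
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in CRAWLER_MARKERS):
        return False
    conn.execute("INSERT OR IGNORE INTO clicks (book_id, hits) VALUES (?, 0)", (book_id,))
    conn.execute("UPDATE clicks SET hits = hits + 1 WHERE book_id = ?", (book_id,))
    conn.commit()
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (book_id TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0)")
    record_click(conn, "42", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0")
    record_click(conn, "42", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
    print(conn.execute("SELECT hits FROM clicks WHERE book_id = '42'").fetchone()[0])  # prints 1
```

Redirecting regardless of the result keeps the count cleaner without changing what visitors or crawlers see. Crawlers that present an ordinary browser User-Agent will still be counted.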


Thanks, all.

Steve



ctclibby
#5: Jul 1 '06

StevePBurgess@gmail.com wrote:
> Hi thanks - but I don't want to prevent the search engines from
> crawling my site - I just want a (relatively) accurate figure on how
> many clicks through to the book site I've had.

robots.txt has a bunch of configurable options. You can tell the bots
NOT to crawl the 'login' portion of your site and still allow the rest
of it.
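To illustrate the idea, here is a hypothetical robots.txt that disallows only the click-through script (assumed here to live at `/go.php`) and leaves the rest of the site open, checked with Python's standard `urllib.robotparser`. This only affects crawlers that honour robots.txt, and `Disallow` stops crawling rather than guaranteeing a URL never appears in a search index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of the click-through
# script (assumed path /go.php) while leaving everything else crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /go.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/go.php?book=42", "/books/index.html"):
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "disallowed")
```

With this file, a well-behaved crawler still indexes the book pages but no longer follows the counted links, which addresses the "don't block the whole site" concern.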

todh
