  #1  
June 28th, 2006, 12:45 PM
StevePBurgess@gmail.com
Checking if referrer is web crawler

Hi. I have a book affiliate website. Whenever a visitor clicks on one
of the books, a script adds one to a field in a MySQL database and then
takes the visitor to the shopping basket on the book website.

I have noticed that the book links are getting lots of hits. At first, I
was pleased about the potential income this might mean - but then it
occurred to me that many of these hits come from web crawlers (this was
confirmed by Webalizer).

Any suggestions for ways of checking whether the link is being "clicked"
by a web crawler, so that I can skip incrementing the field in the database?

I've checked HTTP_REFERER but it seems to be empty for what I assume
are crawled clicks.

Cheers

Steve

  #2  
June 28th, 2006, 03:35 PM
Kevin Wells
Re: Checking if referrer is web crawler


In the User-Agent header, crawlers usually identify themselves with a
string containing one of:

bot
crawler
spider
slurp
crawling
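
As a rough illustration of the check described above, here is a minimal Python sketch. The function name `is_crawler` and the constant `CRAWLER_MARKERS` are made up for this example, and matching case-insensitively is an assumption; the substrings themselves come from the list above.

```python
# Substrings from the list above. Lower-cased matching is an assumption,
# since real User-Agent strings mix case (e.g. "Googlebot", "Slurp").
CRAWLER_MARKERS = ("bot", "crawler", "spider", "slurp", "crawling")


def is_crawler(user_agent):
    """Return True if the User-Agent string contains a known crawler marker."""
    # A missing or empty User-Agent has nothing to match, so it returns False.
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


googlebot = is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
browser = is_crawler("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Gecko/20060508 Firefox/1.5.0.4")
```

Substring matching like this is only a heuristic: it catches crawlers that announce themselves and misses any that send a browser-like User-Agent.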

--
Kev Wells http://kevsoft.topcities.com
http://kevsoft.co.uk/
ICQ 238580561
Useless Fact 04 The number of islands around mainland Britain is 6289.
  #3  
June 29th, 2006, 11:35 AM
ctclibby
Re: Checking if referrer is web crawler


StevePBurgess@gmail.com wrote:
[snip]
> I have noticed that the book links are getting lots of hits. At first, I
> was pleased about the potential income this might mean - but then it
> occurred to me that many of these hits come from web crawlers (this was
> confirmed by Webalizer).
>
> Any suggestions for ways of checking whether the link is being "clicked"
> by a web crawler, so that I can skip incrementing the field in the database?

Steve

You should be able to take care of this in your robots.txt file (in the
root of your web directory). I had a similar problem with a login link:
the bots were following it, so I would see a single request for the
login file but no POST. Scratched my head for a bit... Anyway, Google it
and you will find a wealth of info.

todh

  #4  
June 29th, 2006, 02:05 PM
StevePBurgess@gmail.com
Re: Checking if referrer is web crawler

Hi thanks - but I don't want to prevent the search engines from
crawling my site - I just want a (relatively) accurate figure on how
many clicks through to the book site I've had.

I've written a script that checks whether the user agent contains
bot, crawling, crawler, slurp, etc., as suggested by a previous poster,
and skips the database increment when it does. This seems to work.
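
For illustration only, the counting logic described here might look roughly like the Python sketch below. It is not the script from this post: the `books` table, its `clicks` column, and the function names are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs on its own.

```python
import sqlite3

# Markers mentioned in this thread; case-insensitive matching is an assumption.
CRAWLER_MARKERS = ("bot", "crawling", "crawler", "slurp", "spider")


def looks_like_crawler(user_agent):
    """True if the User-Agent contains one of the crawler markers."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def record_click(conn, book_id, user_agent):
    """Increment the click counter unless the request looks like a crawler.

    Returns True when the click was counted. The redirect to the book
    site would happen either way and is left out of this sketch.
    """
    if looks_like_crawler(user_agent):
        return False
    # Hypothetical schema: books(id, clicks).
    conn.execute("UPDATE books SET clicks = clicks + 1 WHERE id = ?", (book_id,))
    conn.commit()
    return True


# Small demo with an in-memory database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, clicks INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO books (id) VALUES (1)")

counted_bot = record_click(conn, 1, "Mozilla/5.0 (compatible; Googlebot/2.1)")
counted_user = record_click(conn, 1, "Mozilla/5.0 (Windows NT 5.1) Firefox/1.5")
clicks = conn.execute("SELECT clicks FROM books WHERE id = 1").fetchone()[0]
```

Crawlers are still redirected and can still follow the links; only the counter is protected, which matches the goal of keeping the site crawlable while getting a more accurate click figure.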


Thanks, all.

Steve



  #5  
July 1st, 2006, 10:55 AM
ctclibby
Re: Checking if referrer is web crawler


StevePBurgess@gmail.com wrote:
> Hi thanks - but I don't want to prevent the search engines from
> crawling my site - I just want a (relatively) accurate figure on how
> many clicks through to the book site I've had.

Robots.txt rules are applied per path, so it is not all or nothing. You
can tell the bots NOT to crawl the 'login' portion of your site and
still allow the rest of it.
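
A small Python sketch of that idea, using the standard library's `urllib.robotparser` to show how a compliant crawler would read such a file. The `/login` and `/books/index.html` paths are placeholders rather than paths from this thread, and the rule only affects crawlers that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content: block one path for all crawlers, allow the rest.
# "/login" is a placeholder path for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler would skip the disallowed path...
login_allowed = parser.can_fetch("Googlebot", "/login")
# ...but could still fetch everything else.
books_allowed = parser.can_fetch("Googlebot", "/books/index.html")
```

The same kind of rule could point at the path of a click-counting script instead, so that well-behaved bots never request it while the rest of the site stays crawlable.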

todh

 
