Checking if referrer is web crawler

Hi. I have a book affiliate website. Whenever a visitor clicks on one
of the books, a script adds one to a field in a MySQL database and then
takes the visitor to the shopping basket on the book website.

I have noticed that the book links are getting lots of hits. At first, I
was pleased about the potential income this might mean, but then it
occurred to me that many of these hits are from web crawlers (this was
confirmed by Webalizer).

Any suggestions for ways of checking whether the link is being "clicked" by a
web crawler, so that I can skip incrementing the field in the SQL database?

I've checked HTTP_REFERER but it seems to be empty for what I assume
are crawled clicks.

Cheers

Steve

Jun 28 '06 #1

In the User-Agent header they appear to identify themselves with one of these substrings:

bot
crawler
spider
slurp
crawling

--
Kev Wells http://kevsoft.topcities.com
http://kevsoft.co.uk/
ICQ 238580561
Useless Fact 04 The number of islands around mainland Britain is 6289.
Jun 28 '06 #2

Steve

You should be able to take care of this in your robots.txt file (in the
root of your web directory). I had a problem with a login link: the
bots were getting to it, and I would see one reference to the login
file but no POST. Scratched me head for a bit... Anyway, Google it
and you will find a wealth of info.

todh

Jun 29 '06 #3
Hi, thanks, but I don't want to prevent the search engines from
crawling my site. I just want a (relatively) accurate figure for how
many click-throughs to the book site I've had.

I've written a script to check the user agent to see if it contains
bot, crawling, crawler, slurp, etc., as indicated by a previous poster. This
seems to work.
Thanks, all.
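
The script itself was not posted, so the following is only a rough sketch of the idea: count the click unless the User-Agent looks like a crawler. It uses an in-memory SQLite database as a stand-in for MySQL, and the table `books`, the column `clicks`, and the function names are hypothetical.

```python
import sqlite3

CRAWLER_MARKERS = ("bot", "crawler", "spider", "slurp", "crawling")


def is_crawler(user_agent):
    """Case-insensitive substring check against the crawler markers."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def record_click(conn, book_id, user_agent):
    """Increment the counter for book_id unless the client looks like a crawler."""
    if is_crawler(user_agent):
        return False  # skip the update; the redirect would still happen
    conn.execute("UPDATE books SET clicks = clicks + 1 WHERE id = ?", (book_id,))
    conn.commit()
    return True


# Stand-in database with one book (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, clicks INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO books (id) VALUES (1)")

counted_browser = record_click(conn, 1, "Mozilla/5.0 (Windows NT 10.0) Firefox/115.0")
counted_bot = record_click(conn, 1, "Mozilla/5.0 (compatible; Googlebot/2.1)")
clicks = conn.execute("SELECT clicks FROM books WHERE id = 1").fetchone()[0]
```

Only the browser request increments the counter, so the crawler still reaches the link but is left out of the total.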

Steve

Jun 29 '06 #4

Robots.txt has a bunch of configurable options. You can tell the bots
NOT to crawl the 'login' portion of your site, and allow the rest of
it.
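
As a sketch of what such a rule does, the snippet below feeds a two-line robots.txt to Python's standard `urllib.robotparser` and asks which URLs a well-behaved crawler may fetch. The `/login/` path, the `example.com` URLs, and the bot name are assumptions for illustration. robots.txt is advisory, so crawlers that ignore it are unaffected.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every crawler out of /login/, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser what a compliant crawler would be allowed to fetch.
login_allowed = parser.can_fetch("ExampleBot", "http://example.com/login/")
books_allowed = parser.can_fetch("ExampleBot", "http://example.com/books/")
```

With these rules `login_allowed` is false and `books_allowed` is true, which is the "block one part, allow the rest" setup described above.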

todh

Jul 1 '06 #5
