Checking if referrer is web crawler

Hi. I have a book affiliate website. Whenever a visitor clicks on one
of the books, a script adds one to a field in a MySQL database and then
takes the visitor to the shopping basket on the book website.

I have noticed that the book links are getting lots of hits. At first, I
was pleased about the potential income this might mean - but then it
occurred to me that many of these hits are from web crawlers (this was
confirmed by Webalizer).

Any suggestions for ways of checking whether the link is being "clicked"
by a web crawler, so that I can skip incrementing the field in the
database?

I've checked HTTP_REFERER but it seems to be empty for what I assume
are crawled clicks.

Cheers

Steve

Jun 28 '06 #1


In the User-Agent header they appear to identify themselves with one of
these substrings:

bot
crawler
spider
slurp
crawling

--
Kev Wells http://kevsoft.topcities.com
http://kevsoft.co.uk/
ICQ 238580561
Useless Fact 04 The number of islands around mainland Britain is 6289.
Jun 28 '06 #2

St***********@gmail.com wrote:
[snip]
I have noticed that the book links are getting lots of hits. At first, I
was pleased about the potential income this might mean - but then it
occurred to me that many of these hits are from web crawlers (this was
confirmed by Webalizer).

Any suggestions for ways of checking whether the link is being "clicked"
by a web crawler, so that I can skip incrementing the field in the
database?


Steve

You should be able to take care of this in your robots.txt file (in the
root of your web directory). I had a problem with a login link: the
bots were getting to it, and I would see one request for the login
file but no POST. I scratched my head for a bit... Anyway, Google it
and you will find a wealth of info.

todh

Jun 29 '06 #3

Hi thanks - but I don't want to prevent the search engines from
crawling my site - I just want a (relatively) accurate figure on how
many clicks through to the book site I've had.

I've written a script that checks whether the user agent contains
bot, crawling, crawler, slurp, etc., as suggested by a previous poster.
This seems to work.
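
For illustration only (this is not the actual script), here is a rough Python sketch of skipping the counter update for crawler hits. It uses an in-memory SQLite database as a stand-in for MySQL. The `books` table, the `clicks` column, and the `record_click` function are all hypothetical names.

```python
import sqlite3

# "crawl" covers both "crawler" and "crawling".
CRAWLER_HINTS = ("bot", "crawl", "spider", "slurp")


def record_click(conn, book_id, user_agent):
    """Increment the click counter unless the User-Agent looks like a crawler.

    Returns True if the click was counted, False if it was skipped.
    """
    ua = (user_agent or "").lower()
    # Assumption: an empty User-Agent is not counted as a real visitor.
    if not ua or any(hint in ua for hint in CRAWLER_HINTS):
        return False
    conn.execute("UPDATE books SET clicks = clicks + 1 WHERE id = ?", (book_id,))
    conn.commit()
    return True


# Stand-in database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, clicks INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO books (id) VALUES (1)")

record_click(conn, 1, "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0")  # counted
record_click(conn, 1, "Mozilla/5.0 (compatible; Googlebot/2.1)")        # skipped
clicks = conn.execute("SELECT clicks FROM books WHERE id = 1").fetchone()[0]
```

The redirect to the shopping basket would happen either way; only the counter update is conditional.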
Thanks, all.

Steve
Jun 29 '06 #4

St***********@gmail.com wrote:
Hi thanks - but I don't want to prevent the search engines from
crawling my site - I just want a (relatively) accurate figure on how
many clicks through to the book site I've had.


Robots.txt has a bunch of configurable options. You can tell the bots
NOT to crawl the 'login' portion of your site, and allow the rest of
it. Well-behaved bots will honour that; badly behaved ones may ignore
it.
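
As a rough sketch of how such a rule is read, the Python snippet below feeds a two-line robots.txt to the standard library's `urllib.robotparser` and asks whether two URLs may be fetched. The `/login` path, the `example.com` URLs, and the `ExampleBot` name are placeholders, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block /login for every bot, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler would run a check like this before requesting a URL.
login_allowed = parser.can_fetch("ExampleBot", "https://example.com/login")
home_allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")
```

Here `login_allowed` comes out False and `home_allowed` True, so the rest of the site stays crawlable. The same kind of rule could presumably be pointed at the click-counting script's path instead of a login page.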

todh

Jul 1 '06 #5
