Christine Genzer <Ch**********@web.de> wrote:
Duncan Booth wrote:
>Christine Genzer <Ch**********@web.de> wrote:
>>I wonder - how do tools like Google Analytics distinguish real unique
users from all those millions of bots, crawlers, and so on?
Is it based on IP ranges, or "Javascript=active", or what?
Google Analytics and similar tools use JavaScript to fetch a URL from the
analytics server. Crawlers don't execute JavaScript, so they never
touch the analytics server.
Well, this was my idea too, but is that really true? Everyone says
that crawlers and bots cannot execute JavaScript - but why?! If a
browser can do it, why shouldn't a bot be able to do the same? The
harvesters and spam bots get more and more intelligent, I guess - a lot
of email addresses are "hidden" by JavaScript - so wouldn't a good spam
bot also execute JavaScript? JavaScript is source code, instructions
just like HTML and so on - why could this not be executed?!
It isn't that they 'could not'; rather, in general they 'do not'. It's
quite easy to write a bot which executes all JavaScript: just drive IE
through its automation interface. But it will be much slower and doesn't
gain you much.
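
To make the 'do not' case concrete, here is a minimal Python sketch of a bot that only parses markup and never runs scripts. The page, the analytics.example host, and the function name are all made up for illustration; real tracking snippets look different.

```python
from html.parser import HTMLParser

# Hypothetical page: one ordinary link plus an inline tracking script.
PAGE = """<html><body>
<a href="/about.html">About</a>
<script>
  // the hit to the analytics server only happens if this code is executed
  new Image().src = "https://analytics.example/hit?page=" + encodeURIComponent(location.pathname);
</script>
</body></html>"""


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags; script bodies are treated as opaque text."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def urls_a_simple_crawler_would_fetch(html):
    # No JavaScript engine involved: only static markup is inspected.
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

The only URL such a bot queues is /about.html. The analytics URL is assembled inside the script, so a bot that skips script execution never requests it and never shows up in the statistics.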
If your crawler is something like Google then you want to respect
people's wishes, so there isn't any point looking for hard-to-find
pages: if someone wants you to find all their pages they can create a
sitemap with non-javascript links.
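
For what it's worth, a sitemap of that kind is just a list of plain URLs in XML. A rough Python sketch, assuming the sitemaps.org 0.9 format; the function name and the example.com URLs are invented:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "http://example.com/a.html",
    "http://example.com/b.html",
])
```

A crawler can read every page address from this without executing anything.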
If your crawler is a harvester looking for email addresses then you'll
get more by crawling fast and missing some than by crawling slowly to
grab them all. Perhaps this will change in the future as more sites
obfuscate email addresses. You might find this article of interest:
http://nadeausoftware.com/articles/2...esses_spammers
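
As a rough illustration of that trade-off, here is a sketch of a fast harvester that just runs a regular expression over the raw HTML. The pattern is deliberately simple and the two sample snippets are invented; this is not taken from the article above.

```python
import re

# Simplistic address pattern; real harvesters may use something more elaborate.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Address written directly into the markup.
PLAIN = '<a href="mailto:alice@example.com">mail</a>'

# Address that only exists after the script runs: the "@" is hidden behind
# a JavaScript escape and the parts are concatenated at run time.
OBFUSCATED = '<script>document.write("bob" + "\\u0040" + "example" + "." + "com");</script>'


def harvest(html):
    """Return every address-like string found in the raw source, without running scripts."""
    return EMAIL_RE.findall(html)
```

Here harvest(PLAIN) finds alice@example.com, while harvest(OBFUSCATED) finds nothing, because the address never appears in the source text. Getting at it means executing the script, which is the slow path.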
Also, even if you did decide to execute Javascript, making a bot
specifically exclude the common analytics sites would make sense: why
advertise what you are doing?
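
A sketch of that exclusion, again in Python. The blocklist below is only an example and the function name is made up; a real bot would keep its own list of hosts to avoid.

```python
from urllib.parse import urlsplit

# Illustrative blocklist only, not a complete list of analytics hosts.
BLOCKED_HOSTS = {"google-analytics.com"}


def should_fetch(url, blocked=BLOCKED_HOSTS):
    """Return False for URLs on a blocked host or any of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return not any(host == b or host.endswith("." + b) for b in blocked)
```

With this check in front of every request, should_fetch("http://www.google-analytics.com/urchin.js") is False and should_fetch("http://example.com/page.html") is True, so the bot could run a page's scripts and still never contact the analytics server.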