
How to track unique visitors

Christine Genzer
#1: Dec 11 '07
Hi,

I wonder - how do tools like Google Analytics distinguish real unique users
from all those millions of bots, crawlers, and so on?
Is it based on IP ranges, or "Javascript=active", or what?

Thanks in advance,

Christine
Evertjan.
#2: Dec 11 '07

Christine Genzer wrote on 11 Dec 2007 in comp.lang.javascript:

> I wonder - how do tools like Google Analytics distinguish real unique users
> from all those millions of bots, crawlers, and so on?
> Is it based on IP ranges, or "Javascript=active", or what?

Most bots have one of
bot, crawler, slurp, netcraft, Jeeves, etc.
in their HTTP_USER_AGENT header string.

You can never catch them all, as some, like live.com,
also try to mimic normal browsing behaviour,
to see whether one feeds them different pages than the "normal" folks get,
or so I believe, while coming from the same IP range as the crawler
that visited moments before.
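
A minimal sketch of the User-Agent check described above. The token list holds only the examples named in this post, and `looksLikeBot` is a hypothetical helper name; treating a missing header as a bot is an assumption, not something the post states.

```javascript
// Substrings taken from the post above; a real list would be much longer.
const BOT_TOKENS = ["bot", "crawler", "slurp", "netcraft", "jeeves"];

// Hypothetical helper: true if the User-Agent string contains a known token.
function looksLikeBot(userAgent) {
  if (!userAgent) return true; // assumption: a missing header is treated as suspicious
  const ua = userAgent.toLowerCase();
  return BOT_TOKENS.some((token) => ua.includes(token));
}

console.log(looksLikeBot("Googlebot/2.1 (+http://www.google.com/bot.html)")); // true
console.log(looksLikeBot("Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0")); // false
```

As noted above, this only catches bots that identify themselves; one that sends a browser-like string passes straight through.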

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Duncan Booth
#3: Dec 11 '07

Christine Genzer <Christine239@web.de> wrote:

> I wonder - how do tools like Google Analytics distinguish real unique users
> from all those millions of bots, crawlers, and so on?
> Is it based on IP ranges, or "Javascript=active", or what?

Google Analytics and similar tools use JavaScript to fetch a URL from the
analytics server. Crawlers generally don't execute JavaScript, so they never
touch the analytics server.
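
A rough sketch of that mechanism, not Google's actual code: the page script builds a URL and requests it as an image. The endpoint `analytics.example.com`, the parameter names, and `buildBeaconUrl` are all made up for illustration.

```javascript
// Hypothetical endpoint and parameter names, for illustration only.
const BEACON_ENDPOINT = "https://analytics.example.com/hit.gif";

// Build the URL that the page script requests from the analytics server.
function buildBeaconUrl(pageUrl, referrer, visitorId) {
  const query = [
    "page=" + encodeURIComponent(pageUrl),
    "ref=" + encodeURIComponent(referrer),
    "vid=" + encodeURIComponent(visitorId),
  ].join("&");
  return BEACON_ENDPOINT + "?" + query;
}

// In a browser, requesting an image is enough to register the hit.
// A client that never runs this script never sends the request.
if (typeof document !== "undefined") {
  const img = new Image();
  img.src = buildBeaconUrl(location.href, document.referrer, "visitor-123");
}
```

Because the hit is only recorded when the script runs, a client that fetches the HTML and nothing else leaves no trace on the analytics server.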

Likewise, people who use ad blockers to block the analytics server or
doubleclick or whatever are invisible to those particular services.

In addition, you can configure analytics to ignore particular IP addresses,
which is a handy way to exclude your own hits from the reports.
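
The IP exclusion amounts to a simple filter over recorded hits. This is a sketch only: the hit records, the `EXCLUDED_IPS` list, and `filterHits` are assumptions, and the addresses come from the ranges reserved for documentation.

```javascript
// Hypothetical exclusion list, e.g. your own office addresses.
const EXCLUDED_IPS = new Set(["192.0.2.10", "192.0.2.11"]);

// Drop every hit whose source address is on the exclusion list.
function filterHits(hits) {
  return hits.filter((hit) => !EXCLUDED_IPS.has(hit.ip));
}

const hits = [
  { ip: "192.0.2.10", page: "/index.html" },
  { ip: "198.51.100.7", page: "/index.html" },
];
console.log(filterHits(hits).length); // 1
```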
Christine Genzer
#4: Dec 12 '07

Duncan Booth wrote:

> Christine Genzer <Christine239@web.de> wrote:
>
>> I wonder - how do tools like Google Analytics distinguish real unique users
>> from all those millions of bots, crawlers, and so on?
>> Is it based on IP ranges, or "Javascript=active", or what?
>
> Google Analytics and similar tools use JavaScript to fetch a URL from the
> analytics server. Crawlers generally don't execute JavaScript, so they never
> touch the analytics server.

Well, this was my idea too, but is that really true? Everyone says that
crawlers and bots cannot execute JavaScript - but why? If a browser
can do it, why shouldn't a bot be able to do the same? The harvesters
and spam bots are getting more and more intelligent, I guess - a lot of
email addresses are "hidden" by JavaScript - so wouldn't a good spam bot
also execute JavaScript? JavaScript is source code, instructions just
like HTML and so on - why could it not be executed?

Thanks in advance, Christine
Duncan Booth
#5: Dec 12 '07

Christine Genzer <Christine239@web.de> wrote:

> Duncan Booth wrote:
>
>> Christine Genzer <Christine239@web.de> wrote:
>>
>>> I wonder - how do tools like Google Analytics distinguish real unique
>>> users from all those millions of bots, crawlers, and so on?
>>> Is it based on IP ranges, or "Javascript=active", or what?
>>
>> Google Analytics and similar tools use JavaScript to fetch a URL from the
>> analytics server. Crawlers generally don't execute JavaScript, so they
>> never touch the analytics server.
>
> Well, this was my idea too, but is that really true? Everyone says
> that crawlers and bots cannot execute JavaScript - but why? If a
> browser can do it, why shouldn't a bot be able to do the same? The
> harvesters and spam bots are getting more and more intelligent, I guess -
> a lot of email addresses are "hidden" by JavaScript - so wouldn't a good
> spam bot also execute JavaScript? JavaScript is source code, instructions
> just like HTML and so on - why could it not be executed?

It isn't that they 'could not', rather that in general they 'do not'. It's
quite easy to write a bot which executes all JavaScript: just drive IE
through its automation interface. But it will be much slower and doesn't
gain you much.

If your crawler is something like Google then you want to respect
people's wishes, so there isn't any point looking for hard-to-find
pages: if someone wants you to find all their pages they can create a
sitemap with non-javascript links.

If your crawler is a harvester looking for email addresses then you'll
get more by crawling fast and missing some than by crawling slowly to
grab them all. Perhaps this will change in the future as more sites
obfuscate email addresses. You might find this article of interest:
http://nadeausoftware.com/articles/2...esses_spammers

Also, even if you did decide to execute Javascript, making a bot
specifically exclude the common analytics sites would make sense: why
advertise what you are doing?
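
To illustrate that last point, here is a sketch of how a script-executing bot might decide which external scripts to skip. The blocklist contents and the name `shouldFetchScript` are assumptions made for this example.

```javascript
// Hypothetical blocklist; the host names are examples only.
const SKIPPED_HOSTS = ["google-analytics.com", "doubleclick.net"];

// True if the script URL is not on (or under) a blocked host.
function shouldFetchScript(scriptUrl) {
  const host = new URL(scriptUrl).hostname;
  return !SKIPPED_HOSTS.some(
    (blocked) => host === blocked || host.endsWith("." + blocked)
  );
}

console.log(shouldFetchScript("http://www.google-analytics.com/urchin.js")); // false
console.log(shouldFetchScript("http://example.com/app.js")); // true
```

A bot that filters like this would run the site's own scripts yet still never show up in the analytics reports.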

Thomas 'PointedEars' Lahn
#6: Dec 12 '07

Christine Genzer wrote:

> Duncan Booth wrote:
>
>> Christine Genzer <Christine239@web.de> wrote:
>>
>>> I wonder - how do tools like Google Analytics distinguish real unique users
>>> from all those millions of bots, crawlers, and so on?
>>> Is it based on IP ranges, or "Javascript=active", or what?
>>
>> Google Analytics and similar tools use JavaScript to fetch a URL from the
>> analytics server. Crawlers generally don't execute JavaScript, so they never
>> touch the analytics server.
>
> Well, this was my idea too, but is that really true? Everyone says that
> crawlers and bots cannot execute JavaScript - but why? If a browser
> can do it, why shouldn't a bot be able to do the same?

For simple code, it would need not only a script parser but also a script
engine, i.e. an ECMAScript implementation. For more complex code, it would
also need to implement an AOM/DOM and an engine for that, too. That effort
would by far outweigh the expected gain of finding URLs in script code.
Reasonable authors also write Web sites that degrade gracefully, so there is
not really a need to crawl scripts.
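
A small sketch of that difference, assuming Node.js and its built-in `vm` module as the script engine. The inline script, the address pattern, and the one-method `document` stub are invented for the example; a real page would need far more of the DOM than this.

```javascript
const vm = require("vm");

// A page that hides an address by assembling it in script.
const inlineScript = 'document.write("user" + "@" + "example" + ".com");';

// Static scan: the full address never appears in the source text.
const EMAIL_PATTERN = /[\w.]+@[\w.]+\.\w+/;
console.log(EMAIL_PATTERN.test(inlineScript)); // false

// Executing: needs a script engine plus at least a stub of the DOM call used.
function runWithStubDocument(code) {
  let output = "";
  const sandbox = { document: { write: (text) => { output += text; } } };
  vm.runInNewContext(code, sandbox, { timeout: 100 });
  return output;
}

const rendered = runWithStubDocument(inlineScript);
console.log(EMAIL_PATTERN.test(rendered)); // true
```

Even this toy case needs an engine and a stubbed `document.write`; every further DOM feature a page uses is one more thing the bot would have to implement.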

Please don't multi-post to several JavaScript newsgroups, and have your
keyboard repaired. Thanks in advance.


PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f806at$ail$1$8300dec7@news.demon.co.uk>