
detection of a robot in php

#1
August 9th, 2006, 01:35 AM
giminik@gmail.com

Hello everybody :)

A friend recently showed me something odd while playing with wget under Linux. I don't know why it happens, and the result surprised me:
$ wget http://www.prizee.com/parole.php
--02:35:29--  http://www.prizee.com/parole.php
           => `parole.php'
Resolving www.prizee.com... 213.186.63.5
Connecting to www.prizee.com|213.186.63.5|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /index.php?joueur=1 [following]
--02:35:30--  http://www.prizee.com/index.php?joueur=1
           => `index.php?joueur=1.1'
Connecting to www.prizee.com|213.186.63.5|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=>                                 ] 12,521        --.--K/s

02:35:30 (103.57 KB/s) - `index.php?joueur=1.1' saved [12521]


He gets an HTTP 302 status code (a redirect, not an error), which sends him to the site's index page.
With a browser such as Firefox, IE or Safari, we get the right page without any redirection.
I then ran some tests. I changed wget's user agent string to identify it as Mozilla, but got the same result (a redirect). I also tried links (a command-line browser) and curl, with the same problem.
Here is the output of curl:

$ curl -v http://www.prizee.com/parole.php
* About to connect() to www.prizee.com port 80
* Trying 213.186.63.5... connected
* Connected to www.prizee.com (213.186.63.5) port 80
> GET /parole.php HTTP/1.1
> User-Agent: curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1 GnuTLS/1.2.10 zlib/1.2.3 libidn/0.5.15
> Host: www.prizee.com
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Wed, 09 Aug 2006 00:02:57 GMT
< Server: Apache/1.3.33 (Unix) PHP/4.3.10
< X-Powered-By: PHP/4.3.10
< X-Accelerated-By: PHPA/1.3.3r2
< Expires: Mon, 26 Jul 1997 05:00:00 GMT
< Last-Modified: Wed, 09 Aug 2006 00:02:59 GMT
< Cache-Control: no-cache, must-revalidate
< Pragma: no-cache
< Set-Cookie: COOKIEis_accepted=1; path=/; domain=.prizee.com
< Location: /index.php?joueur=1
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html
* Closing connection #0
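Two details in this response matter: the 302 status is a redirect rather than an error, and the Set-Cookie header has no Expires or Max-Age attribute, so COOKIEis_accepted is a session cookie that a browser keeps only in memory until it is closed. As a small Python sketch (the helper names status_code and is_session_cookie are hypothetical), the standard http.cookies module can confirm both points from the header values copied out of the curl output:

```python
from http.cookies import SimpleCookie

# Header values copied from the curl -v output above.
status_line = "HTTP/1.1 302 Found"
location = "/index.php?joueur=1"
set_cookie = "COOKIEis_accepted=1; path=/; domain=.prizee.com"


def status_code(line: str) -> int:
    """Return the numeric status from an HTTP/1.x status line."""
    return int(line.split()[1])


def is_session_cookie(header_value: str, name: str) -> bool:
    """A cookie with neither Expires nor Max-Age only lives for the browser session."""
    jar = SimpleCookie()
    jar.load(header_value)
    morsel = jar[name]
    # Unset attributes are stored as empty strings on the Morsel.
    return not morsel["expires"] and not morsel["max-age"]


code = status_code(status_line)
print(code, 300 <= code < 400)  # 302 True: a redirect, not an error
print(location)                 # where the client is sent next
print(is_session_cookie(set_cookie, "COOKIEis_accepted"))  # True
```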


So my question is: how can a website, like the one above, detect that a command-line tool is being used? Thank you for your answers.

Sorry for my bad English, I'm French ;)

#2
August 9th, 2006, 02:05 AM
Chris Hope

giminik@gmail.com wrote:
> How can a website detect that a command-line tool is being used?

I tried both Firefox and Konqueror, and both redirected me to the second page, so there doesn't appear to be any difference between using wget and using a graphical browser, at least for me.

You can't detect the use of a command-line tool if it sets its user agent to a browser's string. For example:

wget --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" http://www.prizee.com/parole.php

tells the website that you're using IE 6 on Windows XP.

--
Chris Hope | www.electrictoolbox.com | www.linuxcdmall.com
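The point of this reply is that the User-Agent is just a request header chosen by the client. As a minimal Python sketch (the helper build_request is hypothetical, and nothing is actually sent over the network), the standard urllib.request.Request class lets a script claim to be IE 6 on Windows XP in the same way:

```python
import urllib.request

# The browser-like User-Agent string from the wget example above.
IE6_XP = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.prizee.com/parole.php", IE6_XP)
# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_method(), req.get_header("User-agent"))
```

The server only ever sees this string, so it cannot tell such a script from a real browser by the user agent alone.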

#3
August 9th, 2006, 04:25 AM
flamer die.spam@hotmail.com

giminik@gmail.com wrote:
> How can a website detect that a command-line tool is being used?

It probably just redirects Linux users, and not Windows users, by checking the user agent.

Flamer.
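This guess amounts to user-agent sniffing on the server. The site itself runs PHP, but the idea can be sketched in Python; the token list and the function names should_redirect and respond are hypothetical, chosen only to reproduce the 302 seen in the thread:

```python
from typing import Dict, Optional, Tuple

# Hypothetical substrings a site might reject; a real site would pick its own list.
BLOCKED_TOKENS = ("wget", "curl", "links", "linux")


def should_redirect(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent is missing or contains a blocked token."""
    if not user_agent:
        return True  # a request with no User-Agent at all is treated as a robot
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


def respond(user_agent: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers): a 302 to the index for blocked agents, else 200."""
    if should_redirect(user_agent):
        return 302, {"Location": "/index.php?joueur=1"}
    return 200, {"Content-Type": "text/html"}


print(respond("Wget/1.10.2"))
# (302, {'Location': '/index.php?joueur=1'})
print(respond("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
# (200, {'Content-Type': 'text/html'})
```

A check like this is defeated by wget --user-agent, which fits the original poster's observation that changing wget's user agent did not stop the redirect.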

#4
August 9th, 2006, 10:45 AM
giminik@gmail.com

Thanks for your answers.
I found the problem: it was a session cookie. I solved it by using wget's --keep-session-cookies option together with --load-cookies.
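The redirect only stops once the client sends the cookie back. In wget, --keep-session-cookies makes --save-cookies also write session cookies (such as COOKIEis_accepted, which has no expiry) to the cookie file, and --load-cookies sends them on the next run. The following Python sketch reproduces the effect locally; the CookieGate handler is a hypothetical stand-in for the real site, not its actual code, and urllib's HTTPCookieProcessor plays the role of wget's cookie file:

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class CookieGate(http.server.BaseHTTPRequestHandler):
    """Hypothetical site: /parole.php is only served when the cookie is present."""

    def do_GET(self):
        has_cookie = "COOKIEis_accepted=1" in (self.headers.get("Cookie") or "")
        if self.path == "/parole.php" and not has_cookie:
            # No cookie yet: set it and redirect, as in the curl output.
            self.send_response(302)
            self.send_header("Set-Cookie", "COOKIEis_accepted=1; path=/")
            self.send_header("Location", "/index.php?joueur=1")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = self.path.encode()  # echo the path that was actually served
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(opener, url):
    """Return the body, i.e. the path finally served after any redirects."""
    with opener.open(url) as resp:
        return resp.read().decode()


def run_demo():
    """Fetch /parole.php twice without and twice with a cookie jar."""
    server = http.server.HTTPServer(("127.0.0.1", 0), CookieGate)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/parole.php" % server.server_port
    no_proxy = urllib.request.ProxyHandler({})  # ignore proxy env variables
    try:
        # Like plain wget runs: nothing is remembered, so both get redirected.
        plain = urllib.request.build_opener(no_proxy)
        without = [fetch(plain, url) for _ in range(2)]
        # Like wget with cookies kept: the first request stores the session
        # cookie from the 302, so the second request reaches the page.
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar))
        with_jar = [fetch(opener, url) for _ in range(2)]
    finally:
        server.shutdown()
        server.server_close()
    return without, with_jar


print(run_demo())
# (['/index.php?joueur=1', '/index.php?joueur=1'], ['/index.php?joueur=1', '/parole.php'])
```

The first request in each pair always lands on the index page because the cookie is only issued with the redirect; what changes is whether the client remembers it for the next request.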
