  #1  
August 9th, 2006, 01:35 AM
giminik@gmail.com
Detection of a robot in PHP

Hello everybody :)

A friend recently showed me something odd while playing with the wget
command under Linux. I don't know why it happens, but the result surprised me:
$ wget http://www.prizee.com/parole.php
--02:35:29-- http://www.prizee.com/parole.php
=> `parole.php'
Resolving www.prizee.com... 213.186.63.5
Connecting to www.prizee.com|213.186.63.5|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /index.php?joueur=1 [following]
--02:35:30-- http://www.prizee.com/index.php?joueur=1
=> `index.php?joueur=1.1'
Connecting to www.prizee.com|213.186.63.5|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=> ] 12,521 --.--K/s

02:35:30 (103.57 KB/s) - `index.php?joueur=1.1' saved [12521]


So he gets an HTTP 302 status code (a redirect, not an error), which
sends him to the index page of the site.
With a browser such as Firefox, IE, or Safari, we get the expected page
without any redirection.
After that, I ran some tests. I changed wget's user-agent string to
identify it as Mozilla, but got the same result (a redirect). I also
tried links (a command-line browser) and curl, with the same problem.
Here is the output of the curl command:

$ curl -v http://www.prizee.com/parole.php
* About to connect() to www.prizee.com port 80
* Trying 213.186.63.5... connected
* Connected to www.prizee.com (213.186.63.5) port 80
> GET /parole.php HTTP/1.1
> User-Agent: curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1 GnuTLS/1.2.10 zlib/1.2.3 libidn/0.5.15
> Host: www.prizee.com
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Wed, 09 Aug 2006 00:02:57 GMT
< Server: Apache/1.3.33 (Unix) PHP/4.3.10
< X-Powered-By: PHP/4.3.10
< X-Accelerated-By: PHPA/1.3.3r2
< Expires: Mon, 26 Jul 1997 05:00:00 GMT
< Last-Modified: Wed, 09 Aug 2006 00:02:59 GMT
< Cache-Control: no-cache, must-revalidate
< Pragma: no-cache
< Set-Cookie: COOKIEis_accepted=1; path=/; domain=.prizee.com
< Location: /index.php?joueur=1
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html
* Closing connection #0


So my question is: how can a website detect the use of a command-line
tool, as the site above seems to do? Thank you for your answers.

Sorry for my bad English, I'm French ;)

  #2  
August 9th, 2006, 02:05 AM
Chris Hope
Re: Detection of a robot in PHP

I tried both Firefox and Konqueror and they both redirected me to the
second page, so there doesn't appear to be anything different between
using wget and using a graphical browser, at least to me.

You can't detect the use of a command-line tool from the user agent if
the user sets it to mimic a browser. For example:

wget --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

followed by the URL will tell the website you're using IE on Windows XP.
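
To illustrate why the user agent is a weak signal, here is a minimal Python sketch of a naive server-side check. The function name and the token list are made up for this example and are not taken from any real site; the two sample strings come from this thread (shortened in curl's case).

```python
# Hypothetical server-side check: flag a request whose User-Agent
# names a known command-line tool. The token list is an assumption
# chosen for illustration only.
CLI_TOKENS = ("wget", "curl", "links", "lynx")


def looks_like_cli_tool(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known CLI tool name."""
    ua = user_agent.lower()
    return any(token in ua for token in CLI_TOKENS)


# Default curl identification (shortened from the trace in the first post).
curl_ua = "curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1"
# The spoofed string passed to wget --user-agent above.
spoofed_ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(looks_like_cli_tool(curl_ua))     # True
print(looks_like_cli_tool(spoofed_ua))  # False
```

The header is whatever the client chooses to send, so a check like this only catches tools that identify themselves honestly.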

--
Chris Hope | www.electrictoolbox.com | www.linuxcdmall.com
  #3  
August 9th, 2006, 04:25 AM
flamer die.spam@hotmail.com
Re: Detection of a robot in PHP


It probably just redirects Linux users, and not Microsoft ones, by
checking the user-agent string.

Flamer.

  #4  
August 9th, 2006, 10:45 AM
giminik@gmail.com
Re: Detection of a robot in PHP

Thanks for your answers.
I found the problem: it was a session cookie issue.
I solved it by using the wget option --keep-session-cookies together
with --load-cookies.
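
For anyone who finds this thread later, here is a small network-free Python sketch of what seems to be going on. It assumes the server redirects any request for /parole.php that does not carry the cookie seen in the Set-Cookie header of the curl trace; the site's real logic is unknown, and `handle_request` and `fetch` are hypothetical names invented for this example.

```python
from typing import Dict, Tuple

# Cookie name taken from the Set-Cookie header in the curl trace.
COOKIE_NAME = "COOKIEis_accepted"


def handle_request(path: str, cookies: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Assumed server logic: serve the page only if the cookie is present."""
    if path == "/parole.php" and cookies.get(COOKIE_NAME) != "1":
        # No cookie yet: set it and redirect to the index page.
        return 302, {
            "Set-Cookie": f"{COOKIE_NAME}=1",
            "Location": "/index.php?joueur=1",
        }
    return 200, {}


def fetch(path: str, jar: Dict[str, str], keep_cookies: bool) -> int:
    """Minimal client: sends the cookies in `jar`, optionally stores new ones."""
    status, headers = handle_request(path, dict(jar))
    if keep_cookies and "Set-Cookie" in headers:
        name, _, value = headers["Set-Cookie"].partition("=")
        jar[name] = value
    return status


# A client that discards cookies is redirected on every request.
jar: Dict[str, str] = {}
print(fetch("/parole.php", jar, keep_cookies=False))  # 302
print(fetch("/parole.php", jar, keep_cookies=False))  # 302

# A client that keeps cookies is redirected once, then served the page.
jar = {}
print(fetch("/parole.php", jar, keep_cookies=True))   # 302
print(fetch("/parole.php", jar, keep_cookies=True))   # 200
```

With wget, a cookie file written by --save-cookies (with --keep-session-cookies, so that session cookies are saved too) and read back by --load-cookies plays the role of `jar` here. A browser does the same thing automatically, which would explain why the page loads normally there.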

 
