Detection of a robot in PHP

Hello everybody :)

A friend recently showed me something odd while playing with the wget
command under Linux. I don't know why it happens, but the result surprised me:
$ wget http://www.prizee.com/parole.php
--02:35:29-- http://www.prizee.com/parole.php
=> `parole.php'
Resolving www.prizee.com... 213.186.63.5
Connecting to www.prizee.com|213.186.63.5|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /index.php?joueur=1 [following]
--02:35:30-- http://www.prizee.com/index.php?joueur=1
=> `index.php?joueur=1.1'
Connecting to www.prizee.com|213.186.63.5|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=> ] 12,521 --.--K/s

02:35:30 (103.57 KB/s) - `index.php?joueur=1.1' saved [12521]
So he gets an HTTP 302 response (a redirect rather than an error), which
sends him to the index page of the site.
With a browser like Firefox, IE, or Safari, we get the right page without
any redirection.
After that, I ran some tests. I tried changing wget's user agent string
to identify it as Mozilla, but I got the same result (redirection). I
tried links (a command-line browser) and curl, with the same problem.
Here is the output of the curl command:

$ curl -v http://www.prizee.com/parole.php
* About to connect() to www.prizee.com port 80
* Trying 213.186.63.5... connected
* Connected to www.prizee.com (213.186.63.5) port 80
> GET /parole.php HTTP/1.1
> User-Agent: curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1 GnuTLS/1.2.10 zlib/1.2.3 libidn/0.5.15
> Host: www.prizee.com
> Accept: */*
< HTTP/1.1 302 Found
< Date: Wed, 09 Aug 2006 00:02:57 GMT
< Server: Apache/1.3.33 (Unix) PHP/4.3.10
< X-Powered-By: PHP/4.3.10
< X-Accelerated-By: PHPA/1.3.3r2
< Expires: Mon, 26 Jul 1997 05:00:00 GMT
< Last-Modified: Wed, 09 Aug 2006 00:02:59 GMT
< Cache-Control: no-cache, must-revalidate
< Pragma: no-cache
< Set-Cookie: COOKIEis_accepted=1; path=/; domain=.prizee.com
< Location: /index.php?joueur=1
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html
* Closing connection #0
So, my question is: how can a website detect the use of a command-line
tool, like the site above does? Thank you for your answers.

Sorry for my bad English, I'm French ;)

Aug 9 '06 #1
gi*****@gmail.com wrote:
> So, my question is: how can a website detect the use of a command-line
> tool, like the site above does? Thank you for your answers.

I tried both Firefox and Konqueror and they both redirected me to the
second page, so there doesn't appear to be anything different between
using wget and using a graphical browser, at least to me.

You can't detect a command-line tool from its user agent if the user
sets it to look like a browser's. For example:

wget --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

followed by the URL will tell the website you're using IE 6 on Windows XP.
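
To illustrate why a user agent check alone is unreliable, here is a minimal sketch in Python (used only for illustration; the thread contains no server code). The `looks_like_cli` helper and its prefix list are hypothetical, and nothing here is known to be what the site above does.

```python
# Hypothetical prefix list: default user agents of curl and wget
# start with "curl/" and "Wget/" respectively.
CLI_PREFIXES = ("curl/", "wget/")


def looks_like_cli(user_agent: str) -> bool:
    """Naive check: does the User-Agent header start with a known CLI prefix?"""
    return user_agent.lower().startswith(CLI_PREFIXES)


# The default curl string from this thread is caught by the check...
default_ua = "curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1"
# ...but the spoofed string passed via --user-agent is not.
spoofed_ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(looks_like_cli(default_ua), looks_like_cli(spoofed_ua))
```

Since the header is entirely under the client's control, a check like this only catches tools that leave their default string in place.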

--
Chris Hope | www.electrictoolbox.com | www.linuxcdmall.com
Aug 9 '06 #2

gi*****@gmail.com wrote:
> So, my question is: how can a website detect the use of a command-line
> tool, like the site above does? Thank you for your answers.

It probably just redirects Linux users, and not Windows users, by
checking the user agent.

Flamer.

Aug 9 '06 #3
Thanks for your answers.
I found the problem: it was a session cookie issue.
I solved it by using the wget option --keep-session-cookies together
with --load-cookies.
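
The behaviour seen in the thread can be reproduced locally with a small sketch, again in Python. Everything here is an assumption made for illustration: the toy server, the cookie name `accepted`, and the `/index.php` target are hypothetical and only mimic the Set-Cookie plus 302 response shown in the curl output. The real site's logic is unknown.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    """Toy site: /parole.php is only served once the cookie comes back."""

    def do_GET(self):
        has_cookie = "accepted=1" in self.headers.get("Cookie", "")
        if self.path == "/parole.php" and not has_cookie:
            # First visit: hand out the cookie and redirect to the index.
            self.send_response(302)
            self.send_header("Set-Cookie", "accepted=1; Path=/")
            self.send_header("Location", "/index.php")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"parole" if self.path == "/parole.php" else b"index"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(opener, url):
    """Fetch a URL (following redirects) and return the body as text."""
    with opener.open(url) as resp:
        return resp.read().decode()


def demo():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/parole.php"
    try:
        no_cookies = urllib.request.build_opener()
        with_cookies = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar()))
        return (
            fetch(no_cookies, url),    # cookie is discarded, lands on the index
            fetch(with_cookies, url),  # first visit: redirected, cookie stored
            fetch(with_cookies, url),  # second visit: cookie sent, real page
        )
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

A client that discards cookies between requests is redirected every time, whatever its user agent says. A browser, or wget with saved and reloaded session cookies, sends the cookie back on the next request and gets the page.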

Aug 9 '06 #4
