Bytes IT Community

Crawling the site gives me captcha

P: 12
I was crawling http://www.somesite.com and it was working. I used the PHP Symfony crawler, and I even used proxies. Since I was testing it locally, I then tried without proxies.

I added 5 seconds of sleep time between requests, then tried a random sleep between requests, then 10 seconds. After only a few requests it gave me this page: http://www.somesite.com/captcha.html

I tried incognito mode and it gave me the same page, so I guess it ties my crawling to my IP address. How does it even recognize those requests as non-human?
3 Weeks Ago #1
2 Replies


Expert 100+
P: 260
I am not sure about this one, but I've read people talking about HTTrack ignoring the proxy while downloading HTTPS content. It might be the same case here.
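One way to rule out the "proxy silently ignored for HTTPS" case is to make sure the proxy is mapped for both schemes. The sketch below is only an illustration in Python using the standard library's `urllib`, not the Symfony crawler from the question. The helper name `build_proxy_opener` and the proxy address are made up for the example.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Build an opener that sends both http:// and https:// requests via proxy_url."""
    # urllib looks up the proxy by URL scheme. If only "http" is mapped here,
    # requests to https:// URLs go out directly, without the proxy.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    return opener, handler


# Hypothetical local proxy address, used for illustration only.
opener, handler = build_proxy_opener("http://127.0.0.1:8080")
print(sorted(handler.proxies))
```

To confirm the proxy is really in use, one could fetch a page that echoes the caller's IP address through `opener.open(...)` and compare it with the address seen without the proxy.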
3 Weeks Ago #2

Rabbit
Expert Mod 10K+
P: 12,427
Google's version, as an example, looks at mouse movements on the page to see whether they look "natural."

There are tons of other potential factors one can look at: scroll rates, mouse clicks, click location, HTTP headers, key presses, key press rate, etc.
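To make a couple of those factors concrete, here is a rough sketch in Python of how a server could score a client using only HTTP headers and request timing. It is a guess at the general idea, not how the site in question or any real captcha service works. The function name `automation_score`, the header list, and the 0.5 second threshold are all assumptions picked for the example.

```python
import statistics

# Headers a typical browser sends with every page request (assumed list).
BROWSER_HEADERS = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")


def automation_score(headers, request_times):
    """Return a rough suspicion score from 0 to 2 for one client (hypothetical heuristic)."""
    score = 0

    # Signal 1: a bare HTTP client often omits headers that browsers always send.
    present = {name.lower() for name in headers}
    if any(name.lower() not in present for name in BROWSER_HEADERS):
        score += 1

    # Signal 2: nearly identical gaps between requests (for example a fixed
    # 5 second sleep) look machine-timed. Needs at least 3 gaps to judge.
    gaps = [later - earlier for earlier, later in zip(request_times, request_times[1:])]
    if len(gaps) >= 3 and statistics.pstdev(gaps) < 0.5:
        score += 1

    return score


# A crawler with a fixed 5 second sleep and only a User-Agent header.
print(automation_score({"User-Agent": "my-crawler"}, [0, 5, 10, 15, 20]))
```

A randomized sleep would get past the timing check in this sketch, which suggests a real site combines it with other signals, such as the requesting IP address.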
3 Weeks Ago #3
