Bytes | Software Development & Data Engineering Community

Curl gives 403 forbidden

I'm trying to retrieve information from a website using PHP and Curl.
This is the code I use:

<?php
$tturl = "http://teletekst.nos.nl/";
echo "opening $tturl ...\n";
$ch = curl_init();
if (!$ch) die("Cannot allocate a new PHP-CURL handle\n");
$fp = fopen("ttread.htm", "w");
if (!$fp) die("Cannot open ttread.htm for writing\n");
curl_setopt($ch, CURLOPT_FILE, $fp);   // write the response body to ttread.htm
curl_setopt($ch, CURLOPT_URL, $tturl);
curl_exec($ch);
curl_close($ch);
fclose($fp);
echo "finished\n";
?>

This results in a 403 Forbidden page. However, if I type the URL
http://teletekst.nos.nl/ into my browser, it works fine (also with
cookies disabled). If I change $tturl in the script to
http://www.nos.nl/, it works. What is the difference between typing
it into my browser and accessing it with curl? Is there a workaround for
this?

Greetingz Bas

Aug 26 '05 #1
"Basta" wrote:
I'm trying to retrieve information from a website using PHP and Curl.
This is the code I use:
(snip)
This results in a 403 forbidden page. However if I type the url
http://teletekst.nos.nl/ in my browser then it works fine (also with
cookies disabled).


That's probably because the owners of teletekst.nos.nl are fed up with
having idiot robots crawling all over their site and stealing its content.

If you had bothered to visit <http://teletekst.nos.nl/robots.txt> you might
have noticed that robots are not permitted to access this website. You're
getting a 403 response because their website has identified that you're
accessing it improperly.
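Checking robots.txt before fetching a page can be automated. Below is a minimal sketch, assuming a deliberately simplified parser: it only honors `User-agent: *` groups and plain `Disallow` path prefixes, and ignores `Allow` lines, wildcards and agent-specific rules. The function name `robots_disallows` is hypothetical.

```php
<?php
// Hypothetical helper: returns true if $robotsTxt blocks $path for all
// user agents ("User-agent: *"). Simplified on purpose: Allow lines,
// wildcards and agent-specific groups are ignored.
function robots_disallows(string $robotsTxt, string $path): bool
{
    $inStarGroup = false;   // are we inside a "User-agent: *" group?
    $lastWasAgent = false;  // consecutive User-agent lines form one group
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Strip comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            $isStar = ($value === '*');
            $inStarGroup = $lastWasAgent ? ($inStarGroup || $isStar) : $isStar;
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;
        // An empty Disallow value means "nothing is disallowed".
        if ($inStarGroup && $field === 'disallow' && $value !== ''
            && strpos($path, $value) === 0) {
            return true;
        }
    }
    return false;
}
```

In practice you would first download robots.txt (for example with the same curl calls as the original script, plus `CURLOPT_RETURNTRANSFER`) and skip the fetch when the function returns `true`.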

There are probably some things you could do to bypass the blocks on this
website, but I'm not going to tell you what they are. Create your own
content. Don't steal it from other websites.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/
Aug 26 '05 #2
> There are probably some things you could do to bypass the blocks on this
website, but I'm not going to tell you what they are. Create your own
content. Don't steal it from other websites.


Thanx for your help. So I'm stealing content from a website? I can read
it, but then I have to forget it as soon as possible, otherwise I'm a
thief. Interesting thought. I'm surprised you didn't even bother to
ask what I needed it for.

Aug 28 '05 #3
On 2005-08-26, Basta <ba*******@gmail.com> wrote:
I'm trying to retrieve information from a website using PHP and Curl.
This is the code I use:

(snip)

This results in a 403 forbidden page. However if I type the url
http://teletekst.nos.nl/ in my browser then it works fine (also with
cookies disabled). If I change $tturl in the script to
http://www.nos.nl/, it works. What is the difference between typing
it into my browser and accessing it with curl? Is there a workaround for
this?


Perhaps it checks on user-agent?

--
Cheers,
- Jacob Atzen
Aug 28 '05 #4
> Perhaps it checks on user-agent?

Setting the CURLOPT_USERAGENT to "Mozilla/5.0 (Windows; U; Windows NT
5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0" doesn't help.
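One way to narrow this down is to look at exactly what curl sent. libcurl sends no `User-Agent` header at all unless `CURLOPT_USERAGENT` is set, while a browser always sends one, typically along with other headers such as `Accept-Language`, so comparing the raw requests can show what differs. A minimal sketch, assuming the PHP curl extension is loaded; `fetch_and_inspect` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: fetch $url with an explicit User-Agent and return
// the HTTP status plus the request headers curl actually sent.
function fetch_and_inspect(string $url, string $userAgent): array
{
    $ch = curl_init();
    if ($ch === false) {
        throw new RuntimeException("Cannot allocate a new PHP-CURL handle");
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body as a string
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);    // keep the request headers
    $body = curl_exec($ch);
    $result = [
        'error'       => $body === false ? curl_error($ch) : null,
        'status'      => curl_getinfo($ch, CURLINFO_HTTP_CODE), // e.g. 200 or 403
        'sentHeaders' => curl_getinfo($ch, CURLINFO_HEADER_OUT),
        'body'        => $body === false ? '' : $body,
    ];
    curl_close($ch);
    return $result;
}
```

Echoing `$r['status']` and `$r['sentHeaders']` for a call such as `$r = fetch_and_inspect($tturl, $ua);` shows the status code and the raw request, which can then be compared with what the browser sends.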

Aug 29 '05 #5
Basta wrote:
Perhaps it checks on user-agent?


Setting the CURLOPT_USERAGENT to "Mozilla/5.0 (Windows; U; Windows NT
5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0" doesn't help.


Or it may be a referrer or cookie issue. Better to use verbose mode and
post the log file here.

Sample code for verbose mode and logging:
$fp_err = fopen('verbose_file.txt', 'ab+');
fwrite($fp_err, date('Y-m-d H:i:s')."\n\n"); // add timestamp to the verbose log
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_STDERR, $fp_err);
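The fragment above assumes an existing `$ch` handle. A self-contained sketch of the same idea, wrapped in a hypothetical `fetch_with_verbose_log` helper (the default log file name is taken from the fragment):

```php
<?php
// Hypothetical helper: fetch $url and append curl's verbose trace to $logFile.
// Returns the response body, or false on failure (including HTTP errors).
function fetch_with_verbose_log(string $url, string $logFile = 'verbose_file.txt')
{
    $fpErr = fopen($logFile, 'ab');
    if ($fpErr === false) {
        throw new RuntimeException("Cannot open $logFile");
    }
    fwrite($fpErr, date('Y-m-d H:i:s') . "\n\n"); // timestamp each run
    $ch = curl_init($url);
    if ($ch === false) {
        fclose($fpErr);
        throw new RuntimeException("Cannot allocate a new PHP-CURL handle");
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_VERBOSE, true);     // trace request and response
    curl_setopt($ch, CURLOPT_STDERR, $fpErr);    // send the trace to the log
    curl_setopt($ch, CURLOPT_FAILONERROR, true); // fail on HTTP errors like 403
    $body = curl_exec($ch);
    if ($body === false) {
        fwrite($fpErr, 'curl error: ' . curl_error($ch) . "\n");
    }
    curl_close($ch);
    fclose($fpErr);
    return $body;
}
```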

Also, check
<http://curl.haxx.se/libcurl/php/examples/?ex=cookiejar.php> for cookie
handling.
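Following the cookie and referrer suggestion, here is a minimal sketch that keeps cookies in a jar file between requests and optionally sends a `Referer` header. `fetch_with_cookies` and the jar path you pass in are hypothetical; the linked cookiejar example may differ in detail.

```php
<?php
// Hypothetical helper: fetch $url using $cookieJar to store and resend
// cookies, optionally sending a Referer header. Returns [status, body].
function fetch_with_cookies(string $url, string $cookieJar, ?string $referer = null): array
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException("Cannot allocate a new PHP-CURL handle");
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar); // send cookies saved earlier
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);  // save cookies from this response
    if ($referer !== null) {
        curl_setopt($ch, CURLOPT_REFERER, $referer);
    }
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Cookies are written to the jar file once the handle is released.
    curl_close($ch);
    return [$status, $body === false ? '' : $body];
}
```

Calling it twice with the same jar file makes the second request send back any cookies the first response set.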

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Aug 30 '05 #6
