
Error downloading page: some pages work great, but I can't seem to get this one

I am trying to download the source code for an array of different
websites. Usually I get something like this from Dilbert.com:

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:04:54 GMT
Server: Apache/1.3.27 (Unix) Resin/2.1.s030505 mod_ssl/2.8.14 OpenSSL/0.9.7b
Last-Modified: Thu, 22 Apr 2004 07:05:10 GMT
ETag: "182ba6-9d7b-40876ea6"
Accept-Ranges: bytes
Content-Length: 40315
Connection: close
Content-Type: text/html
Then the whole HTML page prints.
.....
The problem occurs when I try the same thing on www.kingsofchaos.com. I
get the following header:

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:16:49 GMT
Server: Apache/1.3.29 (Unix) (Gentoo/Linux)
Connection: close
Content-Type: text/html

without the page attached.
I was wondering if you had any ideas on why I can't access the page,
and any suggestions as to how I should do it. Right now I am using the
following code:
use IO::Socket::INET;
my $host = $_[0];    # host name to connect to
my $get = $_[1];     # path to request
my $port = 80;
my $protocol = "tcp";
my $socket;
my @page;
$socket = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port,
    Proto => $protocol) or die "Could not connect\n";
# send the request
$socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
# receive the desired file (headers and body, one line per element)
@page = <$socket>;
Jul 19 '05 #1


Jack Schafer wrote:
> the problem occurs when i try the same thing on www.kingsofchaos.com
> $socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
> @page=<$socket>;


1) You're doing it the hard way. Use the LWP modules instead.
2) Because of 1, you're not sending all of the HTTP headers the
web server wants to see.

According to the Web Scraping Proxy (http://www.research.att.com/~hpk/wsp/),
you'll need to store and send cookies, and execute JavaScript.

# Request: http://www.kingsofchaos.com/
$request = new HTTP::Request('GET' => "http://www.kingsofchaos.com/");
# Set-Cookie: koc_session=ea30aa58e36; path=/; domain=www.kingsofchaos.com
# Set-Cookie: security_hash=323466; expires=Sun, 23-May-2004 08:17:26 GMT;
#   path=/; domain=.kingsofchaos.com
# Set-Cookie: cookie_hash=801f782dce8147; path=/

3) Post to comp.lang.perl.misc (instead of comp.lang.perl) next time.
-Joe
Jul 19 '05 #2
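
Points 1 and 2 of the reply above can be sketched roughly as follows. This is only a minimal illustration, assuming the LWP::UserAgent and HTTP::Cookies modules are installed; the sub names build_agent and fetch_page and the 30-second timeout are made up for this example and do not come from the thread. The cookie jar stores any Set-Cookie values and sends them back on later requests made with the same agent. LWP does not execute JavaScript, so that part of the advice is not covered here.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical helper: build a user agent with an in-memory cookie jar,
# so cookies set by one response are sent with later requests.
sub build_agent {
    return LWP::UserAgent->new(
        cookie_jar => HTTP::Cookies->new,
        timeout    => 30,    # assumed value, adjust as needed
    );
}

# Hypothetical helper: fetch a URL and return the body, or die with
# the HTTP status line if the request did not succeed.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    die "Request failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $ua = build_agent();
    print fetch_page($ua, $ARGV[0]);
}
```

Run as, for example, perl fetch.pl http://www.kingsofchaos.com/ (the script name is arbitrary). Reusing the same agent for several requests is what lets the stored cookies take effect.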
