Error downloading page: some pages work great, but I can't seem to get this one

I am trying to download the source code for an array of different
websites. Usually I will get something like this from Dilbert.com:

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:04:54 GMT
Server: Apache/1.3.27 (Unix) Resin/2.1.s030505 mod_ssl/2.8.14 OpenSSL/0.9.7b
Last-Modified: Thu, 22 Apr 2004 07:05:10 GMT
ETag: "182ba6-9d7b-40876ea6"
Accept-Ranges: bytes
Content-Length: 40315
Connection: close
Content-Type: text/html
then the whole HTML page prints
.....
The problem occurs when I try the same thing on www.kingsofchaos.com. I
get the following header:

HTTP/1.1 200 OK
Date: Fri, 23 Apr 2004 00:16:49 GMT
Server: Apache/1.3.29 (Unix) (Gentoo/Linux)
Connection: close
Content-Type: text/html

without the page attached.
I was wondering if you had any ideas on why I can't access the page,
and any suggestions as to how I should do it. Right now I am using the
following code:
use IO::Socket::INET;
my $host = $_[0];
my $get = $_[1];
my $port = 80;
my $protocol = "tcp";
my $socket;
my @page;
$socket = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port,
                                Proto => $protocol) or die "Could not connect\n";
# send the request
$socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
# receive the desired file
@page = <$socket>;
Jul 19 '05 #1
Jack Schafer wrote:
the problem occurs when i try the same thing on www.kingsofchaos.com
$socket->send("GET $get HTTP/1.0\nHOST: $host\n\n");
@page=<$socket>;


1) You're doing it the hard way. Use the LWP modules instead.
2) Because of 1, you're not sending all of the HTTP headers the
web server wants to see.

According to the Web Scraping Proxy (http://www.research.att.com/~hpk/wsp/)
you'll need to store and send cookies, and execute JavaScript.

# Request: http://www.kingsofchaos.com/
$request = new HTTP::Request('GET' => "http://www.kingsofchaos.com/");
# Set-Cookie: koc_session=ea30aa58e36; path=/; domain=www.kingsofchaos.com
# Set-Cookie: security_hash=323466; expires=Sun, 23-May-2004 08:17:26 GMT; path=/; domain=.kingsofchaos.com
# Set-Cookie: cookie_hash=801f782dce8147; path=/

3) Post to comp.lang.perl.misc (instead of comp.lang.perl) next time.
-Joe
Jul 19 '05 #2
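
For anyone following up on the LWP suggestion, here is a minimal sketch of what it might look like, assuming the LWP::UserAgent and HTTP::Cookies modules are installed. The sub names make_agent and fetch_page and the User-Agent string are made up for illustration and do not come from the posts above. The cookie jar stores any Set-Cookie headers and sends them back on later requests, including redirects. LWP does not execute JavaScript, so this only covers the cookie half of the advice.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a user agent that keeps cookies in memory between requests.
# The agent string is a hypothetical placeholder; pick your own.
sub make_agent {
    return LWP::UserAgent->new(
        agent      => 'Mozilla/5.0 (compatible; example-fetcher)',
        timeout    => 30,
        cookie_jar => HTTP::Cookies->new,    # in-memory jar, not saved to disk
    );
}

# Fetch a URL and return the decoded body, or die with the HTTP status line.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    die "Request failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $ua = make_agent();
    print fetch_page($ua, $ARGV[0]);
}
```

Saved as, say, fetch.pl, it could be run as: perl fetch.pl http://www.kingsofchaos.com/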
