fetching webpage

I am trying to crawl web pages in the CiteSeer domain (a collection of research
papers, mostly in computer science).

I have used the following code snippet.

#####
import urllib

sock = urllib.urlopen("http://citeseer.ist.psu.edu")
webcontent = sock.read().split('\n')
sock.close()
print webcontent
########

Then I get the following output, which is a server error page rather than the expected content.
['<!--#set var="TITLE" value="Server error!"', '--><!--#include
virtual="include/top.html" -->', '', ' <!--#if
expr="$REDIRECT_ERROR_NOTES" -->', '', ' The server encountered an
internal error and was ', ' unable to complete your request.', '', '
<!--#include virtual="include/spacer.html" -->', '', ' Error message:', '
<br /><!--#echo encoding="none" var="REDIRECT_ERROR_NOTES" -->', '', '
<!--#else -->', '', ' The server encountered an internal error and was ',
' unable to complete your request. Either the server is', ' overloaded
or there was an error in a CGI script.', '', ' <!--#endif -->', '',
'<!--#include virtual="include/bottom.html" -->', '']

However, the URL is valid, and it works fine if I open it in my web
browser.
If I use a different URL (http://www.google.com instead of
http://citeseer.ist.psu.edu), the code also works.

What is wrong?
Could it be that the CiteSeer web server checks the HTTP request, sees
something that it doesn't like, and rejects the request?
What should I do?

Thank you.

Best regards,
Yookyung

Dec 30 '05 #1
I went to the URL you posted, and it looks like that error is the
content you should be receiving. Try refreshing your browser cache; you
could be loading a cached page.
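
One way to tell the two explanations apart (the server really is failing, versus the server rejecting the script's request) is to look at the HTTP status code and to retry with a browser-like User-Agent header. The following is only a rough sketch in Python 3 using urllib.request, not the urllib module from the snippet above. The names build_request, fetch and BROWSER_UA are made up for illustration, and nothing in this thread confirms that CiteSeer actually filters on User-Agent.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent string, chosen only for this test.
BROWSER_UA = "Mozilla/5.0 (compatible; example-crawler)"


def build_request(url, user_agent=None):
    """Return a Request, optionally overriding urllib's default User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch(url, user_agent=None, timeout=10):
    """Return (status_code, body_text) for url.

    HTTP error responses (4xx/5xx) are returned instead of raised, so the
    caller can compare status codes. Connection failures (URLError) still
    propagate to the caller.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error responses still carry a body, e.g. a "Server error!" page.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling fetch("http://citeseer.ist.psu.edu") and then fetch("http://citeseer.ist.psu.edu", BROWSER_UA) and comparing the two status codes would show which case applies. If both calls return a 5xx code, the server is failing for everyone and the browser is probably showing a cached copy. If only the first call fails, the server is likely treating the default Python User-Agent differently.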

Charles

Dec 30 '05 #2
