Trying to read Google search page from Python

Hi,

My program reads as follows:

import urllib

print "-------- Google Web Page --------"
print urllib.urlopen('http://www.google.com//').read()

print "-------- Google Search Web Page --------"
print urllib.urlopen('http://www.google.com/search?hl=en&q=ted').read()

The first urllib read works fine. The second one, where I am trying to
read in Google's search results, returns a web page saying I do not have
permission:
"Your client does not have permission to get URL"
Is there a way to do this? I am trying to write a program to read in
Google's search results.

-Ted
Aug 19 '08 #1
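
For readers on Python 3, the same two-step fetch might be sketched roughly as follows. This assumes the standard-library urllib.request and urllib.parse modules instead of the Python 2 urllib used in the post, and the helper names build_search_url and fetch are made up for illustration. One difference from the behaviour described in the post: Python 3's urlopen raises HTTPError on a 4xx/5xx response rather than handing back the error page, so the sketch catches it and returns the status code alongside the body.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(query, lang="en"):
    """Return the search URL from the post, with the query string encoded."""
    params = urlencode({"hl": lang, "q": query})
    return "http://www.google.com/search?" + params


def fetch(url):
    """Return (status, body); an HTTP error status is returned, not raised."""
    try:
        with urlopen(url) as response:
            return response.status, response.read()
    except HTTPError as err:
        # Python 3 raises on 4xx/5xx; the error object still carries the page.
        return err.code, err.read()


# Building the URL needs no network access.
print(build_search_url("ted"))  # http://www.google.com/search?hl=en&q=ted
```

Calling fetch(build_search_url("ted")) would issue the request from the post; what comes back depends on how the server treats the client.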
This is a PHP discussion group - not a Python group - but I'll answer.

It's against Google's Terms of Service to do what you're doing, so
they're blocking you. (Not you specifically, but anyone who requests
their search results in that manner.)

If you want to do it anyway, you'd have to trick Google into thinking
you're an actual web user. So you'd have to do some spoofing. I'll
leave that as an exercise for the reader.

Walter
Aug 19 '08 #2
Actually, the problem is not Google blocking you. Your request is
incorrect. But as Walter indicated, this is not a Python support group.
Try comp.lang.python.

And also, as Walter indicated, it is against Google's TOS. They aren't
blocking you now - but they will if they catch you.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Aug 20 '08 #3
> Actually, the problem is not Google blocking you. Your request is
> incorrect.

http://www.google.com// is strangely formed, but it works. Google
doesn't appear to block automated requests to their front page.

Google _is_ blocking the other request.

Viewing http://www.google.com/search?hl=en&q=ted in Firefox works
fine.

Running curl 'http://www.google.com/search?hl=en&q=ted' returns the error he
mentioned previously. Python probably gets the error for the same
reason.

Walter
Aug 20 '08 #4
Your crystal ball must be working better than mine. I can't tell that.
But I could see a lot of other possibilities.

But this is not a Python group, so I won't discuss them here.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Aug 20 '08 #5
On Aug 19, 7:10 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> Your crystal ball must be working better than mine. I can't tell that.
> But I could see a lot of other possibilities.

Sorry to hear about your crystal ball. But you don't need one in this
particular case.

All one needs is knowledge of user agents and user agent overriding,
and then one can test my hypothesis. (Which, given that I've now
tested it, is in fact, fact.)

Walter
Aug 20 '08 #6
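
The user-agent point in the last post can be illustrated with a short sketch. It assumes Python 3's urllib.request (the thread itself uses Python 2), and the agent string ExampleClient/1.0 and both helper names are hypothetical. The code only inspects headers and builds a request object; it does not send anything, so it shows the mechanism without confirming how any particular server responds.

```python
from urllib.request import Request, build_opener

# Hypothetical identifier, used only for illustration.
CUSTOM_AGENT = "ExampleClient/1.0"


def default_user_agent():
    """Return the User-Agent that urllib sends when none is specified."""
    headers = dict(build_opener().addheaders)
    return headers["User-agent"]


def request_with_agent(url, agent=CUSTOM_AGENT):
    """Build (but do not send) a request that overrides the User-Agent."""
    return Request(url, headers={"User-Agent": agent})


req = request_with_agent("http://www.google.com/search?hl=en&q=ted")
print(default_user_agent())          # a string starting with "Python-urllib/"
print(req.get_header("User-agent"))  # ExampleClient/1.0
```

Passing req to urllib.request.urlopen would send it with the overridden header. Comparing that response with the one for a plain urlopen(url) call is one way to check whether the server's answer depends on the user agent.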
