Trying to read Google search page from Python

Hi,

My program reads as follows:

import urllib

print "-------- Google Web Page --------"
print urllib.urlopen('http://www.google.com//').read()

print "-------- Google Search Web Page --------"
print urllib.urlopen('http://www.google.com/search?hl=en&q=ted').read()

The first urllib read works fine. The second one, where I am trying to
read in Google's search results, returns a web page saying I do not have
permission:
"Your client does not have permission to get URL "
Is there a way to do this? I am trying to write a program to read in
Google's search results.

-Ted
Aug 19 '08 #1
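A note for readers on Python 3: urllib.urlopen from the program above now lives at urllib.request.urlopen, and an HTTP error status is raised as urllib.error.HTTPError instead of being returned as a page. A minimal sketch of the search request under those assumptions follows. The helper names build_search_url and fetch_status are hypothetical and not part of the original post.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_search_url(query, lang="en"):
    """Build the search URL from the post, URL-encoding the parameters."""
    params = urllib.parse.urlencode({"hl": lang, "q": query})
    return "http://www.google.com/search?" + params


def fetch_status(url):
    """Return the HTTP status code for url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is on the exception.
        return err.code


print(build_search_url("ted"))  # http://www.google.com/search?hl=en&q=ted
```

Calling fetch_status(build_search_url("ted")) needs network access, and what it returns depends on how the server treats the client at that moment, so no expected value is shown here.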
On Aug 19, 9:47 am, "tedpot...@gmail.com" <tedpot...@gmail.com> wrote:
> Hi,
>
> My program reads as follows:
>
> import urllib
>
> print "-------- Google Web Page --------"
> print urllib.urlopen('http://www.google.com//').read()
>
> print "-------- Google Search Web Page --------"
> print urllib.urlopen('http://www.google.com/search?hl=en&q=ted').read()
>
> The first urllib read works fine. The second one, where I am trying to
> read in Google's search results, returns a web page saying I do not have
> permission:
> "Your client does not have permission to get URL "
> Is there a way to do this? I am trying to write a program to read in
> Google's search results.
>
> -Ted

This is a PHP discussion group - not a Python group - but I'll answer.

It's against Google's Terms of Service to do what you're doing, so
they're blocking you. (Not you specifically, but anyone who requests
their search results in that manner.)

If you want to do it anyway, you'd have to trick Google into thinking
you're an actual web user. So you'd have to do some spoofing. I'll
leave that as an exercise for the reader.

Walter
Aug 19 '08 #2
te*******@gmail.com wrote:
> Hi,
>
> My program reads as follows:
>
> import urllib
>
> print "-------- Google Web Page --------"
> print urllib.urlopen('http://www.google.com//').read()
>
> print "-------- Google Search Web Page --------"
> print urllib.urlopen('http://www.google.com/search?hl=en&q=ted').read()
>
> The first urllib read works fine. The second one, where I am trying to
> read in Google's search results, returns a web page saying I do not have
> permission:
> "Your client does not have permission to get URL "
> Is there a way to do this? I am trying to write a program to read in
> Google's search results.
>
> -Ted

Actually, the problem is not Google blocking you. Your request is
incorrect. But as Walter indicated, this is not a Python support group.
Try comp.lang.python.

And also, as Walter indicated, it is against Google's TOS. They aren't
blocking you now - but they will if they catch you.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Aug 20 '08 #3
> Actually, the problem is not google blocking you. Your request is
> incorrect.

http://www.google.com// is strangely formed, but it works. Google
doesn't appear to block automated requests to their front page.

Google _is_ blocking the other request.

Viewing http://www.google.com/search?hl=en&q=ted in Firefox works
fine.

"curl http://www.google.com/search?hl=en&q=ted" returns the error he
mentioned previously. Probably returns the error via Python for the
same reason.

Walter
Aug 20 '08 #4
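One way to see why curl and a Python script could be treated differently from Firefox is to look at how each client identifies itself. The sketch below assumes Python 3's urllib.request (the thread's code used the Python 2 urllib module) and prints the User-Agent header that a default opener would send. The function name default_user_agent is made up for illustration.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent string a default urllib opener sends."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs applied to every request.
    headers = dict(opener.addheaders)
    return headers["User-agent"]


# Prints a string beginning with "Python-urllib/" followed by a version.
print(default_user_agent())
```

A browser sends a much longer string naming itself, so a server that filters on this header can tell the two kinds of client apart.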
WalterGR wrote:
>> Actually, the problem is not google blocking you. Your request is
>> incorrect.
>
> http://www.google.com// is strangely formed, but it works. Google
> doesn't appear to block automated requests to their front page.
>
> Google _is_ blocking the other request.
>
> Viewing http://www.google.com/search?hl=en&q=ted in Firefox works
> fine.
>
> "curl http://www.google.com/search?hl=en&q=ted" returns the error he
> mentioned previously. Probably returns the error via Python for the
> same reason.
>
> Walter

Your crystal ball must be working better than mine. I can't tell that.
But I could see a lot of other possibilities.

But this is not a Python group, so I won't discuss them here.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Aug 20 '08 #5
On Aug 19, 7:10 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> Your crystal ball must be working better than mine. I can't tell that.
> But I could see a lot of other possibilities.

Sorry to hear about your crystal ball. But you don't need one in this
particular case.

All one needs is knowledge of user agents and user agent overriding,
and then one can test my hypothesis. (Which, given that I've now
tested it, is in fact, fact.)

Walter
Aug 20 '08 #6
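The user agent overriding mentioned above can be sketched in Python 3 roughly as follows. This is an illustration under assumptions, not the test Walter actually ran: the BROWSER_UA string is only an example value, and make_request is a hypothetical helper. The block builds the request and shows the header it would carry, without sending anything.

```python
import urllib.request

# Example browser-style identifier; the exact value is an assumption.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) "
    "Gecko/20100101 Firefox/115.0"
)


def make_request(url, user_agent):
    """Build a Request whose User-Agent header replaces urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = make_request("http://www.google.com/search?hl=en&q=ted", BROWSER_UA)
# Request stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen would send it with the overridden header. As noted earlier in the thread, fetching search results this way is against Google's Terms of Service.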