
simple spider in python

Hi everybody, I'm new to the forum, so: hello everybody (should I say
"world"?) ^_^
I'm trying to write a simple spider in Python which:

1) asks Google a query
2) parses the data

I'm a Python newbie, so *any* help would be very, very welcome.
Thanks in advance!

cheers!

Aug 23 '07 #1
On Aug 23, 8:33 am, gmcalen...@gmail.com wrote:
> Hi everybody, I'm new to the forum, so: hello everybody (should I say
> "world"?) ^_^
> I'm trying to write a simple spider in Python which:
>
> 1) asks Google a query
> 2) parses the data
>
> I'm a Python newbie, so *any* help would be very, very welcome.
> Thanks in advance!
>
> cheers!

Take a look at the docs for urllib2.urlopen(). The examples should
give you most of what you need.
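
For anyone reading this on Python 3: urllib2 was split into urllib.request and urllib.parse, so a rough sketch of the same idea looks like the code below. The search URL and the helper names (build_search_url, fetch) are my own assumptions for illustration; they are not from the docs or from this thread.

```python
# Python 3 sketch: urllib2.urlopen() now lives in urllib.request.
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; the thread does not name a specific URL.
SEARCH_URL = "https://www.google.com/search"


def build_search_url(query):
    """Return the search URL with the query string safely encoded."""
    return SEARCH_URL + "?" + urlencode({"q": query})


def fetch(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Only builds the URL; call fetch(url) to actually download the page.
    print(build_search_url("simple spider in python"))
```

Building the URL with urlencode() keeps spaces and characters such as "&" in the query from breaking the request. Whether fetch() succeeds against a given site depends on what that server accepts.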
Aug 23 '07 #2
> I'm trying to do a simple spider in python which:
>
> 1) ask google a query
> 2) parse the data

While you could use urllib2.urlopen() as Frederick mentioned, there is
actually a Python module built JUST for getting info from Google
queries! So check out PyGoogle: http://pygoogle.sourceforge.net/

After you install and import it like a normal Python module, you can
do things like:
doGoogleSearch("thing to query") and get results! Very easy to use.

One thing that might throw you at first: you need to get an API key
from Google and use it when you set up the classes. The link I
pasted above has all the details.

Good luck!

-Sam

Aug 23 '07 #3
Well, it turns out that Google has not been giving out SOAP API keys
since Dec 2006.
What a shame! Any tips? ;-)

Aug 23 '07 #4

On Aug 23, 2007, at 6:33 AM, gm********@gmail.com wrote:
> Hi everybody, I'm new to the forum, so: hello everybody (should I say
> "world"?) ^_^
> I'm trying to write a simple spider in Python which:
>
> 1) asks Google a query
> 2) parses the data
>
> I'm a Python newbie, so *any* help would be very, very welcome.
> Thanks in advance!

The first thing to know is that Google doesn't like the User-Agent header
urllib2 sends by default, so you'll have to masquerade as a browser
(Google returns a 403 error if you connect as 'User-Agent:
Python-urllib/2.5'; look into urllib2.build_opener()). The second thing to
know is that the interesting results have their class attribute set to "l".
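
Here is a minimal sketch of both points, written for Python 3, where urllib2.build_opener() is urllib.request.build_opener(). The User-Agent string, the helper names and the sample HTML are assumptions made up for illustration, and it assumes the results are <a> tags carrying class "l", which the thread does not spell out.

```python
from html.parser import HTMLParser
from urllib.request import build_opener

# Assumed browser-like value; any realistic User-Agent string could be used.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def make_opener(user_agent=BROWSER_UA):
    """Return an opener that sends user_agent instead of Python-urllib/x.y."""
    opener = build_opener()
    # Replacing addheaders overrides the default User-agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class attribute includes "l"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "l" in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def extract_result_links(html):
    """Return the result links found in an HTML string."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    # Hypothetical markup, used so the example runs without network access.
    sample = '<a class="l" href="http://example.com/">hit</a><a href="/x">nav</a>'
    print(extract_result_links(sample))
```

To use it against a live page you would do something like make_opener().open(url).read().decode("utf-8", "replace") and pass the text to extract_result_links(). The class name is only what was observed when this was posted, so check the page source before relying on it.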

hope this helps,
Michael

---
Asking a person who he *is* ... is not Pythonic! --Anton Vredegoor


Aug 23 '07 #5
In message <ma**************************************@python.org>, Michael
Bentley wrote:
> First thing to know is that google doesn't like the User-agent header
> urllib2 uses by default -- you'll have to masquerade as a browser
> (google throws a 403 error if you connect as 'User-Agent:
> Python-urllib/2.5': look into urllib2.build_opener()).

A bit small-minded of Google, don't you think? They also block the default
user-agent header for wget.
Sep 1 '07 #6
