trouble getting google through urllib

So I'm writing a bot in Python that will be able to do all kinds of
weird shit. One of those weird things is the ability to translate text
from one language to another, which I figured I'd use Google Translate
to do. Here is the translation section I'm having trouble with:

elif(line[abuindex+1]=="translate"):  #if user inputs translate
    text=""
    for i in range(abuindex+2, len(line)):  #concatenate all text to be translated
        text=text+"%20"+line[i]

    t_url="http://translate.google.com/translate_t?text='"+text+"'&hl=en&langpair=es|en&tbb=1"
    print "url: %s" % t_url  #debug msg
    urlfi=urllib.urlopen(t_url)  #make a file object from what google sends
    t_html=urlfi.read()  #read from urlfi file
    print "html: %s" % t_html  #debug msg
    print "text: %s" % text  #debug msg
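For comparison, here is a minimal Python 3 sketch (the thread's code is Python 2) that lets the standard library escape the query instead of gluing "%20" and single quotes in by hand. Those quotes end up as part of the text being sent, as the 403 page below shows. The helper name `build_translate_url` is hypothetical, and the base URL and parameter names are simply copied from the post; nothing here claims the service still accepts them.

```python
from urllib.parse import urlencode

# Base URL and parameter names copied from the post; they are assumptions,
# not a documented Google API.
BASE_URL = "http://translate.google.com/translate_t"


def build_translate_url(words, source="es", target="en"):
    """Hypothetical helper: join the words and let urlencode() escape them."""
    params = [
        ("text", " ".join(words)),             # no hand-made %20, no stray quotes
        ("hl", target),
        ("langpair", source + "|" + target),   # '|' is sent as %7C
        ("tbb", "1"),
    ]
    # urlencode() quotes each value with quote_plus(), so spaces become '+'.
    return BASE_URL + "?" + urlencode(params)
```

In the bot, the word list would be `line[abuindex+2:]`, so `build_translate_url(["como", "estas"])` yields a query of `text=como+estas&hl=en&langpair=es%7Cen&tbb=1`.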

In my code, urllib opens the URL, abuindex+2 is the index of the first word
of the string to be translated, and line is a list holding the message the
server sent to the bot. After this I'll add something to parse the HTML
and pull out the translated text. The problem is that when I run this, the
HTML output is the following (I asked it to translate "como estas" here):

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
<style><!--
body {font-family: arial,sans-serif}
div.nav {margin-top: 1ex}
div.nav A {font-size: 10pt; font-family: arial,sans-serif}
span.nav {font-size: 10pt; font-family: arial,sans-serif; font-weight:
bold}
div.nav A,span.big {font-size: 12pt; color: #0000cc}
div.nav A {font-size: 10pt; color: black}
A.l:link {color: #6f6f6f}
A.u:link {color: green}
//--></style>
<script><!--
var rc=403;
//-->
</script>
</head>
<body text=#000000 bgcolor=#ffffff>
<table border=0 cellpadding=2 cellspacing=0 width=100%><tr><td
rowspan=3 width=1% nowrap>
<b><font face=times color=#0039b6 size=10>G</font><font face=times
color=#c41200 size=10>o</font><font face=times color=#f3c518
size=10>o</font><font face=times color=#0039b6 size=10>g</font><font
face=times color=#30a72f size=10>l</font><font face=times color=#c41200
size=10>e</font>&nbsp;&nbsp;</b>
<td>&nbsp;</td></tr>
<tr><td bgcolor=#3366cc><font face=arial,sans-serif
color=#ffffff><b>Error</b></td></tr>
<tr><td>&nbsp;</td></tr></table>
<blockquote>
<H1>Forbidden</H1>
Your client does not have permission to get URL
<code>/translate_t?text='%20como%20estas'&amp;hl=en&amp;langpair=es%7Cen&amp;tbb=1</code>
from this server.

<p>
</blockquote>
<table width=100% cellpadding=0 cellspacing=0><tr><td
bgcolor=#3366cc><img alt="" width=1 height=4></td></tr></table>
</body></html>

Does anyone know how I would give the bot permission to fetch the
URL? When I put the URL into Firefox it works fine. I noticed that in
the HTML Google gave me, some of the characters in the URL were
replaced with different stuff like "&amp;" and "%7C", so I'm
thinking that's the problem. Does anyone know how I would make it keep
the URL as I intended it to be?
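The suspicion about the changed characters can be checked directly: `&amp;` is how an HTML page writes a literal `&`, and `%7C` is the percent-encoding of `|`. This short Python 3 sketch uses only `html.unescape()` and `urllib.parse.unquote()` to decode the URL from the error page back to what the script requested. The result matches the hand-built URL, stray quotes and leading space included, so the escaping is not what triggered the 403; the replies below point at the User-Agent header instead.

```python
import html
from urllib.parse import unquote

# The URL exactly as it appears inside <code>...</code> in the 403 page.
quoted = "/translate_t?text='%20como%20estas'&amp;hl=en&amp;langpair=es%7Cen&amp;tbb=1"

# Step 1: undo the HTML escaping applied by the error page (&amp; -> &).
raw = html.unescape(quoted)
# Step 2: undo the percent-encoding (%20 -> space, %7C -> |).
decoded = unquote(raw)

print(raw)
print(decoded)
```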

Dec 19 '06 #1
Dr. Locke Z2A wrote:
> Does anyone know how I would give the bot permission to fetch the
> URL? When I put the URL into Firefox it works fine. I noticed that in
> the HTML Google gave me, some of the characters in the URL were
> replaced with different stuff like "&amp;" and "%7C", so I'm
> thinking that's the problem. Does anyone know how I would make it keep
> the URL as I intended it to be?
Google doesn't like Python scripts. You will need to pretend to be a
browser by setting the user-agent string in the HTTP header.

Will McGugan
--
blog: http://www.willmcgugan.com
Dec 19 '06 #2
Will McGugan <wi**@willNOmcguganSPAM.com> wrote:
> Dr. Locke Z2A wrote:
>> Does anyone know how I would give the bot permission to fetch the
>> URL? When I put the URL into Firefox it works fine. I noticed that in
>> the HTML Google gave me, some of the characters in the URL were
>> replaced with different stuff like "&amp;" and "%7C", so I'm
>> thinking that's the problem. Does anyone know how I would make it keep
>> the URL as I intended it to be?
>
> Google doesn't like Python scripts. You will need to pretend to be a
> browser by setting the user-agent string in the HTTP header.

And possibly also run the risk of having your system blocked by Google if
they figure out you are lying to them?
Dec 19 '06 #3
Dr. Locke Z2A wrote:
> <H1>Forbidden</H1>
> Your client does not have permission to get URL
> <code>/translate_t?text='%20como%20estas'&amp;hl=en&amp;langpair=es%7Cen&amp;tbb=1</code>
> from this server.
> Does anyone know how I would give the bot permission to fetch the
> URL?
http://www.google.com/terms_of_service.html

"You may not send automated queries of any sort to Google's
system without express permission in advance from Google."

Official APIs are available here:

http://code.google.com/

</F>

Dec 19 '06 #4
"Dr. Locke Z2A" <Do****@gmail.com> writes:
> Does anyone know how I would give the bot permission to fetch the URL?
That's what this was for:

http://code.google.com/apis/soapsearch/
Dec 19 '06 #5

Duncan Booth wrote:

>> Google doesn't like Python scripts. You will need to pretend to be a
>> browser by setting the user-agent string in the HTTP header.
> and possibly also run the risk of having your system blocked by Google if
> they figure out you are lying to them?
It is possible. I wrote a 'googlewhack' (remember them?) script a while
ago, which downloaded about as many Google pages as my ADSL line could
handle, and they didn't punish me for it. Apparently, though, they do
issue short-term bans on IPs that abuse their service.

It is best to play nice, of course. I would recommend using their
official APIs if possible!
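If a script does fetch pages directly, one simple way to play nice is to pace the requests. The sketch below is a hypothetical minimum-interval throttle in Python 3; the class name and the 2-second default are illustrative assumptions, not limits published by Google, and pacing does not change the terms-of-service point quoted earlier in the thread.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum delay between requests.

    The 2-second default is an arbitrary example, not a known Google limit.
    """

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each `urlopen()`; the first call returns at once, and later calls sleep only for whatever part of the interval has not yet elapsed.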
Will McGugan
--
http://www.willmcgugan.com

Dec 19 '06 #6
I looked at those APIs, and it would appear that the SOAP API isn't around
anymore and there is no API for Google Translate :( Can anyone tell
me how to set the user-agent string in the HTTP header?

Dec 20 '06 #7
On 19 Dec 2006 16:12:59 -0800, Dr. Locke Z2A <Do****@gmail.com> wrote:
> I looked at those APIs, and it would appear that the SOAP API isn't around
> anymore and there is no API for Google Translate :( Can anyone tell
> me how to set the user-agent string in the HTTP header?
import urllib2

req = urllib2.Request('http://www.google.com')
# add 'some' browser-like user agent header
req.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; '
               'rv:1.7.8) Gecko/20050524 Fedora/1.5 Firefox/1.5')
up = urllib2.urlopen(req)  # fetch the page with the custom header
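In Python 3, `urllib2` was folded into `urllib.request`. A minimal equivalent of the example above is sketched here; passing a `headers` dict to the `Request` constructor is the same as calling `add_header()` on it. The helper name `make_request` is hypothetical.

```python
from urllib.request import Request

# Same browser-like string as in the urllib2 example.
USER_AGENT = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) "
              "Gecko/20050524 Fedora/1.5 Firefox/1.5")


def make_request(url, user_agent=USER_AGENT):
    """Build a GET Request that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})
```

Pass the returned object to `urllib.request.urlopen()` to perform the fetch. `Request` stores header names via `str.capitalize()`, so reading it back with `get_header()` needs the key `"User-agent"`.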

cheers,
amit
--
----
Amit Khemka -- onyomo.com
Home Page: www.cse.iitd.ernet.in/~csd00377
Endless the world's turn, endless the sun's Spinning, Endless the quest;
I turn again, back to my own beginning, And here, find rest.
Dec 20 '06 #8
>>> Google doesn't like Python scripts. You will need to pretend to be a
>>> browser by setting the user-agent string in the HTTP header.
>>
>> and possibly also run the risk of having your system blocked by Google if
>> they figure out you are lying to them?
>
> It is possible. I wrote a 'googlewhack' (remember them?) script a while
> ago, which downloaded about as many Google pages as my ADSL line could
> handle, and they didn't punish me for it. Apparently, though, they do
> issue short-term bans on IPs that abuse their service.
For Google, that load must be piss in the ocean. I bet for Google to
even notice the abuse, it must be something really, really severe.

--
Regards, Björn
Dec 20 '06 #9
Björn Lindqvist wrote:
> For Google, that load must be piss in the ocean. I bet for Google to
> even notice the abuse, it must be something really, really severe.
like, say, business?

http://scripting.wordpress.com/2006/...#comment-25891

</F>

Dec 20 '06 #10
