Python Google Server

I've hacked together a 'GoogleCacheServer'. It is based on
SimpleHTTPServer. Run the following script (hopefully google groups
won't mangle the indentation) and set your browser proxy settings to
'localhost:8000'. It will let you browse the internet using google's
cache. Obviously you'll miss images, javascript, css files, etc.

See the world as google sees it !

(This is actually an 'inventive' short term measure to get round a
restrictive internet policy at work :-) I'll probably put it in the
Python Cookbook as it's quite fun (so if line lengths or indentation is
mangled here, try there). Tested on Windows XP, with Python 2.3 and IE.

# Copyright Michael Foord, 2004 & 2005.
# Released subject to the BSD License
# Please see http://www.voidspace.org.uk/documents/BSD-LICENSE.txt

# For information about bugfixes, updates and support, please join the
# Pythonutils mailing list.
# http://voidspace.org.uk/mailman/list...idspace.org.uk
# Comments, suggestions and bug reports welcome.
# Scripts maintained at http://www.voidspace.org.uk/python/index.shtml
# E-mail fu******@voidspace.org.uk

import google
import BaseHTTPServer
import shutil
from StringIO import StringIO
import urlparse

__version__ = '0.1.0'
"""
This is a simple implementation of a server that fetches web pages
from the google cache.

It lets you explore the internet from your browser, using the google
cache.

Run this script and then set your browser proxy settings to
localhost:8000

Needs google.py (and a google license key).
See http://pygoogle.sourceforge.net/
and http://www.google.com/apis/
"""

cached_types = ['txt', 'html', 'htm', 'shtml', 'shtm', 'cgi', 'pl',
'py']
google.setLicense(google.getLicense())
googlemarker = '''<i>Google is not affiliated with the authors of this
page nor responsible for its
content.</i></font></center></td></tr></table></td></tr></table>\n<hr>\n'''
markerlen = len(googlemarker)

class googleCacheHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    server_version = "googleCache/" + __version__
    cached_types = cached_types
    googlemarker = googlemarker
    markerlen = markerlen

    def do_GET(self):
        f = self.send_head()
        if f:
            self.copyfile(f, self.wfile)
            f.close()

    def send_head(self):
        """Common code for GET and HEAD commands.

        This sends the response code and MIME headers.

        Return value is either a file object (which has to be copied
        to the outputfile by the caller unless the command was HEAD,
        and must be closed by the caller under all circumstances), or
        None, in which case the caller has nothing further to do.

        """
        print self.path
        url = urlparse.urlparse(self.path)[2]
        dotloc = url.find('.') + 1
        if dotloc and url[dotloc:] not in self.cached_types:
            return None # not a cached type - don't even try

        thepage = google.doGetCachedPage(self.path)
        headerpos = thepage.find(self.googlemarker)
        if headerpos != -1: # remove the google header
            pos = self.markerlen + headerpos
            thepage = thepage[pos:]

        f = StringIO(thepage)

        self.send_response(200)
        self.send_header("Content-type", 'text/html')
        self.send_header("Content-Length", str(len(thepage)))
        self.end_headers()
        return f

    def copyfile(self, source, outputfile):
        shutil.copyfileobj(source, outputfile)

def test(HandlerClass = googleCacheHandler,
         ServerClass = BaseHTTPServer.HTTPServer):
    BaseHTTPServer.test(HandlerClass, ServerClass)

if __name__ == '__main__':
    test()
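
For anyone who only wants to see the header-stripping step from send_head on its own, here is a minimal sketch in Python 3 (the script above is Python 2). The MARKER value and the sample page are made up for illustration; the real marker is the googlemarker string in the script.

```python
# Hypothetical short marker for illustration only; the script above uses
# the much longer googlemarker string taken from Google's cache banner.
MARKER = "</table>\n<hr>\n"


def strip_cache_header(page, marker=MARKER):
    """Drop everything up to and including the marker, if it is present."""
    pos = page.find(marker)
    if pos == -1:
        # Marker not found: return the page unchanged, as send_head does.
        return page
    return page[pos + len(marker):]


cached = "<table>Google banner</table>\n<hr>\n<html>original page</html>"
print(strip_cache_header(cached))
print(strip_cache_header("<html>no banner</html>"))
```

If Google changes the wording of its banner, find() returns -1 and the page is served with the banner still attached, so nothing breaks, it just looks a bit different.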

Jul 18 '05 #1
fu******@gmail.com wrote:

lol, cool hack!! Make a Slashdot article about it!!
[snip..]

Jul 18 '05 #2
It works on Opera and Firefox on Linux, but you can't search in the cached
google! It would be more useful if you could somehow search "only" in the
cache instead of putting the straight link. Maybe you could put a magic url
to search in the cache, like search:"search terms"

fu******@gmail.com wrote:
[snip..]

Jul 18 '05 #3

vegetax wrote:
It works on Opera and Firefox on Linux, but you can't search in the cached google! It would be more useful if you could somehow search "only" in the cache instead of putting the straight link. Maybe you could put a magic url to search in the cache, like search:"search terms"

Thanks for the report. I've also tried it with firefox on windows.

Yeah - google search results aren't cached !! Perhaps anything in a
google domain ought to pass straight through. That could be done by
testing the domain and using urllib2 to fetch the page.

Have just tested the following which works.

Add the following two lines to the start of the code :

import urllib2
txheaders = { 'User-agent' : 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)' }

Then change the start of the send_head method to this :

    def send_head(self):
        """Only GET implemented for this.
        This sends the response code and MIME headers.
        Return value is a file object, or None.
        """
        print 'Request :', self.path # traceback to sys.stdout
        url_tuple = urlparse.urlparse(self.path)
        url = url_tuple[2]
        domain = url_tuple[1]
        if domain.find('.google.') != -1: # bypass the cache for google domains
            req = urllib2.Request(self.path, None, txheaders)
            return urllib2.urlopen(req)
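
The domain test above can be tried on its own. This is a rough Python 3 sketch (urllib.parse instead of the Python 2 urlparse module); the function name is_google_domain is made up for illustration and is not part of the script.

```python
from urllib.parse import urlparse


def is_google_domain(url):
    """Mirror the substring test from send_head: '.google.' in the host part."""
    host = urlparse(url).netloc
    return host.find('.google.') != -1


for candidate in ("http://www.google.com/search?q=python",
                  "http://www.python.org/",
                  "http://google.com/"):
    print(candidate, is_google_domain(candidate))
```

Note that a bare host such as google.com has no leading dot, so this substring test does not match it; only hosts like www.google.com or groups.google.co.uk take the bypass.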


Jul 18 '05 #4
Another change - change the line `dotloc = url.find('.') + 1` to
`dotloc = url.rfind('.') + 1`

This makes it find the last '.' in the url, so the check looks at the
file extension even when an earlier part of the path contains a dot.
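
A small Python 3 sketch of the difference, assuming a made-up example.com URL with a dot in a directory name. The helper name extension is hypothetical; the slicing follows the dotloc logic in send_head.

```python
from urllib.parse import urlparse

cached_types = ['txt', 'html', 'htm', 'shtml', 'shtm', 'cgi', 'pl', 'py']


def extension(url, use_rfind=True):
    """Return the text after the first (find) or last (rfind) '.' in the path."""
    path = urlparse(url).path
    dotloc = (path.rfind('.') if use_rfind else path.find('.')) + 1
    return path[dotloc:] if dotloc else ''


url = "http://example.com/docs/v1.2/index.html"  # hypothetical URL
print(extension(url, use_rfind=False))  # text after the first dot
print(extension(url, use_rfind=True))   # text after the last dot
```

With find() the "extension" of this URL is 2/index.html, which is not in cached_types, so the page would be skipped; with rfind() it is html and the page is fetched from the cache.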

Best Regards,

Fuzzy
http://www.voidspace.org.uk/python

Jul 18 '05 #5
Fuzzyman wrote:

[snip..]

Doesn't work, the browser keeps asking me to save the page.

this one works =)
    def send_head(self):
        print 'Request :', self.path # traceback to sys.stdout
        url_tuple = urlparse.urlparse(self.path)
        url = url_tuple[2]
        domain = url_tuple[1]
        if domain.find('.google.') != -1: # bypass the cache for google domains
            req = urllib2.Request(self.path, None, txheaders)
            self.send_response(200)
            self.send_header("Content-type", 'text/html')
            self.end_headers()
            return urllib2.urlopen(req)
        dotloc = url.rfind('.') + 1
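
The likely reason the earlier version made the browser offer a download is that the body went out with no status line and no Content-type header in front of it. Below is a minimal Python 3 sketch of the same pattern using http.server. RelayHandler and fetch_page are hypothetical names, and fetch_page is a stub standing in for the urllib2.urlopen call so the example runs without network access.

```python
import http.server


def fetch_page(url):
    """Hypothetical stand-in for the upstream fetch; returns a canned page."""
    return b"<html><body>stub page for " + url.encode("ascii", "replace") + b"</body></html>"


class RelayHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = fetch_page(self.path)
        # Status line and headers must be sent before any of the body.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


if __name__ == "__main__":
    # Port 8000 matches the proxy setting mentioned at the top of the thread.
    http.server.HTTPServer(("localhost", 8000), RelayHandler).serve_forever()
```

The fix posted above hard-codes text/html, as this sketch does; passing through the upstream Content-type would be a further refinement that the thread does not cover.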


Jul 18 '05 #6
Of course - sorry. Thanks for the fix. Out of interest - why are you
using this... just for curiosity, or is it helpful ?

Regards,
Fuzzy
http://www.voidspace.org.uk/python

Jul 18 '05 #7
fu******@gmail.com writes:
(This is actually an 'inventive' short term measure to get round a
restrictive internet policy at work :-)


If that means what I think, you're better off setting up a
url-rewriting proxy server on some other machine, that uses SSL on the
browser side. There's one written in perl at:

http://www.jmarshall.com/tools/cgiproxy/

Presumably you're surfing through some oppressive firewall, and the
SSL going into the proxy prevents the firewall from logging all the
destination URLs going past it (and the content too, for that matter).
Jul 18 '05 #8
Note - there are a couple of *minor* changes to this. See the online
Python Cookbook, the thread on comp.lang.python or
http://www.voidspace.org.uk/python/weblog/index.shtml

Jul 18 '05 #9
The difficulty is 'on some other machine'... there's a fantastic python
CGI proxy called approx -
http://www.voidspace.org.uk/python/cgi.shtml#approx

The trouble is the current policy is 'whitelist only'... so I need the
proxy installed on a server that is *on the whitelist*... which will
take a little time to arrange.

Best Regards,

Fuzzy
http://www.voidspace.org.uk/python

Jul 18 '05 #10
Fuzzyman wrote:
Of course - sorry. Thanks for the fix. Out of interest - why are you
using this... just for curiosity, or is it helpful ?


because it's fun to surf on the google cache, =)
Jul 18 '05 #11

vegetax wrote:
Fuzzyman wrote:
Of course - sorry. Thanks for the fix. Out of interest - why are you using this... just for curiosity, or is it helpful ?


because it's fun to surf on the google cache, =)


Ha - cool ! The bizarre thing is, that for me it's actually useful. I
doubt anyone else will be in the same situation though.

Best Regards,

Fuzzy
http://www.voidspace.org.uk/python

Jul 18 '05 #12
Fuzzyman wrote:
The trouble is the current policy is 'whitelist only'... so I need the
proxy installed on a server that is *on the whitelist*... which will
take a little time to arrange.


If you construct a noop translation (English to English for example)
Google becomes a (HTML only) proxy. Here's an example:

http://google.com/translate_c?langpa...://python.org/
--
Benji York
Jul 19 '05 #13
Thanks Benji,

It returns the results using an ip address - not the google domain.
This means IPCop bans it :-(

Thanks for the suggestion though. In actual fact the googleCacheServer
works quite well.

Best Regards,

Fuzzy
http://www.voidspace.org.uk/python/weblog

Jul 19 '05 #14
