
THE IMPORTANCE OF MAKING THE GOOGLE INDEX DOWNLOADABLE

I am writing to make a request on behalf of all the programmers who use, or intend to use, the Google web search API, whether for research or for building real-world applications: please make the Google indexes downloadable.

Currently, programmers using the Google web search API are limited to 1,000 queries a day. On the one hand this is a reasonable decision by Google, since limiting queries protects the Google system from unnecessary automated traffic; on the other hand it limits us programmers severely. The query cap restricts the usefulness of whatever applications we build, and even limits our imagination about what would be possible with full access to the index.
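
To make the constraint concrete: here is a minimal sketch of the kind of client-side bookkeeping a programmer has to add just to stay under a 1,000-queries-per-day ceiling. The do_google_search function is a placeholder for whatever search wrapper is actually in use, not a real library call.

    import datetime

    DAILY_LIMIT = 1000  # queries permitted per calendar day

    class QuotaTracker:
        """Client-side bookkeeping to stay under a fixed daily query cap."""

        def __init__(self, limit=DAILY_LIMIT):
            self.limit = limit
            self.day = datetime.date.today()
            self.used = 0

        def acquire(self):
            """Return True if one more query may be issued today."""
            today = datetime.date.today()
            if today != self.day:      # a new day has started: reset the counter
                self.day = today
                self.used = 0
            if self.used >= self.limit:
                return False           # quota exhausted until tomorrow
            self.used += 1
            return True

    def do_google_search(query):
        """Placeholder for whatever search wrapper is actually in use."""
        return ["http://example.org/result-for-" + query.replace(" ", "-")]

    quota = QuotaTracker()
    if quota.acquire():
        results = do_google_search("epistemology")
    else:
        print("Daily query limit reached; no more searches until tomorrow.")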

First, I commend Google for opening up access to its carefully crawled indexes. This is a great service to humanity, and especially to the programmers interested in epistemology who use the Google web search API to pursue their goals.

Google would be doing us another great service if it made its indexes downloadable to programmers, with a good interface for programmatically accessing them.

The advantages of this approach would be:

1. Decentralizing the Google system.
2. Reducing the query load that programmers place on Google.
3. Enabling programmers to build applications that run on their local systems, requiring an internet connection only when a live web page must be fetched (the links returned for a query are the most important part of the result set), and therefore allowing an unlimited number of queries should those applications go public. A rough sketch of such a local interface appears below.
4. Giving Google a competitive edge in search engine technology and user satisfaction by earning programmer loyalty.
5. Encouraging global adoption and use of the API + INDEXES package provided by Google.
6. Opening a further opportunity for Google, if the downloaded INDEXES + API include mechanisms that let programmers update the indexes from the web. An agreement could then give Google unlimited access to those local indexes whenever the user's computer is online and idle, so that Google refreshes its own indexes from the copies on programmers' machines, creating a truly distributed global crawler. Built on grid technologies, this could dramatically shorten the estimated 300-year timescale for crawling the world's crawlable information.

Google could still enforce its terms of service by requiring some form of authentication for use of the index residing on the programmer's local machine. It need not require the programmer to be online every time he or she uses the system, since the programmer may wish to tinker with the API and indexes locally without an internet connection; online authentication could instead be required whenever the user does go online. The non-commercial nature of the indexes would have to be emphasized through several schemes.
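
To make the request concrete, here is a rough sketch of what a programmatic interface to a locally downloaded index might look like from Python. Everything in it is hypothetical: the LocalIndex class, the single-file SQLite layout, and the optional cache table are all invented for illustration, not anything Google offers.

    import sqlite3

    class LocalIndex:
        """Hypothetical wrapper around a locally downloaded index snapshot.

        Assumes a single SQLite file with a 'postings' table mapping terms
        to scored URLs, plus an optional 'cache' table holding page bodies
        (the larger of the two download variants proposed below).
        """

        def __init__(self, path):
            self.db = sqlite3.connect(path)

        def search(self, term, limit=10):
            """Return (url, score) pairs for a single term, best first."""
            cur = self.db.execute(
                "SELECT url, score FROM postings WHERE term = ? "
                "ORDER BY score DESC LIMIT ?", (term, limit))
            return cur.fetchall()

        def cached_page(self, url):
            """Return the stored page body, or None if this snapshot was
            downloaded without cached pages."""
            cur = self.db.execute("SELECT body FROM cache WHERE url = ?", (url,))
            row = cur.fetchone()
            return row[0] if row else None

    # Usage: queries run entirely offline against the local snapshot; the
    # network is only needed to fetch a live copy of a result URL.
    #
    #     index = LocalIndex("google-index-snapshot.db")
    #     for url, score in index.search("epistemology"):
    #         print(score, url)
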
The Google API could become a tool for epistemological engineers to craft the infowares (information applications) of the future. The most important thing in the indexes is the links to resources that are returned for queries.

Two versions of the API + INDEXES could be made available:

1. One without cached pages attached, so that querying the API on the local machine against the locally stored indexes returns result sets like those of the regular internet API.

2. One with the cached pages. This version would be optional, as it would be very large.

If you were good enough to release your APIs publicly, then perhaps you would also consider this request.

It would be good if the API + INDEX download were accessible to programmers working in the following languages:
(a) Python
(b) Java
(c) Perl
(d) Ruby

Alternatively, some language-independent mechanism could be devised so that programmers in any language can access the API + INDEX download; one possible shape for such a mechanism is sketched below.
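
One language-independent option would be to expose the local index through a small HTTP service that returns JSON, which Python, Java, Perl, and Ruby can all consume with their standard libraries. The sketch below reuses the hypothetical LocalIndex idea from earlier (here replaced by a canned answer) and is only an illustration of the idea, not a proposed Google interface.

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    class IndexHandler(BaseHTTPRequestHandler):
        """Serve /search?q=term as JSON so that any language with an HTTP
        client can query the local index (hypothetical interface)."""

        def do_GET(self):
            parsed = urlparse(self.path)
            if parsed.path != "/search":
                self.send_error(404)
                return
            term = parse_qs(parsed.query).get("q", [""])[0]
            # A real service would call the hypothetical LocalIndex here;
            # this sketch returns a canned answer instead.
            hits = [{"url": "http://example.org/" + term, "rank": 1.0}]
            body = json.dumps({"query": term, "results": hits}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # A Python, Java, Perl, or Ruby client can now simply GET
        # http://localhost:8080/search?q=epistemology
        HTTPServer(("localhost", 8080), IndexHandler).serve_forever()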

PageRank may or may not be included in the package, depending on decisions at Google.

The package could be closed source, open source, or partial source (part open, part closed).

This will be a great service to humanity and to programmers especially.

Thanks,

Ogah Ejini,

Nigeria, West Africa.

Mobile: +234 802 601 5061

Jun 7 '06 #1
gen_tricomi wrote:
Currently, programmers using the Google web search API are limited to 1,000 queries a day. On the one hand this is a reasonable decision by Google, since limiting queries protects the Google system from unnecessary automated traffic; on the other hand it limits us programmers severely. The query cap restricts the usefulness of whatever applications we build, and even limits our imagination about what would be possible with full access to the index.

If you know which sites you are interested in searching, then you can already license hardware from Google (the Google Search Appliance), which will let you index up to 15 million documents and gives you an API to search the indexed content without restriction.

If you simply want to compete with Google on searching the entire internet
then they are unlikely to want to help you.

If you fall somewhere in between what can be done with a Google Search
Appliance and competing with Google then you are talking about paying
Google sufficient money that they ought to be interested in sitting round a
table with you. As it says on their website: "For larger deployments,
contact us and we’ll be happy to talk to you about building a custom search
solution for your environment."
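
For what it is worth, querying a Google Search Appliance programmatically already looks roughly like the sketch below: a plain HTTP GET that returns XML. The hostname, frontend ('client') and collection ('site') values are placeholders, and the parameter names and XML tags are recalled from the appliance's XML search protocol, so check the appliance documentation before relying on them.

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    # Placeholder hostname; 'client' and 'site' identify a frontend and a
    # collection configured on the appliance.
    APPLIANCE = "http://gsa.example.com"

    def appliance_search(query, num=10):
        """Run a query against a Google Search Appliance and return
        (url, title) pairs parsed from its XML output."""
        url = (APPLIANCE + "/search?q=" + urllib.parse.quote(query)
               + "&output=xml_no_dtd&client=default_frontend"
               + "&site=default_collection&num=" + str(num))
        with urllib.request.urlopen(url) as resp:
            tree = ET.parse(resp)
        # In the appliance's XML output each hit is an <R> element whose
        # <U> child holds the URL and <T> child holds the title.
        return [(r.findtext("U"), r.findtext("T")) for r in tree.iter("R")]

    # for url, title in appliance_search("epistemology"):
    #     print(url, title)
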
Jun 7 '06 #2
gen_tricomi wrote:
THE IMPORTANCE OF MAKING THE GOOGLE INDEX DOWNLOADABLE

I am writing to make a request on behalf of all the programmers who use, or intend to use, the Google web search API, whether for research or for building real-world applications: please make the Google indexes downloadable.

Frankly I doubt whether the average programmer possesses sufficient
storage or has access to sufficient bandwidth to make downloading the
Google indexes a practical proposition - let alone the cached page
contents too.
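
A back-of-envelope calculation makes the point. Every figure below (page count, bytes of index data per page, bytes of cached page, home-connection bandwidth) is a rough 2006-era guess rather than a Google number, but even under generous assumptions the download is measured in terabytes and the transfer time in a year or more.

    # Rough feasibility estimate; every figure below is a guess, not Google data.
    PAGES_INDEXED    = 8e9       # order-of-magnitude page count circa 2006 (guess)
    INDEX_BYTES_PAGE = 500       # compressed postings/metadata per page (guess)
    CACHE_BYTES_PAGE = 10_000    # compressed cached copy per page (guess)
    LINK_BYTES_PER_S = 1e6 / 8   # a 1 Mbit/s home connection, in bytes per second

    def days_to_download(nbytes):
        """How many days a transfer of nbytes takes at LINK_BYTES_PER_S."""
        return nbytes / LINK_BYTES_PER_S / 86400

    index_only = PAGES_INDEXED * INDEX_BYTES_PAGE                       # ~4 TB
    with_cache = PAGES_INDEXED * (INDEX_BYTES_PAGE + CACHE_BYTES_PAGE)  # ~84 TB

    print(f"index only : {index_only / 1e12:.0f} TB, "
          f"roughly {days_to_download(index_only):.0f} days to download")
    print(f"with cache : {with_cache / 1e12:.0f} TB, "
          f"roughly {days_to_download(with_cache):.0f} days to download")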

There's also the tiny factoid that Google might regard their index
structure, not to mention its contents, as proprietary.

Finally, how frequently would you propose to update your local copy?
Google is adding the results of new spidering to their indices all the time.

A nice idea, perhaps, but surely completely impractical.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Love me, love my blog http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Jun 7 '06 #3
