Making a search on another site, getting the data, and writing it to XML

Hi,
Is it possible to run searches on a site, for example Google, without an API,
using a list of words?
1- there is a word list
2- the script takes the words from the list in turn
3- it makes the search
4- it gets the results
5- it writes the results to an XML file

I don't mean only Google, but other sites as well.

I hope we get a result.

Sep 25 '06 #1
al**********@gmail.com wrote:
> Is it possible to run searches on a site, for example Google, without an API,
> using a list of words?
> 1- there is a word list
> 2- the script takes the words from the list in turn
> 3- it makes the search
> 4- it gets the results
> 5- it writes the results to an XML file

http://www.google.com/terms_of_service.html

"You may not send automated queries of any sort to Google's system without express
permission in advance from Google."

</F>

Sep 25 '06 #2

I don't mean only Google, but other sites as well.

Sep 25 '06 #3

al**********@gmail.com wrote:
> I don't mean only Google, but other sites as well.

Google expressly forbids doing any form of automated search outside of
their api. If you want to write a script that will run Google searches,
you have to use the api to do so. As far as I know most of the other
search sites have the same requirement.

Yes, it is possible to query a bunch of search sites and dump the
results into an xml file. It is not even all that hard. In fact, I bet
running a search on the relevant terms will probably produce something
that almost does what you want.

-Adam

Sep 25 '06 #4
Thank you very much for your explanations. I don't mean a search engine;
I mean, for example, a dictionary site for looking up words.

Sep 25 '06 #5
For example, here is a script that runs a search on one such site and
gets the result:

#!/usr/bin/python
# -*- coding: windows-1254; -*-

import urllib.request  # Python 3 home of urlopen (urllib.urlopen in Python 2)

dictionary = {}  # wow, it's actually a dictionary
words = ['apple', 'banana', 'cheese']
for word in words:
    # fetch the raw result page for this word
    dictionary[word] = urllib.request.urlopen(
        "http://www.example.com/look.php?w=" + word).read()

print(dictionary)

I don't know how to read the words from a txt file so that they are
searched in turn.

Sep 26 '06 #6

And also how to write the results to an HTML or XML file.
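
A minimal sketch of the XML step, assuming the results are already held in a dict that maps each word to the text fetched for it. The element names (results, entry, definition) and the helper names build_tree and write_xml are made up for illustration; only the standard-library xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET


def build_tree(results):
    """Build an XML tree from a {word: text} mapping (element names are illustrative)."""
    root = ET.Element("results")
    for word, text in results.items():
        entry = ET.SubElement(root, "entry", word=word)
        # ElementTree escapes <, > and & when the tree is serialized
        ET.SubElement(entry, "definition").text = text
    return ET.ElementTree(root)


def write_xml(results, path):
    """Serialize the mapping to an XML file at `path`."""
    build_tree(results).write(path, encoding="utf-8", xml_declaration=True)
```

Calling write_xml(dictionary, "results.xml") would then write one entry element per word; if the fetched pages are bytes rather than text, they would need decoding first.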

Sep 26 '06 #7
On Mon, 25 Sep 2006 13:51:55 +0200, Fredrik Lundh wrote:
> http://www.google.com/terms_of_service.html
>
> "You may not send automated queries of any sort to Google's system without express
> permission in advance from Google."

I'm not just being a pedantic weasel here, but what's an automated query?
Google's ToS is a legal document (maybe), and if both parties don't agree
on the meanings of terms, well, then it is a lousy legal document and a
recipe for trouble.

Google don't define "automated query", and I don't think they can. In
fact, the closest they come to defining it is to list three things they
want to prevent, NONE of which have anything to do with the distinction
between automated and non-automated.

(What on earth is "meta-searching"? If you're going to use terms which
don't have a commonly understood meaning, define what they mean.)

If I want to search for "foo", and I type "foo" into the Firefox search
box, is that an automated query?

What if I type "gg: foo" into Konqueror's address bar, which expands to
"http://www.google.com/search?q=foo"? Is it okay if I type the URL by hand
myself?

Can I use the browser to save the search page to a local HTML file? If
Google says no, how can they possibly hope to stop me?

What if I type this command into my shell?

elinks --dump "http://www.google.com/search?q=foo" > output.html

What if I type

wget "http://www.google.com/search?q=foo"

into the shell? Surely that's no more automated than typing "foo"
into Google's search box. (wget doesn't in fact work, as Google recognises
its user-agent string and blocks it, EVEN in cases where I am using wget
manually. What, can't Google themselves tell the difference between
automatic and non-automatic searching?)

Where is the line I must not cross?

The thing is, Google doesn't want people "reselling" their services, and I
respect Google's intention. But trying to draw a distinction between
"automated" and "non-automated" requests is difficult if not impossible,
as can be seen by the heavy-handed way Google blocks the manual use of
wget. I don't condone the gross abuse of Google's service, but I don't
think an artificial distinction between automated and non-automated is a
useful way to go about it.

Of course, what I think isn't important. If Google wants to write legal
contracts that won't stand up in court (speaking as somebody who isn't a
lawyer and whose legal advice is worthless), they can. But the point is, I
see no ethical nor legal reason why a user can't create a script which is
called MANUALLY by the user and does what a browser does, namely send and
receive data from websites (which may or may not include Google).

And that, it seems to me, is what the Original Poster wanted.

--
Steven D'Aprano

Sep 26 '06 #8
al**********@gmail.com wrote:
> I don't know how to read the words from a txt file so that they are
> searched in turn.

checking the "reading and writing files" section in the tutorial might
be somewhat helpful:

http://docs.python.org/tut/node9.htm...00000000000000
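
As a rough sketch of what that tutorial section leads to, the following reads a word list from a text file. It assumes one word per line and UTF-8 encoding; the helper names parse_words and read_words are made up for illustration.

```python
def parse_words(lines):
    """Strip whitespace from each line and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


def read_words(path):
    """Read one word per line from the text file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return parse_words(handle)
```

The returned list can replace a hard-coded list such as words = ['apple', 'banana', 'cheese'], for example words = read_words("words.txt"), where words.txt is a hypothetical file name.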

</F>

Sep 26 '06 #9
Steven D'Aprano wrote:
> On Mon, 25 Sep 2006 13:51:55 +0200, Fredrik Lundh wrote:
>
>> http://www.google.com/terms_of_service.html
>>
>> "You may not send automated queries of any sort to Google's system without express
>> permission in advance from Google."
>
> I'm not just being a pedantic weasel here, but what's an automated query?
> Google's ToS is a legal document (maybe), and if both parties don't agree
> on the meanings of terms, well, then it is a lousy legal document and a
> recipe for trouble.
>
> Google don't define "automated query", and I don't think they can. In
> fact, the closest they come to defining it is to list three things they
> want to prevent, NONE of which have anything to do with the distinction
> between automated and non-automated.

The fact remains that Google can chop your searching ability off at the
knees if *they* determine that you have broken the terms of service, so
whether you agree or not becomes slightly academic.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 26 '06 #10
Steven D'Aprano wrote:
> Google don't define "automated query", and I don't think they can.

the phrases they use are well understood in the SE business. that's
good enough for everyone involved (including courts; see below).

> (What on earth is "meta-searching"? If you're going to use terms which
> don't have a commonly understood meaning, define what they mean.)

http://en.wikipedia.org/wiki/Metasearch_engine

> If I want to search for "foo", and I type "foo" into the Firefox search
> box, is that an automated query?

nope. unless you're a robot.

> What if I type "gg: foo" into Konqueror's address bar, which expands to
> "http://www.google.com/search?q=foo"? Is it okay if I type the URL by hand
> myself?

nope. unless you're a robot.

> Can I use the browser to save the search page to a local HTML file? If
> Google says no, how can they possibly hope to stop me?

what you do with the search results once you've gotten them is outside
the scope of that clause.

> What if I type this command into my shell?
>
> elinks --dump "http://www.google.com/search?q=foo" > output.html
>
> What if I type
>
> wget "http://www.google.com/search?q=foo"
>
> into the shell? Surely that's no more automated than typing "foo"
> into Google's search box.

neither is automated, unless you're a robot.

> Where is the line I must not cross?

letting a program generate search requests based on something other than
"human wants to find something and types some keywords into a prompt
somewhere".

> And that, it seems to me, is what the Original Poster wanted.

the OP wanted to read keywords from a text file generated in some
unknown fashion. that's bot behaviour, not human behaviour.

> Of course, what I think isn't important. If Google wants to write legal
> contracts that won't stand up in court (speaking as somebody who isn't a
> lawyer and whose legal advice is worthless)

well, "here's some random guy who didn't understand the terms used in
the contract" isn't a valid defense in court; courts are more interested
in whether people with experience from the relevant field can reasonably
be expected to understand the contract. but this isn't about court
cases, of course; it's about getting banned by Google for abusing their
services.

</F>

Sep 26 '06 #11
GOOGLE IS NOT OUR SUBJECT ANY MORE.

MY GOAL IS NOT MAKING SEARCH ON GOOGLE:
MY GOAL IS MAKING A SEARCH ON
www.onelook.com, for example

Sep 26 '06 #12
al**********@gmail.com wrote:
> GOOGLE IS NOT OUR SUBJECT ANY MORE.
>
> MY GOAL IS NOT MAKING SEARCH ON GOOGLE:
> MY GOAL IS MAKING A SEARCH ON
> www.onelook.com, for example

"""
Can you send me the list of words in the index? May I extract it from your
site?
No, sorry. If you're thinking about writing a script to systematically copy
OneLook.com's word list, please don't. It's not yours to copy, for one
thing. But also, it wastes tremendous bandwidth and slows things down for
other users. We have software in place to detect the abuse of our service
and we'll alert your ISP if you violate our trust in you. If you're looking
for a decent-sized downloadable word list, try WordNet, which offers that
and much more. If you're working on a project for school or academic
research, let us know and we might be able to help steer you in the right
direction.
"""

Consider this: if you offered your neighbours the courtesy of an occasional
lemonade, would that mean you'd like them stomping around in your
kitchen?

Nearly all sites that offer a service like this will have policies of
that kind. So - get a grip, stop shouting, and start thinking about whether
what you are trying to do is legal or socially acceptable. If it isn't, and
you don't care - be my guest, but don't ask for help here!

Diez
Sep 26 '06 #13
al**********@gmail.com wrote:
> GOOGLE IS NOT OUR SUBJECT ANY MORE.
>
> MY GOAL IS NOT MAKING SEARCH ON GOOGLE:
> MY GOAL IS MAKING A SEARCH ON
> www.onelook.com, for example

this is usenet; you don't "own" the threads you start. if there's a
subthread that you don't find relevant to your original question, just
ignore it.

</F>

Sep 26 '06 #14
I don't mean Google.
I don't mean onelook.com.

These are only examples.

I hope you understand what I mean.

Sep 26 '06 #15
al**********@gmail.com wrote:
> I don't mean Google.
> I don't mean onelook.com.
>
> These are only examples.
>
> I hope you understand what I mean.

Apparently, *you* don't understand what they're trying to tell you. It
roughly boils down to the following:

- All (except perhaps the most trivial small) sites disallow in their
Terms of Service the unregulated harvesting of their content by
webbots, both for legal and technical reasons. It's not just Google or
Onelook that does this.
- Yes, it is technically possible to attempt to violate their ToS,
running the risk of being caught (with whatever consequences this
implies).
- Yes, you *might* be able to get away with it (at least for some time)
running in stealth mode.
- No, people here are not willing to help you go down this road, you're
on your own.

Hope this helps,
George

Sep 27 '06 #16
In message <pa****************************@REMOVEME.cybersource.com.au>,
Steven D'Aprano wrote:
> If Google wants to write legal
> contracts that won't stand up in court (speaking as somebody who isn't a
> lawyer and whose legal advice is worthless), they can.

What they define as their terms of service doesn't have to stand up in
court. They're not a public service, after all. If you do something that
they don't like, they are free to try to block you from their servers, they
don't need to appeal to any other authority.

wget --user-agent="I'm not Microsoft Internet Explorer, I'm Wget" -O - \
http://www.google.co.nz/search\?q=test
Sep 27 '06 #17
In message <ma**************************************@python.org>, Steve
Holden wrote:
> The fact remains that Google can chop your searching ability off at the
> knees ...

No they can't. They can only chop off your ability to use Google.

Sep 27 '06 #18
Lawrence D'Oliveiro wrote:
> In message <ma**************************************@python.org>, Steve
> Holden wrote:
>
>> The fact remains that Google can chop your searching ability off at the
>> knees ...
>
> No they can't. They can only chop off your ability to use Google.

[sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 27 '06 #19
Steve Holden <st***@holdenweb.com> writes:
> Lawrence D'Oliveiro wrote:
>> Steve Holden wrote:
>>> The fact remains that Google can chop your searching ability off
>>> at the knees ...
>> No they can't. They can only chop off your ability to use Google.
> [sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.

Seems like a fairly important distinction. Google has the power to
"chop your searching ability off at the knees" only to the extent that
you grant them that power.

--
\ "[...] a Microsoft Certified System Engineer is to information |
`\ technology as a McDonalds Certified Food Specialist is to the |
_o__) culinary arts." -- Michael Bacarella |
Ben Finney

Sep 27 '06 #20
In message <ma**************************************@python.org>, Ben Finney
wrote:
> Steve Holden <st***@holdenweb.com> writes:
>> Lawrence D'Oliveiro wrote:
>>> Steve Holden wrote:
>>>> The fact remains that Google can chop your searching ability off
>>>> at the knees ...
>>> No they can't. They can only chop off your ability to use Google.
>> [sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.
>
> Seems like a fairly important distinction. Google has the power to
> "chop your searching ability off at the knees" only to the extent that
> you grant them that power.

Saying "search" when you mean "Google" is like saying "using a PC" when you
mean "using Microsoft Windows".
Sep 27 '06 #21
Lawrence D'Oliveiro wrote:
> In message <ma**************************************@python.org>, Ben Finney
> wrote:
>
>> Steve Holden <st***@holdenweb.com> writes:
>>
>>> Lawrence D'Oliveiro wrote:
>>>
>>>> Steve Holden wrote:
>>>>
>>>>> The fact remains that Google can chop your searching ability off
>>>>> at the knees ...
>>>>
>>>> No they can't. They can only chop off your ability to use Google.
>>>
>>> [sigh]. Right, Lawrence, sorry I wasn't quite explicit enough for you.
>>
>> Seems like a fairly important distinction. Google has the power to
>> "chop your searching ability off at the knees" only to the extent that
>> you grant them that power.
>
> Saying "search" when you mean "Google" is like saying "using a PC" when you
> mean "using Microsoft Windows".

Well, I thought it was self-evident that since I was referring to Google
I wasn't talking about Alta Vista searching. If I said "Microsoft have
the ability to terminate your license" presumably you'd chastise me by
pointing out that they wouldn't be able to revoke my *Linux* license.
Whatever.

"There's none as thick as them that wants to be."

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 27 '06 #22
OK, I'm closing this discussion.
I understand everybody, no problem.

Sep 27 '06 #23
al**********@gmail.com wrote:
> OK, I'm closing this discussion.

No, you don't.

Stefan
Sep 27 '06 #24
George Sakkis wrote:
> al**********@gmail.com wrote:
>> I don't mean Google.
>> I don't mean onelook.com.
>>
>> These are only examples.
>>
>> I hope you understand what I mean.
>
> Apparently, *you* don't understand what they're trying to tell you. It
> roughly boils down to the following:

If we just step back from the brink for a moment and give the
questioner the benefit of the doubt - that the exercise merely involves
automating some kind of interactions that would otherwise require lots
of manual messing around piloting a browser, rather than performing
some kind of bulk "suck down" of an entire site's information - then it
is obviously possible to use the following techniques:

* Use a well-known mirroring or archiving tool such as wget.
* Use various testing tools, some of which are written in Python.
* Use urllib, urllib2 or httplib plus an HTML or XML parser in your
own program.
* Automate a Web browser using some off-the-shelf program.
* Use various automation mechanisms provided by your environment
(eg. COM, DCOP), possibly with Python libraries (eg. PAMIE [1],
KPart Plugins [2]).
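
To illustrate the third item in the list above, here is a minimal sketch using the Python 3 descendants of those modules (urllib.request and html.parser). It assumes the site's terms permit scripted access and that the page is queried with a "w" parameter as in the look.php example earlier in the thread; the names TitleParser, extract_title and fetch, and the two-second pause, are illustrative choices, not part of any site's API.

```python
import time
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the page title from an HTML string, stripped of whitespace."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch(url, word, delay=2.0):
    """Fetch url?w=word as text, then pause so requests stay infrequent."""
    query = urllib.parse.urlencode({"w": word})  # also URL-escapes the word
    with urllib.request.urlopen(url + "?" + query) as response:
        body = response.read().decode("utf-8", errors="replace")
    time.sleep(delay)
    return body
```

A real page would need a parser tailored to its markup; the title is used here only because every HTML page has one.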

Various sites forbid wget and friends as a rule, understandably, but
there are sometimes reasons why you might want to use various tools to
automate a procedure involving lots of data which would waste a huge
amount of time if done manually. Perhaps you might have mail residing
in a Webmail system which can't be extracted via any process other than
reading all the messages in a browser, for example, or perhaps your
favourite Internet applications don't provide decent shortcuts to the
information you need, instead believing that it's all about the
"experience": surfing around watching all the animated adverts.
Automation and related technologies can legitimately help users regain
control of their Internet-resident data and make better use of the
services around it.

Paul

[1] http://pamie.sourceforge.net/
[2] http://www.boddie.org.uk/python/kpartplugins.html

Sep 27 '06 #25
In message <11**********************@e3g2000cwe.googlegroups.com>, Paul
Boddie wrote:
> Various sites forbid wget and friends as a rule, understandably ...

No, that is not understandable.

Oct 6 '06 #26
