internet searching program

Is it possible to make a program to search a site on the internet,
then get certain information from the web pages that match and display
them? Like, you would put in keywords to be searched for on
youtube.com, then it would search youtube.com, get the names of the
videos, the links, and the embed information? Or something like that.
Aug 9 '08 #1
On Fri, 08 Aug 2008 19:59:02 -0700, KillSwitch wrote:
Is it possible to make a program to search a site on the internet, then
get certain information from the web pages that match and display them?
Like, you would put in keywords to be searched for on youtube.com, then
it would search youtube.com, get the names of the videos, the links, and
the embed information? Or something like that.
Search the Internet? Hmmm... I'm not sure, but I think Google does
something quite like that, but I don't know if they do it with a computer
program or an army of trained monkeys.

--
Steven
Aug 9 '08 #2
No, I mean to search the internet really fast and display only REALLY
SPECIFIC information about certain web pages. Like, stuff Google
wouldn't have. For instance, in the youtube example, it would list the
names of the videos, their URLs, and the EMBED information
without you having to view the site. And BTW, I am extremely new to
Python. I am going to buy a few books soon, and I can really start my
learning then. And I'll look up some tut's on the 'net in the
meantime.

Aug 9 '08 #3
I think you are talking about "screen scraping".

Your program can get the html for the page, and search for an
appropriate pattern.

Look at the source for a YouTube view page and you will see a string

var embedUrl = 'http://....

You can write code to search for that in the html text.

But embedUrl is not a standard javascript item; it is part of the
YouTube design. You will be relying on that, and if YouTube changes
how they provide such a string, you will be out of luck; at best you
will have to rewrite part of your code.

In other words, screen scraping relies on the site not doing a major
reworking and on you doing a little reverse engineering of the page
source.

So how to get the html into your program? In python the answer to that
is urllib.

http://docs.python.org/lib/module-urllib.html
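
A minimal sketch of that approach, assuming Python 3 (where the fetching function lives in urllib.request). The names fetch_html and extract_embed_url are made up for illustration, and the pattern assumes the page still contains a line like var embedUrl = '...', which YouTube is free to change at any time.

```python
import re
import urllib.request

# Assumed pattern: matches  var embedUrl = '...'  as described above.
EMBED_RE = re.compile(r"var\s+embedUrl\s*=\s*'([^']*)'")


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_embed_url(html):
    """Return the embedUrl value found in the HTML, or None if absent."""
    match = EMBED_RE.search(html)
    return match.group(1) if match else None
```

You would call extract_embed_url(fetch_html(page_url)) and treat a None result as a sign that the page layout no longer matches the pattern.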

mt
Aug 9 '08 #4
On Aug 9, 12:59 pm, KillSwitch <gu.yakahug...@gmail.com> wrote:
Is it possible to make a program to search a site on the internet,
then get certain information from the web pages that match and display
them? Like, you would put in keywords to be searched for on
youtube.com, then it would search youtube.com, get the names of the
videos, the links, and the embed information? Or something like that.
You might find the mechanize module handy:
http://wwwsearch.sourceforge.net/mechanize/

And possibly BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/
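
Both of those are third-party installs. As a rough standard-library stand-in for the link-extraction part (this is html.parser, not mechanize or BeautifulSoup, and it is far less forgiving of messy markup), something like the following might do for simple pages. LinkCollector and collect_links are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html):
    """Return a list of (href, text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```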
Aug 9 '08 #5
Michael Tobis wrote:
I think you are talking about "screen scraping".

Your program can get the html for the page, and search for an
appropriate pattern.
However, it wouldn't be "really fast", because you
still have to fetch all the pages that might contain
data you're looking for.

Google searches are fast because they've already
fetched all the web pages in the world and indexed
them.

You might get somewhere using a program that does
a site-specific google search to find potentially
relevant pages, then goes and looks at those pages
for further information.
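
For the first step, the query can be restricted with Google's site: operator. A small sketch of building such a URL (site_search_url is a made-up name; whether a script is permitted to fetch and parse the results page is a separate question):

```python
from urllib.parse import urlencode


def site_search_url(site, keywords):
    """Build a Google search URL restricted to a single site."""
    query = "site:%s %s" % (site, keywords)
    # urlencode takes care of escaping the colon and the spaces
    return "http://www.google.com/search?" + urlencode({"q": query})
```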

Another possibility might be to crawl the site and
build your own index based on the information you're
interested in.
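
A toy version of such an index, assuming the pages have already been fetched and reduced to plain text. build_index and search are illustrative names, and a real crawler would also need link following, politeness delays and persistent storage.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it.

    pages is a dict of {url: page_text}.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the set of URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Once the index is built, lookups are fast because no page has to be fetched at query time, which is the same reason Google's searches are fast.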

--
Greg
Aug 10 '08 #6
Thanks a lot for everyone's help. I am really looking forward to
learning the ins and outs of Python as my first programming language.
Aug 10 '08 #7
I was doing something very similar on my windows XP machine a year ago
(with python 2.4) and used Mayukh Bose's Internet Explorer controller
(see http://www.mayukhbose.com/python/IEC/index.php for details/
download). It worked very nicely for my needs and was rather
intuitive (generally much easier and required fewer brain cells than
using urllib) ... here's some clips from the project:

# this window will be for our initial data pull-in
ie = IEController()
ie.Navigate('http://<whateverYourSiteNameIs>')
ie.ClickButton(caption='Advanced')
...
ie.SetInputValue('search_string',strUserID)
ie.ClickButton(name='image')
...
strAllText = ie.GetDocumentText()  # gets all html source code from current page
...
ie.CloseWindow()

I wish someone could make a similar one for Firefox.
Aug 10 '08 #8
Google doesn't allow use of their APIs anymore. I believe Yahoo has one, or
you could do something like the code below.

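import os
import urllib.parse

searchstring = 'stuff here'
# URL-encode the query and quote the URL so the shell passes it to lynx intact
url = 'http://www.google.com/search?q=%s' % urllib.parse.quote_plus(searchstring)
x = os.popen("lynx -dump '%s'" % url).readlines()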
Aug 11 '08 #9
On Aug 12, 12:03 am, "Support Desk" <m...@ipglobal.net> wrote:
Google doesn't allow use of their APIs anymore, I believe Yahoo has one
Are you sure?

"Google Custom Search enables you to search over a website or a
collection of websites. You can harness the power of Google to create
a search engine tailored to your needs and interests, and you can
present the results in your website. Your custom search engine can
prioritize or restrict search results based on websites you specify."

http://code.google.com/apis/customsearch/
Aug 12 '08 #10
On Aug 12, 1:44 pm, alex23 <wuwe...@gmail.com> wrote:
On Aug 12, 12:03 am, "Support Desk" <m...@ipglobal.net> wrote:
Google doesn't allow use of their APIs anymore, I believe Yahoo has one

Are you sure?

"Google Custom Search enables you to search over a website or a
collection of websites. You can harness the power of Google to create
a search engine tailored to your needs and interests, and you can
present the results in your website. Your custom search engine can
prioritize or restrict search results based on websites you specify."

http://code.google.com/apis/customsearch/
Aug 12 '08 #11

Yes, I believe Custom Search lets you embed a Google search in your website and customize it, but they no longer allow you to use a script to access search results unless you go about it in a roundabout way.

http://googlesystem.blogspot.com/200...no-longer.html
Aug 12 '08 #12
