Extract URLs from search engine results

I want to write a program in Java that parses search engine results and saves the URLs and titles to a file, but I don't know how to do this.
Apr 23 '10 #1
jkmyoung
You probably want to look up HTTP requests (java.net.HttpURLConnection) and regular expressions (java.util.regex).

e.g. a pattern something like this, where the first group captures the URL and the second captures the title:
<a href="(.*?)">(.*?)</a>
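
A minimal sketch of the regular-expression approach suggested here, assuming the links appear as ordinary <a> tags with a double-quoted href attribute. The class name LinkExtractor and the method extractLinks are made up for illustration. Real search result pages are messier than this, so a proper HTML parser may turn out to be more robust than a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Group 1 captures the href value, group 2 captures the link text.
    // The reluctant (.*?) stops at the first closing </a>.
    // Assumption: href values are wrapped in double quotes.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one {url, title} pair for each anchor found in the HTML.
    static List<String[]> extractLinks(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            // Strip nested tags such as <b> from the link text.
            String title = m.group(2).replaceAll("<[^>]+>", "").trim();
            links.add(new String[] {url, title});
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample input for illustration only.
        String html = "<p><a href=\"http://example.com/\">Example <b>Site</b></a></p>";
        for (String[] link : extractLinks(html)) {
            System.out.println(link[0] + "\t" + link[1]);
        }
    }
}
```

Each pair can then be written to a file one line at a time, for example with a java.io.PrintWriter.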
Apr 23 '10 #2
Nikkhah
@jkmyoung
Do you know which instructions I should use?
Apr 23 '10 #3
jkmyoung
What task are you specifically referring to?

If you haven't already, break your problem into steps. It's easier to help with one step at a time than with all of them at once.
Apr 23 '10 #4
Nikkhah
@jkmyoung
I want to connect to Google and save the URLs. What do I do?
Apr 23 '10 #5
Nikkhah
I found some sample code that extracts links from an HTML page. Now, how can I save the HTML source of the search results?
Apr 23 '10 #6
jkmyoung
Here's a quick example from Sun's website:
http://74.125.113.132/search?q=cache...&ct=clnk&gl=ca
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    public class URLConnectionReader {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://www.google.com/");
            URLConnection urlc = url.openConnection();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(urlc.getInputStream()))) {
                String inputLine;
                // Print the page source one line at a time
                while ((inputLine = in.readLine()) != null) {
                    System.out.println(inputLine);
                }
            }
        }
    }

It's pretty easy to find with Google.

Try modifying it to meet your needs, and come back if you have a problem.
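
One possible modification, as a rough sketch: instead of printing each line, copy the response straight into a file. The class name PageSaver, its method names, and the User-Agent value are assumptions for illustration, not part of the Sun example. Search engines may block or restrict automated requests, so the address you pass in may not return the page a browser would show.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class PageSaver {
    // Copies everything from the stream into the target file, overwriting it.
    // Returns the number of bytes written.
    static long saveStream(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the address and saves the raw response body to the target file.
    static long savePage(String address, Path target) throws IOException {
        URLConnection urlc = new URL(address).openConnection();
        // Assumption: some sites reject requests that send no User-Agent.
        // The value here is only a placeholder.
        urlc.setRequestProperty("User-Agent", "Mozilla/5.0");
        try (InputStream in = urlc.getInputStream()) {
            return saveStream(in, target);
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects the page address and the output file name as arguments.
        if (args.length < 2) {
            System.out.println("usage: java PageSaver <url> <output-file>");
            return;
        }
        long bytes = savePage(args[0], Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

The saved file can then be read back into a String and passed to whatever link-extraction code you already have.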
Apr 23 '10 #7
