#!/usr/bin/env perl
################################################################
### gsyn2
### Plamena Dragieva
### Oct 30, 2008
################################################################
use strict;
use warnings;
use Carp;
use English '-no_match_vars';
use utf8;
binmode(STDOUT, ":utf8");
use LWP;
use WWW::Mechanize 0.48;
use Encode;
use URI::Escape;

# Require the search word on the command line.
@ARGV or croak "Usage: gsyn2 WORD";
my $word = decode_utf8($ARGV[0]);
# Escape the word so spaces and non-ASCII characters survive in the URL.
my $url = "http://www.google.com/search?hl=en&q=%7E"
        . uri_escape_utf8($word) . "&btnG=Search";
my $mech = WWW::Mechanize->new(autocheck => 1);
$mech->get($url);
my $page = $mech->content;
# Look for some bit of text occurring just before the real results.
# Throw away all text before this (/s lets "." match newlines too).
$page =~ s/.*?Save time//s;
my %found;
# This matches anything in bold. Uses non-greedy matching.
while ($page =~ m/<b>(.*?)<\/b>/g) {
    $found{$1}++;
}
# Get one more page from Google. Read the page through $mech->content
# so that it is decoded the same way as the first page.
$mech->follow_link( text => "Next", n => 1 );
$page = $mech->content;
$page =~ s/.*?Results//s;
while ($page =~ m/<b>(.*?)<\/b>/g) {
    $found{$1}++;
}
# Skip anything containing ".." or the "~word" query itself.
# \Q...\E keeps regex metacharacters in the word from being interpreted.
foreach my $key (keys %found) {
    if ($key !~ m/\.\./ && $key !~ m/~\Q$word\E/) {
        print "$key\n";
    }
}
Here is the task.
Try running the gsyn program (located in /afs/sfs/lehre/dg/perl):
prompt> gsyn beer
11
budweiser
Beer
Ale
ale
20
beer
0.22
beers
724,000,000
You see that this program returns a number of words that are
related to or synonymous with 'beer' (or whatever word you choose).
Try some other possibilities. Running 'gsyn ale', for instance, returns
the word "brewery". Similarly, 'gsyn brewery' returns the word
"restaurant." So by transitivity, the word "beer" is related to the
word "restaurant." Modify the gsyn program so that it returns these
transitive relations (to some depth bound).
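
One possible way to approach the depth-bounded part, as a rough sketch rather than a finished solution: treat each lookup as "give me the neighbours of this word" and walk outward breadth-first, recording the depth at which each word is first seen. Everything named here is an assumption for illustration: `related_words` is a hypothetical stand-in for the Google lookup done in gsyn2, `transitive_related` is an invented name, and the `%canned` table is made-up data (loosely based on the beer/ale/brewery/restaurant example) so the sketch runs without network access.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up sample data standing in for real search results (assumption).
my %canned = (
    beer    => [qw(ale beers budweiser)],
    ale     => [qw(brewery beer)],
    brewery => [qw(restaurant ale)],
);

# Hypothetical lookup: in gsyn2 this would fetch the result pages for
# "~$word" and return the bold terms that pass the filter.
sub related_words {
    my ($word) = @_;
    return @{ $canned{$word} || [] };
}

# Breadth-first expansion. Returns a hash mapping each reachable word to
# the depth at which it was first found (direct neighbours are depth 1).
sub transitive_related {
    my ($start, $max_depth) = @_;
    my %depth_of = ($start => 0);
    my @queue    = ($start);
    while (@queue) {
        my $current = shift @queue;
        # Do not expand words that already sit at the depth bound.
        next if $depth_of{$current} >= $max_depth;
        for my $next (related_words($current)) {
            next if exists $depth_of{$next};    # already seen, avoids cycles
            $depth_of{$next} = $depth_of{$current} + 1;
            push @queue, $next;
        }
    }
    delete $depth_of{$start};    # the start word is not its own result
    return %depth_of;
}

my %result = transitive_related('beer', 3);
for my $w (sort { $result{$a} <=> $result{$b} || $a cmp $b } keys %result) {
    print "$result{$w}\t$w\n";
}
```

The `%depth_of` hash doubles as the "already visited" set, so a word that links back to an earlier one (ale back to beer, say) is not looked up twice. With a real lookup, each call costs one or two HTTP requests, so a small depth bound is probably wise.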