How to look up words in a dictionary and split sentences from a file into words?

I'm working on a problem in which I have to scan through a text file (containing one or more sentences) and print out each word of every sentence separately.

The input is a Sentence.txt file. The Perl program looks up each word in a dictionary file (.txt) and then prints out the words of the sentence(s) that appear in the dictionary.

For example: "today is Saturday". After the dictionary lookup, if the words match, Perl will print out: today, is, saturday (one word per line).

I hope you can help me.
Nov 29 '10 #1
3,503 Expert Mod 2GB
My first request would be that you add the following two lines to the beginning of the script:

    use strict;
    use warnings;

Those will flag the simple errors, mostly syntactical and such, so you can clear them out and specify the real error you are encountering.

Second, please specify what you are seeing go wrong. You told us what you are trying to do, but did not ask a question or describe the error you are getting.


Nov 29 '10 #2
^^ Thanks, Jeff. I usually put strict and warnings in my code, but this time I forgot.

Before adding those lines, I did not see any errors when compiling the code, but the code did not actually run.

Now it seems that I have many problems with the variable and function declarations.

Nov 30 '10 #3
589 Expert Mod 512MB
When posting code, it's best to use the code tags instead of posting a graphic image.

What happens when you fix those errors, or do you not know what those errors mean or how to fix them?
Nov 30 '10 #4
589 Expert Mod 512MB
Since you never declared and assigned anything to the @FILE1 array, what do you expect this line to do?
    foreach $word (@FILE1)
Nov 30 '10 #5
@RonB: I did not hesitate to post a question here because I'm a newbie and I really want to learn something about Perl.

About this line of code: I treated the Dictionary file as an array of strings so that I could compare the words in the sentences with the words in the dictionary.

Do you have any suggestions about the declaration and the line of code above?
Nov 30 '10 #6
3,503 Expert Mod 2GB
Judging by the last screenshot of errors you posted, you have definitely put the pragmas in place. What you have to realize is that when you use them, there are certain things you MUST do. For instance, when declaring variables, you now have to put the "my" keyword in front of them:

    my $line = "test";

You will have to do that the first time each variable appears, even if that is just a section at the top of the script (after the pragmas) that simply declares each variable.
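
As a minimal sketch of what that looks like under the pragmas (the sample words and variable names are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declare each variable with "my" the first time it appears.
my @words = ('today', 'is', 'saturday');   # sample data, not from the original script
my $count = 0;

# The loop variable needs "my" as well, otherwise strict rejects it.
foreach my $word (@words) {
    $count++;
    print "$count: $word\n";
}
```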


Nov 30 '10 #7
589 Expert Mod 512MB
The mere act of opening a filehandle does not place the contents of that file into an array. You need to read/parse the file.

Please provide more details.

Does each line in your dictionary file consist of a single word, or a phrase that needs to be matched?

If the dictionary lines consist of single words, does every word of a line in text.txt need to be matched for the line to be considered successful and output?

You should parse the dictionary file and put its contents into a hash, which makes the matching both simpler and more efficient.
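
A rough sketch of loading the entries into a hash, assuming one entry per line. The sample entries are made up, and the in-memory string is only a stand-in for the real dictionary file so the snippet runs on its own:

```perl
use strict;
use warnings;

# Stand-in for the contents of the dictionary file (hypothetical sample data).
my $sample = "today\nis\ndo not\neveryone's\n";

# Reading from an in-memory string keeps the sketch self-contained;
# with a real file, open 'dictionary.txt' instead of \$sample.
open my $fh, '<', \$sample or die "Failed to open dictionary: $!";

my %dictionary;
while (my $entry = <$fh>) {
    chomp $entry;
    next if $entry eq '';            # skip blank lines
    $dictionary{ lc($entry) } = 1;   # lower-case keys for case-insensitive lookup
}
close $fh;

# A hash lookup replaces the loop over every dictionary entry.
for my $candidate ('Today', 'do not', 'saturday') {
    my $status = exists $dictionary{ lc($candidate) } ? 'found' : 'not found';
    print "$candidate: $status\n";
}
```

A hash answers "is this exact entry in the dictionary?" directly, but it does not by itself find phrases inside a longer line.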
Dec 1 '10 #8
Actually, each line in my dictionary file holds one of two kinds of "word": a single word, or a "phrase" (e.g. do not, everyone's) that needs to be matched.

The program should prefer the longer entry over the shorter one. For example, for "Harvard University" it should choose "harvard university" instead of "harvard" or "university" alone, even though all three entries appear in the dictionary.
Dec 1 '10 #9
589 Expert Mod 512MB
Ok, then the starting point from here is for you to take our suggestions and rework your script then post back with your script and its results and an updated question based on those results.
Dec 1 '10 #10
OK ^^ I have almost got the result, but there is a big problem now. Here is my code (leaving use strict and use warnings aside for now):

    #!/usr/bin/perl

    print "input sentence you want to split: \n";
    my $string = <STDIN>;
    chomp $string;

    @dict = (
            "if",
            "error",
            "does not",
            "he",
            "imperfect",
            );

    $pos = 0;

    while ($pos <= length ($string))
    {
           # longest dictionary entry that starts exactly at $pos
           $myword = "";

           foreach $word (@dict)
           {
                   $newpos = index ($string, $word, $pos);
                   if ($newpos == $pos && length ($word) > length ($myword))
                   {
                            $myword = $word;
                   }
           }
           if ($myword)
           {
                   # print the match and jump past it
                   print $myword. "\n";
                   $pos += length ($myword);
           }
           else
           {
                   # nothing starts here, move one character forward
                   $pos ++;
           }
    }
If I type the sentence "if he does not imperfect", Perl successfully prints:
does not

But the big trouble is that this dictionary is not a file. It is just an array inside the script. If I open a dictionary file instead, it will not work like that :( How can I open and read the dictionary file and make it work like the "dictionary" above?
Dec 1 '10 #11
589 Expert Mod 512MB
    open my $dictionary, '<', 'dictionary.txt'
      or die "Failed to open 'dictionary.txt' $!";

    chomp(my @dictionary = <$dictionary>);
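
A minimal sketch of how an array read this way can take the place of the hard-coded list in the earlier script. It assumes one entry per line; the in-memory string is a hypothetical stand-in for dictionary.txt, and longest_match_at is an illustrative helper name, not something from the original script:

```perl
use strict;
use warnings;

# Hypothetical stand-in for dictionary.txt, one entry per line.
my $sample = "if\nerror\ndoes not\nhe\nimperfect\n";
open my $fh, '<', \$sample or die "Failed to open dictionary: $!";
chomp(my @dictionary = <$fh>);
close $fh;

# Return the longest dictionary entry that starts exactly at $pos, or ''.
sub longest_match_at {
    my ($string, $pos, $entries) = @_;
    my $best = '';
    for my $entry (@$entries) {
        next if $entry eq '';    # ignore blank lines in the file
        if (substr($string, $pos, length($entry)) eq $entry
            && length($entry) > length($best)) {
            $best = $entry;
        }
    }
    return $best;
}

my $string = 'if he does not imperfect';
my $pos    = 0;
my @found;
while ($pos < length($string)) {
    my $match = longest_match_at($string, $pos, \@dictionary);
    if ($match ne '') {
        push @found, $match;
        $pos += length($match);   # jump past the matched entry
    }
    else {
        $pos++;                   # nothing starts here, move on
    }
}
print "$_\n" for @found;
```

Matching by character position also finds "he" inside a word such as "the"; restricting matches to word boundaries is left out of this sketch.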
Dec 1 '10 #12
^^ First of all, thanks a lot, RonB.
Actually, I'm nearly there.

For example, when I type: if error rate of ham is imperfect

Perl will print out:


So there are words that do not appear in the result.

Another example: if everyone's imperfect so error rate of ham is high

Perl prints everything except "imperfect" and "so". At first I thought it was about their position, but when I change their positions, Perl still does not catch them. I wonder whether the same thing happens with other words :(

Any suggestions?
Dec 1 '10 #13
3,503 Expert Mod 2GB
Looking at your code, I am guessing you chose not to deal with the errors produced by the pragmas. Sorry, I will not assist you with this code unless the pragmas are in place and you deal with the errors it brings up.

There is no reason for us to have to deal with your syntactical errors.


Dec 1 '10 #14
Thanks, numberwhun. I've fixed the syntactical errors that you pointed out earlier.

Now I can read from a file, look it up in the dictionary, and split out both the single words and the phrases in the file.

However, to improve the Perl code, I'm trying to print out not only the words that appear in the dictionary but also the words that are not in the dictionary.

For example: Today i do not want to do the exercises

The dictionary contains: today, I, do not, to, do (for example)

Now what I want is for the Perl code to split and print out not only "today", "I", "do not", "to", "do" but "the", "want" and "exercises" as well.

Any suggestions for my code?

The code is below:
    #!/usr/bin/perl
    #this is file test.pl

    use strict;
    use warnings;

    # the filehandle must be a scalar; it is read into @dictionary below
    open my $dictionary, '<', 'Dictionary.txt' or die $!;
    chomp(my @dictionary = <$dictionary>);
    print ("\nCongratulation!This is result:\n");

    while (my $string =<>)
    {
         #this is to separate each sentence of the file that contains more than 2 sentences
         print "\n----------\n";
         print "$string\n";
         my $pos = 0;
         while ($pos <= length ($string))
         {
              # longest dictionary entry that starts exactly at $pos
              my $myword = "";
              foreach my $word (@dictionary)
              {
                  my $newpos = index ($string, $word, $pos);
                  if ($newpos == $pos && length($word)>length ($myword))
                  {
                      $myword = $word;
                  }
              }
              if ($myword)
              {
                   print "\n$myword\n";
                   $pos += length ($myword);
              }
              else
              {
                   $pos ++;
              }
         }
    }
Input: sentence.txt, Dictionary.txt
sentence.txt contains 2 sentences, for example:
- today I do not want to do the exercise
- he is very handsome
Run: perl test.pl sentence.txt
Perl will print out:
today I do not want to do the exercise
do not

The words "the" and "exercise" do not appear in the dictionary (for example), so Perl does not print them. Please help me get these printed too.
Dec 26 '10 #15
3,503 Expert Mod 2GB
I would take the words that are not in the dictionary and put them into their own array while you cycle through your sentences. Then, just print out that array afterwards.
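
One way that idea could look, as a rough sketch only. It swaps the character-position scan from the posted script for a word-by-word greedy match, and the dictionary contents and the split_sentence name are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical dictionary; entries may be single words or phrases.
my @dictionary = ('today', 'i', 'do not', 'to', 'do');

# Split a sentence into dictionary matches and leftover words.
# Returns two array references: (matched, unmatched).
sub split_sentence {
    my ($sentence, $entries) = @_;

    my %in_dict;
    my $max_len = 1;    # longest entry, counted in words
    for my $entry (@$entries) {
        $in_dict{ lc($entry) } = 1;
        my @parts = split ' ', $entry;
        $max_len = @parts if @parts > $max_len;
    }

    my @words = split ' ', $sentence;
    my (@matched, @unmatched);
    my $i = 0;
    while ($i < @words) {
        my $taken = 0;
        # Try the longest phrase first so "do not" beats "do".
        for (my $n = $max_len; $n >= 1; $n--) {
            next if $i + $n > @words;
            my $phrase = join ' ', @words[ $i .. $i + $n - 1 ];
            if ($in_dict{ lc($phrase) }) {
                push @matched, $phrase;
                $taken = $n;
                last;
            }
        }
        if (!$taken) {
            push @unmatched, $words[$i];   # not in the dictionary
            $taken = 1;
        }
        $i += $taken;
    }
    return (\@matched, \@unmatched);
}

my ($matched, $unmatched) =
    split_sentence('Today i do not want to do the exercises', \@dictionary);
print "In dictionary:\n";
print "  $_\n" for @$matched;
print "Not in dictionary:\n";
print "  $_\n" for @$unmatched;
```

Keeping the unmatched words in their own array means they can be printed after the matches, or merged back in sentence order if that is what the assignment asks for.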


Dec 26 '10 #16
I'm afraid I do not get your idea about the array. For now I have tried putting the code below into the if ($myword) block:

    $string =~ s/$myword//g;
    print "$string\n";

But the result is not what I expected.
What I'm trying to do is replace every word that matches (I mean $myword) with nothing, so that all matched words are deleted from the string.
Then I'll print out the remaining words, which do not match the dictionary.
Any suggestions to fix my code?
Dec 31 '10 #17
Is there any support?
Jan 2 '11 #18
589 Expert Mod 512MB
I'm not really clear about the program specs your instructor gave you, and since this is your homework assignment, I can't give you a complete solution. However, I will give you rough pseudo code, or a plan, based on what I think you want.

1) load the "dictionary" file into an array and sort that array by line length

2) open a filehandle to the text file passed to the script

3) use a labeled while loop to process that file

4) for each line in the file loop over the "dictionary" array and test if that "word" is in the line. If it is, split the line into words and output each word on its own line and then, using the LABEL, move to the next iteration of the while loop.
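
The four steps above could be sketched roughly as follows. This is not a complete solution: the dictionary entries and sample text are made up, the in-memory string stands in for the text file passed to the script, and sorting longest-first is an assumption about what "sort by line length" is meant to achieve:

```perl
use strict;
use warnings;

# 1) Dictionary entries sorted by length, longest first (hypothetical data).
my @dictionary = sort { length($b) <=> length($a) }
                 ('do not', 'today', 'exercise');

# 2) Filehandle to the text; an in-memory string stands in for the real file.
my $text = "today I do not want to do the exercise\nhe is very handsome\n";
open my $in, '<', \$text or die "Failed to open text: $!";

my @output;

# 3) Labeled while loop over the lines of the file.
LINE:
while (my $line = <$in>) {
    chomp $line;

    # 4) Test each dictionary entry against the line.
    for my $entry (@dictionary) {
        if (index(lc($line), lc($entry)) >= 0) {
            # Entry found: split the line into words, keep them for output,
            # then move straight on to the next line.
            push @output, split ' ', $line;
            next LINE;
        }
    }
}
close $in;

print "$_\n" for @output;
```

The label matters because a bare next inside the inner for loop would only skip to the next dictionary entry, not to the next line of the file.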
Jan 2 '11 #19
