How to lookup into dictionary and split sentences from file into words?

Hi,
I'm dealing with a problem in which I have to scan through a text file (containing one or more sentences) and print out each word of a sentence separately.

We pass a Sentence.txt file to the Perl program. It looks up a dictionary file (.txt) and then prints out the words of the sentence(s) that appear in the dictionary.

For example: "today is Saturday". After the dictionary lookup, if the words match, Perl will print out: today, is, saturday (one word per line).



I hope that you can help me. [Hover over the picture to see more detail ^^]
Nov 29 '10 #1
numberwhun
3,509 Expert Mod 2GB
My first request would be that you add the following two lines to the beginning of the script:

    use strict;
    use warnings;

Those will catch the simple errors, mostly syntactical and such, and allow you to pin down the real error you are encountering.

Second, please specify what you are seeing go wrong. You told us what you are trying to do, but you did not ask a question, state the error, or say what is going wrong.

Regards,

Jeff
Nov 29 '10 #2
^^ Thanks, Jeff. I usually put strict and warnings in the code, but this time I forgot them.

Before adding those lines, I did not see any errors when compiling the code, but the code did not actually run.

Now it seems that I have many problems with the variable and function declarations.

Nov 30 '10 #3
RonB
589 Expert Mod 512MB
When posting code, it's best to use the code tags instead of posting a graphic image.

What happens when you fix those errors, or do you not know what those errors mean or how to fix them?
Nov 30 '10 #4
RonB
589 Expert Mod 512MB
Since you never declared and assigned anything to the @FILE1 array, what do you expect this line to do?

    foreach $word (@FILE1)

Nov 30 '10 #5
@RonB: I do not hesitate to post a question here just because I'm a newbie, and I really want to learn something about Perl.

About this line of code: I treated the Dictionary file as an array of strings so that I can compare the words in the sentences with the words in the dictionary.

Do you have any suggestions about the declaration and the line of code above?
Nov 30 '10 #6
numberwhun
3,509 Expert Mod 2GB
For the last screenshot of errors you posted, it looks like you definitely put the pragmas in place. What you have to realize is that when you use them, there are certain things you MUST do. For instance, when declaring variables, you now have to put the "my" keyword in front of them:

    my $line = "test";

You have to do that wherever each variable is first declared, even if you have a section at the top of the script (after the pragmas) that simply declares each variable.
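
A minimal sketch of what that looks like for scalars, arrays, hashes and loop variables. The variable names here (`@words`, `%seen`) are made up for illustration and do not come from the original script:

```perl
use strict;
use warnings;

# Each variable is introduced with "my" the first time it appears.
my $line  = "today is saturday";
my @words = split ' ', $line;      # array holding the words of $line
my %seen;                          # hash, declared empty

# Loop variables need "my" as well.
foreach my $word (@words) {
    $seen{$word}++;
}

print join(",", @words), "\n";
```

Without the `my` declarations, `use strict` would stop the script at compile time with "Global symbol ... requires explicit package name" errors.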

Regards,

Jeff
Nov 30 '10 #7
RonB
589 Expert Mod 512MB
The mere act of opening a filehandle does not place the contents of that file into an array. You need to read/parse the file.

Please provide more details.

Does each line in your dictionary file consist of a single word, or a phrase that needs to be matched?

If the dictionary lines consist of single words, does every word of a line in text.txt need to be matched for the line to be considered successful and output?

You should parse the dictionary file and put its contents into a hash, which makes the matching both simpler and more efficient.
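
A rough sketch of that idea, not a full solution. It assumes one entry per line, and it uses an in-memory string as a stand-in for the dictionary file so that it runs on its own; lower-casing the keys is also an assumption, made so that "He" and "he" compare equal:

```perl
use strict;
use warnings;

# Assumption: stand-in for the contents of dictionary.txt.
my $data = "if\nerror\ndoes not\nhe\nimperfect\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %dictionary;
while (my $entry = <$fh>) {
    chomp $entry;
    next if $entry eq '';            # skip blank lines
    $dictionary{ lc($entry) } = 1;   # lower-case keys for case-insensitive lookup
}
close $fh;

# A hash lookup replaces a loop over the whole array.
for my $candidate ('does not', 'He', 'rate') {
    my $status = exists $dictionary{ lc($candidate) } ? 'found' : 'missing';
    print "$candidate: $status\n";
}
```

With a real file, the `open` line would name the file instead of `\$data`; the rest stays the same.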
Dec 1 '10 #8
Actually, each line in my dictionary file holds one of two kinds of "words": a single word, or a "phrase" (e.g. do not, everyone's) that needs to be matched.

The program should prefer the longer entry over the shorter one. For example, for "Harvard University" it should choose "harvard university" rather than "harvard" or "university" alone. All of these entries appear in the dictionary.
Dec 1 '10 #9
RonB
589 Expert Mod 512MB
Ok, then the starting point from here is for you to take our suggestions and rework your script then post back with your script and its results and an updated question based on those results.
Dec 1 '10 #10
OK ^^ I have almost got the result, but there is a big problem now. Here is my code (leaving use strict and use warnings aside for now):


    #!/usr/bin/perl

    print "input sentence you want to split: \n";

    my $string = <STDIN>;

    chomp $string;

    # the "dictionary": single words and phrases
    @dict = (
            "if",
            "error",
            "does not",
            "he",
            "imperfect",
             );

    $pos = 0;

    while ($pos <= length ($string))
    {
           $myword = "";

           # find the longest dictionary entry that starts exactly at $pos
           foreach $word (@dict)
           {
                   $newpos = index ($string, $word, $pos);
                   if ($newpos == $pos && length ($word) > length ($myword))
                   {
                            $myword = $word;
                   }
           }
           if ($myword)
           {
                   # matched: print the entry and jump past it
                   print $myword. "\n";
                   $pos += length ($myword);
           }
           else
           {
                   # no match here: move on by one character
                   $pos ++;
           }
    }

If I type the sentence "if he does not imperfect", Perl prints successfully:
if
he
does not
imperfect

But the big trouble is that this dictionary is not a file. You know, it is just an array. If I open a dictionary file, it will not work like that :( How do I solve the problem with the file to finish this? How can I open and read a dictionary file and make it work like the "dictionary" above?
Dec 1 '10 #11
RonB
589 Expert Mod 512MB
    open my $dictionary, '<', 'dictionary.txt'
      or die "Failed to open 'dictionary.txt' $!";

    chomp(my @dictionary = <$dictionary>);
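
The `chomp` in that snippet matters for the matching loop: a line read from a file still ends in "\n", and an entry with a trailing newline will not be found inside a sentence. A small sketch of the difference, using an in-memory string as an assumed stand-in for dictionary.txt:

```perl
use strict;
use warnings;

my $data = "if\ndoes not\nimperfect\n";    # assumed file contents

# Read once without chomp: every entry keeps its trailing "\n".
open my $raw_fh, '<', \$data or die "Cannot open in-memory file: $!";
my @raw = <$raw_fh>;
close $raw_fh;

# Read again with chomp: the newlines are removed.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
chomp(my @dictionary = <$fh>);
close $fh;

my $sentence = "if he does not imperfect";

# index() returns -1 when the substring is not found.
print index($sentence, $raw[1]), "\n";          # looks for "does not\n"
print index($sentence, $dictionary[1]), "\n";   # looks for "does not"
```

On a Unix-like system, a dictionary file saved with Windows line endings would have a similar effect, because `chomp` removes the "\n" but leaves a trailing "\r" on each entry.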
Dec 1 '10 #12
^^ First of all, thanks a lot, RonB.
Actually, I'm nearly there.

For example, when I type: if error rate of ham is imperfect

Perl will print out:

if
error
rate
of
ham
is

So there's a word that does not appear in the result.

Another example: if everyone's imperfect so error rate of ham is high

Perl prints everything except "imperfect" and "so". At first I thought it was about their position, but when I change their positions, Perl still does not catch them. I wonder whether the same happens with other words :(

Any suggestions?
Dec 1 '10 #13
numberwhun
3,509 Expert Mod 2GB
Looking at your code, I am guessing you chose not to deal with the errors produced by the pragmas. Sorry, I will not assist you with this code unless the pragmas are in place and you deal with the errors they bring up.

There is no reason for us to have to deal with your syntactical errors.

Regards,

Jeff
Dec 1 '10 #14
Thanks, numberwhun. I've fixed the syntactical errors that you pointed out earlier.

Now I can read from a file, look it up in the dictionary, and split out both single words and phrases from the file.

However, to improve the Perl code, I'm trying to print out not only the words that appear in the dictionary but also the words that are not in the dictionary.

For example: Today i do not want to do the exercises

The dictionary contains: today, I, do not, to, do (for example)

Now what I want is for the Perl code to split and print out not only "today, I, do not, to, do" but "the", "want" and "exercises" as well.

Any suggestions for my code?

The code is below:

    #!/usr/bin/perl
    #this is file test.pl

    use strict;
    use warnings;

    # read the dictionary file into an array, one entry per line
    open my $dictionary, '<', 'Dictionary.txt' or die $!;
    chomp(my @dictionary = <$dictionary>);
    print ("\nCongratulation!This is result:\n");

    while (my $string =<>)
    {
         print "\n----------\n";
    #this is to separate each sentence of the file that contains more than 2 sentences
         print "$string\n";
         my $pos = 0;
         while ($pos <= length ($string))
         {
              my $myword = "";
              # find the longest dictionary entry that starts exactly at $pos
              foreach my $word (@dictionary)
              {
                  my $newpos = index ($string, $word, $pos);
                  if ($newpos == $pos && length($word)>length ($myword))
                  {
                      $myword = $word;
                  }
              }
              if ($myword)
              {
                   print "\n$myword\n";
                   $pos += length ($myword);
              }
              else
              {
                   $pos ++;
              }
         }
    }

Input: sentence.txt and Dictionary.txt.
sentence.txt contains two sentences, for example:
- today I do not want to do the exercise
- he is very handsome
Run: perl test.pl sentence.txt
Perl will print out:
---------------
today I do not want to do the exercise
today
I
do not
to
do
---------------
he
is
very
handsome

The words "the" and "exercise" do not appear in the dictionary (for example), so Perl does not print them. Please help me get them printed as well.
Dec 26 '10 #15
numberwhun
3,509 Expert Mod 2GB
I would take the words that are not in the dictionary and put them into their own array while you cycle though your sentences. Then, just print out that array afterwards.
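
One way to read that suggestion, as a rough sketch rather than a finished solution. It assumes a small inline dictionary and sentence in place of the two files, lower-cases the sentence before matching, and only accepts a match that ends at a space or at the end of the text; all of those choices are assumptions, not part of the original script:

```perl
use strict;
use warnings;

# Assumptions: inline data stands in for Dictionary.txt and sentence.txt.
my @dictionary = ('today', 'i', 'do not', 'to', 'do');
my $sentence   = 'today I do not want to do the exercises';

# Longest entries first, so that "do not" wins over "do".
my @by_length = sort { length($b) <=> length($a) } @dictionary;

my (@matched, @unmatched);
my $text = lc $sentence;
my $pos  = 0;

while ($pos < length($text)) {
    # Skip the spaces between words.
    if (substr($text, $pos, 1) eq ' ') { $pos++; next; }

    my $found;
    for my $entry (@by_length) {
        my $end = $pos + length($entry);
        next unless substr($text, $pos, length($entry)) eq $entry;
        # Accept only a match that ends at a space or at the end of the text.
        next unless $end == length($text) || substr($text, $end, 1) eq ' ';
        $found = $entry;
        last;
    }

    if (defined $found) {
        push @matched, $found;
        $pos += length($found);
    }
    else {
        # No entry starts here: take the whole word and remember it separately.
        my ($word) = substr($text, $pos) =~ /^(\S+)/;
        push @unmatched, $word;
        $pos += length($word);
    }
}

print "In dictionary:\n";
print "$_\n" for @matched;
print "Not in dictionary:\n";
print "$_\n" for @unmatched;
```

Checking the character after a match keeps "he" from being found inside "the", which the character-by-character scan in the script above would otherwise do.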

Regards,

Jeff
Dec 26 '10 #16
I'm afraid that I do not get your idea about the array. For now I have tried putting the code below into the if ($myword) block:

    $string =~ s/$myword//g;
    print "$string\n";

But the result is not what I expected.
What I'm trying to do is this: the code replaces every matched word (I mean $myword) in the string with nothing, i.e. I delete all matched words from the string.
Then I print out the remaining words, which do not match the dictionary.
Any suggestions to fix my code?
Dec 31 '10 #17
Is anyone able to help?
Jan 2 '11 #18
RonB
589 Expert Mod 512MB
I'm not really clear on the program specs that your instructor gave you, and since this is your homework assignment, I can't give you a complete solution. However, I will give you rough pseudocode, or a plan, based on what I think you want.

1) load the "dictionary" file into an array and sort that array by line length

2) open a filehandle to the text file passed to the script

3) use a labeled while loop to process that file

4) for each line in the file loop over the "dictionary" array and test if that "word" is in the line. If it is, split the line into words and output each word on its own line and then, using the LABEL, move to the next iteration of the while loop.
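
The four steps above might be sketched roughly as follows. This is one literal reading of the plan, not a complete solution: in-memory strings stand in for the two files, the sort direction (longest first) and the case-insensitive test are assumptions, and the words are collected in a hypothetical `@output` array before printing:

```perl
use strict;
use warnings;

# Assumptions: in-memory strings stand in for the dictionary and text files.
my $dict_data = "do\ndo not\ntoday\n";
my $text_data = "today I do not want to do the exercise\nit is raining\n";

# 1) Load the dictionary and sort it by entry length (longest first).
open my $dict_fh, '<', \$dict_data or die "Cannot open dictionary: $!";
chomp(my @dictionary = <$dict_fh>);
close $dict_fh;
@dictionary = sort { length($b) <=> length($a) } @dictionary;

# 2) Open a filehandle to the text.
open my $text_fh, '<', \$text_data or die "Cannot open text: $!";

my @output;

# 3) A labeled while loop over the lines of the text.
LINE:
while (my $line = <$text_fh>) {
    chomp $line;
    # 4) Test each dictionary entry against the line.
    for my $entry (@dictionary) {
        next unless index(lc($line), lc($entry)) >= 0;
        push @output, split ' ', $line;   # one element per word
        next LINE;                        # go straight to the next line of the file
    }
}
close $text_fh;

print "$_\n" for @output;
```

The label lets the inner `for` loop end the current iteration of the outer `while` loop as soon as one entry has matched, so each line is split at most once.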
Jan 2 '11 #19
