
Eliminating redundant words from a file

P: 89
Hi Friends,
I need a little help….

I have a file that looks like this:

cat cat cat dog dog cat dog cat cat dog dog
put put low put put cat dog dog cat cat put


The output should look something like this:

cat dog cat dog cat dog
put low put cat dog cat put
.
.
That is, no word should appear twice in a row.

The file is named repfile.

In shell I am doing it like this:

    exec 7<&0              # save the current stdin on fd 7
    exec < repfile         # read stdin from repfile

    while read -r line     # read one line at a time
    do
        prev=""
        out=""
        for word in $line  # the default IFS splits the line on whitespace
        do
            if [ "$word" != "$prev" ]; then
                out="${out:+$out }$word"
                prev=$word
            fi
        done
        echo "$out" >> unifile    # unifile is the file for output
    done

    exec 0<&7 7<&-         # restore stdin and close fd 7

Now I need to do it in Perl:

    open(STDOUT, ">>unifile") or die "cannot open unifile: $!";
    open(STDIN, "repfile") or die "cannot open repfile: $!";

    while (<STDIN>) {
        ## now how do I separate the words of the line, as I do in shell with IFS=" "?
        ### how do I get the words like $_[$i]?
    }

    close(STDIN);
    close(STDOUT);

Can you please help me?

Regards
rohit
Mar 24 '08 #1
12 Replies


nithinpes
Expert 100+
P: 410
You can achieve it using regular expressions in a simple way:
    open(FOUT, ">>unifile") or die "cannot open unifile: $!";
    open(FIN, "repfile") or die "cannot open repfile: $!";
    while (<FIN>) {
        ## collapse each run of a repeated word into a single copy
        s/\b(\w+)(?:\s+\1\b)+/$1/g;
        print FOUT $_;
    }
    close(FIN);
    close(FOUT);

In the regex, \1 is a backreference to the word captured by (\w+) in the first pair of parentheses. The \b anchors keep the pattern from matching inside a longer word, so "cat cattle" is left alone.
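A minimal sketch of a backreference-based substitution, wrapped in a sub so it can be tried on strings before touching any file. The sub name collapse_repeats is made up for illustration, and the pattern assumes "word" means a run of \w characters separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collapse each run of a
# repeated word into a single copy of that word.
sub collapse_repeats {
    my ($line) = @_;
    # \1 must equal the word captured by (\w+); the \b anchors stop a
    # match from ending inside a longer word such as "cattle".
    $line =~ s/\b(\w+)(?:\s+\1\b)+/$1/g;
    return $line;
}

print collapse_repeats("cat cat cat dog dog cat dog cat cat dog dog"), "\n";
print collapse_repeats("cat cattle cat"), "\n";
```

The second call is there to show that a word followed by a longer word with the same prefix is not treated as a repeat.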
Mar 24 '08 #2

nithinpes
Expert 100+
P: 410
To answer your exact question, i.e. how to separate the words in a line, you can use the split function:

    @words = split(/ /, $line);      # / / splits on each single space
    ## for multiple spaces between words, use
    @words1 = split(/\s+/, $line);

When reading from a file:

    while (<>) {
        @words = split;   ## with no arguments, split breaks $_ on runs of whitespace
    }

You can now iterate through the words using an index variable and a for() loop, or using a foreach() loop.
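One way to put split and a foreach() loop together for the original problem is sketched below. It is only an illustration: the sub name dedupe_words and the in-memory sample lines are assumptions, standing in for lines read from repfile.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a line into words and
# keep a word only when it differs from the word just before it.
sub dedupe_words {
    my ($line) = @_;
    my @kept;
    my $prev;                               # undef until the first word is seen
    foreach my $word (split ' ', $line) {   # ' ' splits on runs of whitespace
        next if defined $prev && $word eq $prev;
        push @kept, $word;
        $prev = $word;
    }
    return join ' ', @kept;
}

# Sample lines from the question, held in memory for the demonstration.
for my $line ("cat cat cat dog dog cat dog cat cat dog dog",
              "put put low put put cat dog dog cat cat put") {
    print dedupe_words($line), "\n";
}
```

This mirrors the shell approach in the question: remember the previous word and skip the current one when it is the same.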
Mar 24 '08 #3

P: 89
The regular expression reply is good, but is there any way to get the words of a line as $_[$i],
for i = 0, 1, 2, ... as long as $_[$i] is defined for that particular line?
Mar 24 '08 #4

nithinpes
Expert 100+
P: 410
With the words of the line in @arr (for example from split), loop over the indexes:

    for ($i = 0; $i < @arr; $i++) { print "$arr[$i]\n" }    ## an array in scalar context returns its length

OR

    for ($i = 0; $i <= $#arr; $i++) { print "$arr[$i]\n" }  ### $#arr is the last index of @arr
Mar 24 '08 #5

KevinADC
Expert 2.5K+
P: 4,059
The more Perlish way is to use the range operator:

    for my $i (0..$#array) {
        print $array[$i], "\n";
    }
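As a rough illustration of index-based access with the range operator, the sketch below keeps index 0 plus every index whose word differs from its left neighbour. The variable names and the sample line are assumptions chosen for the example.

```perl
use strict;
use warnings;

my @words = split ' ', "put put low put put cat dog dog cat cat put";

# scalar(@words) is the element count; $#words is the last valid index.
printf "count=%d last_index=%d\n", scalar(@words), $#words;

# Keep index 0, and any later index whose word differs from the one before it.
my @kept = map  { $words[$_] }
           grep { $_ == 0 || $words[$_] ne $words[$_ - 1] } 0 .. $#words;

print "@kept\n";
```

If @words is empty, 0 .. $#words is an empty range, so the grep never looks at index -1.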
Mar 24 '08 #6

P: 89
Will this create a two-dimensional array?
How do I access each word from that array?
Are you saying that $arr[0] will produce "cat"?
Is there any use for "@$_"?
Mar 25 '08 #7

nithinpes
Expert 100+
P: 410
If you want to build a two-dimensional array from the data, where each element is a reference to an array holding the words of one line, here is how to do it.
Take this sample data from data.txt:

    cat cat cat dog dog cat dog cat cat dog dog
    put put low put put cat dog dog cat cat put

To create the 2-D array, you can use:

    use strict;
    use warnings;

    ## creating the 2-D array
    open(my $fh, '<', 'data.txt') or die "opening failed: $!";
    my @data;
    while (<$fh>) {
        chomp;
        push @data, [split];    ## bare split breaks $_ on runs of whitespace
    }
    close($fh);

    ### To access each word in the file using foreach()
    print "using foreach(): \n";
    foreach (@data) {
        print "$_\n" foreach (@{$_});    ## Note: each element is a reference to an array
    }

    ### To access each word in the file using for()
    print "using for(): \n";
    for my $i (0..$#data) {
        for my $j (0..$#{$data[$i]}) {
            print "$data[$i][$j]\n";
        }
    }
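To make the row and column indexing concrete, here is a small sketch that builds the same kind of array of array references from in-memory lines (an assumption, so that no data.txt is needed) and reads single cells from it.

```perl
use strict;
use warnings;

# In-memory stand-in for the file's lines (assumption for illustration).
my @lines = (
    "cat cat cat dog dog cat dog cat cat dog dog",
    "put put low put put cat dog dog cat cat put",
);

# Each row of @data is a reference to an array holding one line's words.
my @data = map { [ split ' ', $_ ] } @lines;

print "rows: ", scalar(@data), "\n";
print "words in row 0: ", scalar(@{ $data[0] }), "\n";
print "row 0, word 0: $data[0][0]\n";     # same as $data[0]->[0]
print "row 1, word 2: $data[1][2]\n";     # same as $data[1]->[2]
```

So $data[0] on its own is a reference to the first line's words, while $data[0][0] is the first word of that line.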
Mar 25 '08 #8

P: 89
Great, this is what I was looking for.

I changed it a little so that people can have a look and get a feel for it:

    open(FIN, "file1");
    open(FOUT, ">>file2");

    my @data;
    while (<FIN>) {
        chomp;
        push @data, [split];
        ## for multi-spaces use - push @data, [split(/\s+/, $_)];
    }

    ### To access each word in the file using foreach()
    print "  using foreach(): \n";
    foreach (@data) {
        print "$_\n" foreach (@{$_});    ## Note: each element is a reference to an array
    }

    ### To access each word in the file using for()
    print "using for(): \n";
    for my $i (0..$#data) {
        for my $j (0..$#{$data[$i]}) {
            print "$i....$j \n";
            print "$data[$i][$j]\n";
        }
    }

But what is the need for this loop:
    print "$_\n" foreach(@{$_});
It can work as:
    print "$_ \n";
That is what it is doing there as well.

Anyway, thanks.
Mar 25 '08 #9

nithinpes
Expert 100+
P: 410
Did you try out the difference between these two?
@data is a two-dimensional array. If you use

    foreach (@data) {
        print "$_ \n";
    }

it would print array references (the reference type followed by a hexadecimal address). Something like:

    ARRAY(0x1831a24)
    ARRAY(0x184f34c)

You have to dereference each array to print out its elements:

    foreach (@data) {
        print "$_\n" foreach (@{$_});  ## print element by element
        ## OR
        # print "@{$_}\n";             # print the whole inner array on one line
    }

I wonder how both ways worked the same for you!
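A short sketch of the difference, using a tiny hand-built array of array references (the data here is an assumption, not the thread's file). It checks the reference type with ref() and shows what stringifying a reference looks like.

```perl
use strict;
use warnings;

my @data = ( [qw(cat dog)], [qw(put low)] );

for my $row (@data) {
    # ref() reports what kind of reference a scalar holds.
    print "type: ", ref($row), "\n";
    # Interpolating @{$row} joins the elements with a space.
    print "words: @{$row}\n";
}

# Stringifying a reference gives TYPE(0xADDRESS); the address varies per run.
my $as_string = "$data[0]";
print "looks like a reference\n" if $as_string =~ /^ARRAY\(0x[0-9a-f]+\)$/;
```

A loop over a plain list of words never sees such a string, which is why a one-dimensional array prints the words directly.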
Mar 25 '08 #10

P: 89
But I am using this code:

    open(FIN, "file1");
    open(FOUT, ">>file2");

    while (<FIN>) {
        my $srt = $_;
        $_ = "a";
        print $_;
        my @word = split(/ /, $srt);
        print " \n $word[0], $word[4] \n";
        foreach (@word) {
            #print "\n";
            print " $_ .. ";
            print "  ";
        }
        print " \n for each line \n";
    }
    close(FIN);
    close(FOUT);

For the last $_ I am getting:
cat.. cat..
I am not getting any address.
Mar 25 '08 #11

nithinpes
Expert 100+
P: 410
What you have created here is a one-dimensional array. The script I posted creates a two-dimensional array; it was a reply to your earlier question ("Will this create a two-dimensional array?") and was meant to bring out the difference between a 1-D and a 2-D array.
Anyway, you got what you wanted. But keep in mind the difference between the two scripts! :)
Mar 25 '08 #12

P: 89
Yes, I got it.
Thanks.
Mar 25 '08 #13
