average by user defined cutoff

Member
 
Join Date: Sep 2007
Posts: 49
#1: Jun 7 '09
Hi all,
I am trying to calculate average values over different parts of the same data file. For example, given the numbers 1 to 10, I want the average of the first 3 values, then of the next 4 values, then of the last 3 values, and finally the average of those three averages. I have written some code, but I don't think it is a very good way to calculate this, and sometimes I also get garbage values.

Here is a data file:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

So when the user specifies the different parts as command-line arguments, the average of each part should be calculated.

Here is my perl code:
    #!/usr/bin/perl

    use strict;
    use warnings;

    my $file = $ARGV[0];
    my $cut1 = $ARGV[1];
    my $cut2 = $ARGV[2];
    my $cut3 = $ARGV[3];
    my (@tt,$result1,$result2,$result3);

    open (A,$file);
    my ($count,$total) = 0;
    my ($val,$result);

    while (<A>)
    {
        my @temp = split (/\s+/,$_);
        $val = $temp[1];
        $total = $total + $val;
        $count++;

        if ($count == $cut1)
        {
            $result1 = sprintf("%.3f",$total/$count);
            push(@tt,$result1);
            my $s1 = $total;
            my $r1 = $total/$count;
            print "$s1\t$count\t$r1\n";
        }

        if ($count == ($cut2+$cut1))
        {
            $result2 = sprintf("%.3f",($total-($result1*$cut1))/($count-$cut1));
            push(@tt,$result2);
            my $s2 = ($total-($result1*$cut1));
            my $c2 = ($count-$cut1);
            my $r2 = $s2/$c2;
            print "$s2\t$c2\t$r2\n";
        }

        if ($count == ($cut3+$cut2+$cut1))
        {
            $result3 = $total- (($result1*$cut1) + ($result2*$cut2)) / ($count- ($cut1+$cut2));
            push(@tt,$result3);
            my $s3 = ($total- (($result1*$cut1) + ($result2*$cut2)));
            my $c3 = $count - ($cut1+$cut2);
            my $r3 = $s3/$c3;
            print "$s3\t$c3\t$r3\n";
        }
    }
    my $add = 0;
    foreach my $r(@tt)
    {
        $add = $add +$r;
    }
    print "$add/scalar(@tt)\t";
    my $final = sprintf("%.3f",$add/scalar(@tt));
    print "$final\n";

Right now it handles 3 user cutoffs, but I want the program to take any number of cutoffs and calculate the averages.
I guess there must be a better way to calculate the averages from different sections of the same data file.
Here I have used just the numbers 1 to 10 as an example, but my actual data files have 16900 lines, and I have to calculate the averages over different parts of the file.
Any help in this regard is appreciated.
Thanks
Kumar

KevinADC
Expert
 
Join Date: Jan 2007
Location: Southern California USA
Posts: 4,091
#2: Jun 7 '09

If you have a file with just numbers on each line, why are you using split?

    my @temp = split (/\s+/,$_);

Anyway, something like this seems easier:

  1. #!/usr/bin/perl                                                                                                                                              
  2.  
  3. use strict;                                                                                                                                                 
  4. use warnings;                                                                                                                                               
  5.  
  6. my $file = $ARGV[0];
  7. my $cut1 = $ARGV[1];
  8. my $cut2 = $ARGV[2];
  9. my $cut3 = $ARGV[3];
  10.  
  11. my (@cut1, @cut2, @cut3);
  12.  
  13. open (my $IN, $file) or die "$!";
  14. push @cut1, chomp <$IN> for (1..$cut1);
  15. push @cut2, chomp <$IN> for ($cut1+1..$cut2);
  16. push @cut3, chomp <$IN> for ($cut2+1..$cut3);
  17. close $IN;
  18.  
  19. average(\@cut1,\@cut2,\@cut3);
  20.  
  21. sub average {
  22.    my @arrays = @_;
  23.    my $sum;
  24.    foreach my $list (@arrays) {
  25.       $sum += $_ for @{$list};
  26.       my $avg = $sum / @{$list};
  27.       print "Average = $avg\n";
  28.       $sum = 0;
  29.    }
  30. }

Member
 
Join Date: Sep 2007
Posts: 49
#3: Jun 8 '09

Thanks KevinADC for the reply,
I ran your code on a file with the numbers 1 to 10, but it throws an error: "Can't modify <HANDLE> in chomp at new.pl line 14, near "<$IN> for "
Execution of new.pl aborted due to compilation errors."
I looked into the error, and when I removed the chomp that error went away, but now I get an "Illegal division by zero at new.pl line 27." error that I was not able to get rid of.

Thanks
Kumar
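
A note on the two errors reported above: the following is a minimal sketch, not code from the thread, of what most likely goes wrong. chomp needs a variable it can modify, so the line has to be assigned first. Once chomp is removed, push puts <$IN> in list context, so the first push reads every remaining line and the later sections stay empty, which leads to the division by zero. The in-memory filehandle and the variable names are assumptions made for the demo.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for the 1..10 data file (assumption for this demo).
my $data = join '', map { "$_\n" } 1 .. 10;
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

# chomp needs a variable it can modify, so assign the line first.
chomp(my $first = <$fh>);
print "first line: $first\n";

# push supplies list context, so <$fh> returns every remaining line at once.
my @rest;
push @rest, <$fh>;
print "lines read by one push: ", scalar(@rest), "\n";

# Nothing is left for a later section, so check before dividing by its size.
my @next;
push @next, <$fh>;
print "next section is empty\n" unless @next;
close $fh;
```

Reading one line per loop iteration into a scalar avoids both problems.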
KevinADC
Expert
 
Join Date: Jan 2007
Location: Southern California USA
Posts: 4,091
#4: Jun 8 '09

Oops, my bad. This new version of the code assumes (for a 10-line file) that cut1, cut2, and cut3 are equal to 3, 4, and 3 respectively; if not, the for() loop conditions need to be adjusted.

    use strict;
    use warnings;

    my $file = $ARGV[0];
    my $cut1 = $ARGV[1];   # number of lines in the first section
    my $cut2 = $ARGV[2];   # number of lines in the second section
    my $cut3 = $ARGV[3];   # number of lines in the third section

    my (@cut1, @cut2, @cut3);

    open (my $IN, '<', $file) or die "$!";
    # Read one line per iteration; chomp needs a variable it can modify.
    for (1 .. $cut1) {
       chomp (my $line = <$IN>);
       push @cut1, $line;
    }
    for ($cut1+1 .. $cut1+$cut2) {
       chomp (my $line = <$IN>);
       push @cut2, $line;
    }
    for ($cut1+$cut2+1 .. $cut1+$cut2+$cut3) {
       chomp (my $line = <$IN>);
       push @cut3, $line;
    }
    close $IN;

    # Print the average of each section.
    average(\@cut1,\@cut2,\@cut3);

    sub average {
       my @arrays = @_;
       my $sum = 0;
       foreach my $list (@arrays) {
          $sum += $_ for @{$list};
          my $avg = $sum / @{$list};
          print "Average = $avg\n";
          $sum = 0;   # reset for the next section
       }
    }

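
The three hard-coded sections above can be generalized to any number of cutoffs. The following is a minimal sketch under the assumption that each argument after the file name is the number of lines in one section; section_averages is a hypothetical helper, not part of the code posted in this thread. It keeps only a running sum per section, which also suits larger files such as the 16900-line one mentioned in the question.

```perl
use strict;
use warnings;

# section_averages($fh, @sizes): hypothetical helper, not from the thread.
# Reads the next $size data lines for each entry in @sizes and returns one
# average per section, without keeping the whole file in memory.
sub section_averages {
    my ($fh, @sizes) = @_;
    my @averages;
    for my $size (@sizes) {
        my ($sum, $count) = (0, 0);
        while ($count < $size) {
            my $line = <$fh>;
            last unless defined $line;   # file ended early
            chomp $line;
            next if $line =~ /^\s*$/;    # skip blank lines
            $sum += $line;
            $count++;
        }
        die "No data available for a section of size $size\n" unless $count;
        push @averages, $sum / $count;
    }
    return @averages;
}

# Only runs when arguments are given: perl script.pl data.txt 3 4 3
if (@ARGV) {
    my ($file, @sizes) = @ARGV;
    die "Usage: $0 file size [size ...]\n" unless @sizes;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    printf "Average = %s\n", $_ for section_averages($in, @sizes);
    close $in;
}
```

Run as perl script.pl data.txt 3 4 3; for the 1 to 10 file the section averages are 2, 5.5 and 9.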
Member
 
Join Date: Sep 2007
Posts: 49
#5: Jun 8 '09

Thanks so much Kevin,
The output for the numbers 1 to 10, "Average = 2; Average = 5.5; Average = 9", is correct. One last small request: if at the end I want to take the average of these values, how do I modify the subroutine?

Thanks
Kumar
KevinADC
Expert
 
Join Date: Jan 2007
Location: Southern California USA
Posts: 4,091
#6: Jun 8 '09

Change the subroutine to something like this:

    sub average {
       my @arrays = @_;
       my $sum = 0;
       my @averages;
       my $total = 0;
       foreach my $list (@arrays) {
          $sum += $_ for @{$list};
          my $avg = $sum / @{$list};
          push @averages, $avg;   # keep each section's average, not its sum
          print "Average = $avg\n";
          $sum = 0;
       }
       # Average of the per-section averages.
       ($total += $_) for @averages;
       printf "Total Average = %.3f\n", $total / @averages;
    }

Next time, please show some effort on your part first to solve the problem.
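
For reference, here is a minimal sketch of the "average of the averages" step on its own, using sum from the core List::Util module. The mean helper and the hard-coded sections are assumptions for illustration, not code from the thread. It also prints the mean of all values, since the two numbers are not the same thing in general.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# mean(@values): hypothetical helper returning the arithmetic mean of a non-empty list.
sub mean {
    my @values = @_;
    die "mean() needs at least one value\n" unless @values;
    return sum(@values) / @values;
}

# The 1..10 example split into sections of 3, 4 and 3 values.
my @sections = ([1, 2, 3], [4, 5, 6, 7], [8, 9, 10]);
my @averages = map { mean(@$_) } @sections;

printf "Average = %s\n", $_ for @averages;
printf "Mean of section averages = %.3f\n", mean(@averages);
printf "Mean of all values       = %.3f\n", mean(map { @$_ } @sections);
```

For this data both final lines print 5.500, but with sections of very different sizes the mean of the section averages can differ from the mean of all values, so it is worth deciding which one is actually needed.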