My data set looks like this (tab-delimited, input.csv).
The first column is the gene name, the second column is the BN count, and the third column is the SH count.
ENSRNOG00000000417 42 102
ENSRNOG00000000417 26 52
ENSRNOG00000000417 152 284
ENSRNOG00000000417 270 372
ENSRNOG00000000417 418 540
ENSRNOG00000000417 1796 2110
ENSRNOG00000000417 2116 2464
ENSRNOG00000000417 450 518
ENSRNOG00000000417 90 102
ENSRNOG00000000417 6472 7076
ENSRNOG00000000417 724 784
ENSRNOG00000000417 724 780
ENSRNOG00000000417 406 430
ENSRNOG00000000417 536 564
ENSRNOG00000000417 236 242
ENSRNOG00000000417 112 114
ENSRNOG00000000417 1272 1224
ENSRNOG00000000417 290 264
ENSRNOG00000000417 688 624
ENSRNOG00000000417 90 78
ENSRNOG00000000417 180 148
ENSRNOG00000000417 96 78
ENSRNOG00000000417 210 164
ENSRNOG00000000417 64 38
ENSRNOG00000000417 166 78
ENSRNOG00000001487 96 172
ENSRNOG00000001487 114 168
ENSRNOG00000001487 1190 1654
ENSRNOG00000001487 86 104
ENSRNOG00000001487 86 102
ENSRNOG00000001487 1536 1679
ENSRNOG00000001487 1518 1659
ENSRNOG00000001487 188 146
ENSRNOG00000001487 100 72
ENSRNOG00000001487 84 56
ENSRNOG00000001487 46 26
ENSRNOG00000001487 0 0
ENSRNOG00000001487 0 0
ENSRNOG00000001487 0 0
ENSRNOG00000001488 814 2714
ENSRNOG00000001488 314 780
ENSRNOG00000001488 1110 2156
ENSRNOG00000001488 956 1652
ENSRNOG00000001488 4334 6926
ENSRNOG00000001488 2450 3202
ENSRNOG00000001489 50 114
ENSRNOG00000001489 100 186
ENSRNOG00000001489 72 118
ENSRNOG00000001489 356 516
ENSRNOG00000001489 42 50
ENSRNOG00000001489 32 32
ENSRNOG00000001489 72 66
ENSRNOG00000001489 14 10
ENSRNOG00000001489 1040 352
ENSRNOG00000001489 0 0
For each gene name I want to calculate the number of rows, the sum, the mean and the standard deviation.
I can calculate the count, sum and mean, but my code does not calculate the standard deviation correctly.
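For reference, these are the standard definitions for one column (BN or SH) of one gene with n rows x_1, ..., x_n. The expected values quoted at the end of this post appear to correspond to the sample standard deviation s (dividing by n - 1) rather than the population one (dividing by n); that is an inference, not something the post states.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

The important detail is that the deviation is taken for every individual row x_i, so the individual values (or at least their squares) have to be available, not just the column total.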
This is my Perl code:
```perl
#!/usr/bin/perl -w
use strict;
my ($input) = @ARGV;
open (INPUT, "$input") or die "Can't open the file";
# $hash{gene} = [BN total, SH total, number of rows]
my %hash = ();
while (<INPUT>) {
    chomp;
    my @data = split " ", $_;
    for my $i (1,2) {
        $hash{$data[0]}[$i-1] += $data[$i];
    }
    $hash{$data[0]}[2]++;
}
my $sumofsq=0;
foreach my $key (keys %hash) {
    print "Number, Sum, Mean, Stdev for BN & SH $key : ";
    # $i = 0 (BN column), then 1 (SH column)
    foreach my $i (0..$#{$hash{$key}}-1) {
        if ($hash{$key}[$i] == 0 ) {
            my $count = sprintf $hash{$key}[2];
            print "$count \t 0 \t 0 \t 0 \t";
        }
        else {
            my $count1 = sprintf $hash{$key}[2];
            my $sum = sprintf "%.2f", $hash{$key}[$i];
            my $avg = sprintf "%.2f", $hash{$key}[$i] / $hash{$key}[2];
            my $sumofsq += ($hash{$key}[$i] - ($hash{$key}[$i] / $hash{$key}[2]))**2;
            my $stdev = sprintf "%.2f", sqrt($sumofsq / ($hash{$key}[2]));
            print "$count1 \t $sum \t $avg \t $stdev \t";
        }
    }
    print "\n";
}
close (INPUT);
exit;
```
This is the output I get:
Number, Sum, Mean, Stdev for BN & SH ENSRNOG00000000417 : 26 17660.00 679.23 3330.20 26 19242.00 740.08 3628.53
Number, Sum, Mean, Stdev for BN & SH ENSRNOG00000001487 : 14 5044.00 360.29 1251.78 14 5838.00 417.00 1448.82
Number, Sum, Mean, Stdev for BN & SH ENSRNOG00000001488 : 6 9978.00 1663.00 3394.58 6 17430.00 2905.00 5929.81
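Working the first output line backwards suggests what the loop actually computes. `%hash` only stores the column totals and the row count, so inside the loop `$hash{$key}[$i]` is the total (17660 for BN), not an individual count. The "sum of squares" is therefore a single term, (sum - mean)^2, and the printed value reduces to (sum - mean)/sqrt(n). Separately, `my $sumofsq += ...` declares a fresh variable on every pass, so nothing would accumulate across rows even if individual values were available.

```latex
\frac{\text{sum} - \bar{x}}{\sqrt{n}}
  = \frac{17660 - 679.23}{\sqrt{26}}
  = \frac{16980.77}{\sqrt{26}}
  \approx 3330.20
```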
Can you help me find the problem with the standard deviation?
For example, the output for ENSRNOG00000000417 should look like this:
Number, Sum, Mean, Stdev for BN & SH ENSRNOG00000000417 : 26 17660.00 679.23 1295.36 26 19242.00 740.08 1427.99
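A minimal sketch of one way to fix it, assuming the input is whitespace-separated `gene bn sh` lines as shown above: keep every individual value per gene (a hash of arrays instead of running totals), then sum the squared deviation of each value from that gene's mean. The subroutine names `read_counts` and `sample_stdev` are illustrative and not from the original code, and the n - 1 denominator is an assumption based on the expected output; use `$n` instead of `$n - 1` for the population standard deviation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample standard deviation (denominator n - 1) of a list of numbers.
# Assumption: n - 1 matches the expected output; use $n for the population version.
# Returns 0 when there are fewer than two values.
sub sample_stdev {
    my @x = @_;
    my $n = scalar @x;
    return 0 if $n < 2;
    my $sum = 0;
    $sum += $_ for @x;
    my $mean  = $sum / $n;
    my $sumsq = 0;
    $sumsq += ($_ - $mean) ** 2 for @x;    # deviation of EACH row, not of the total
    return sqrt($sumsq / ($n - 1));
}

# Read "gene bn sh" lines (tabs or spaces) and keep every value per gene:
# $counts{gene} = [ [BN values...], [SH values...] ]
sub read_counts {
    my ($fh) = @_;
    my %counts;
    while (my $line = <$fh>) {
        my ($gene, @vals) = split ' ', $line;    # ' ' splits on any whitespace, drops the newline
        next unless @vals >= 2;                  # skip blank or short lines
        push @{ $counts{$gene}[$_] }, $vals[$_] for 0, 1;
    }
    return \%counts;
}

# Only produce the report when a file name is given on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Can't open $file: $!";
    my $counts = read_counts($in);
    close $in;
    for my $gene (sort keys %$counts) {
        print "Number, Sum, Mean, Stdev for BN & SH $gene :";
        for my $col (@{ $counts->{$gene} }) {    # BN column, then SH column
            my $n   = scalar @$col;
            my $sum = 0;
            $sum += $_ for @$col;
            printf " %d\t%.2f\t%.2f\t%.2f\t", $n, $sum, $sum / $n, sample_stdev(@$col);
        }
        print "\n";
    }
}
```

Run it as `perl gene_stats.pl input.csv` (the script name is hypothetical). The genes are sorted so they print in a stable order, unlike `keys %hash`. If storing every value is too costly, an alternative is to accumulate the sum and the sum of squares per gene and use (sum of x^2 - (sum of x)^2 / n) / (n - 1), which avoids the arrays but can lose precision when the mean is large relative to the spread.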