473,385 Members | 1,427 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

error in calculation of standard deviation

Hi,
My data set is followed like this... (as tab delimited input.csv)
First is column is gene name, second column is bn count, third colums is sh count.


ENSRNOG00000000417 42 102
ENSRNOG00000000417 26 52
ENSRNOG00000000417 152 284
ENSRNOG00000000417 270 372
ENSRNOG00000000417 418 540
ENSRNOG00000000417 1796 2110
ENSRNOG00000000417 2116 2464
ENSRNOG00000000417 450 518
ENSRNOG00000000417 90 102
ENSRNOG00000000417 6472 7076
ENSRNOG00000000417 724 784
ENSRNOG00000000417 724 780
ENSRNOG00000000417 406 430
ENSRNOG00000000417 536 564
ENSRNOG00000000417 236 242
ENSRNOG00000000417 112 114
ENSRNOG00000000417 1272 1224
ENSRNOG00000000417 290 264
ENSRNOG00000000417 688 624
ENSRNOG00000000417 90 78
ENSRNOG00000000417 180 148
ENSRNOG00000000417 96 78
ENSRNOG00000000417 210 164
ENSRNOG00000000417 64 38
ENSRNOG00000000417 166 78
ENSRNOG00000001487 96 172
ENSRNOG00000001487 114 168
ENSRNOG00000001487 1190 1654
ENSRNOG00000001487 86 104
ENSRNOG00000001487 86 102
ENSRNOG00000001487 1536 1679
ENSRNOG00000001487 1518 1659
ENSRNOG00000001487 188 146
ENSRNOG00000001487 100 72
ENSRNOG00000001487 84 56
ENSRNOG00000001487 46 26
ENSRNOG00000001487 0 0
ENSRNOG00000001487 0 0
ENSRNOG00000001487 0 0
ENSRNOG00000001488 814 2714
ENSRNOG00000001488 314 780
ENSRNOG00000001488 1110 2156
ENSRNOG00000001488 956 1652
ENSRNOG00000001488 4334 6926
ENSRNOG00000001488 2450 3202
ENSRNOG00000001489 50 114
ENSRNOG00000001489 100 186
ENSRNOG00000001489 72 118
ENSRNOG00000001489 356 516
ENSRNOG00000001489 42 50
ENSRNOG00000001489 32 32
ENSRNOG00000001489 72 66
ENSRNOG00000001489 14 10
ENSRNOG00000001489 1040 352
ENSRNOG00000001489 0 0


I want to calculate for each gene names of total count, sum, mean and standard deviation..
I could calculate for the count, sum and mean but the Standard deviation is not calculating properly with my code..
This is my perl code
Expand|Select|Wrap|Line Numbers
  1. #usr/bin/perl -w
  2. use strict;
  3.  
  4. my ($input) = @ARGV;
  5.  
  6. open (INPUT, "$input") or die "Can't open the file";
  7. my %hash = ();
  8.  
  9.  
  10. while (<INPUT>) { 
  11.     chomp;
  12.     my @data = split " ", $_;
  13.     for my $i (1,2) {
  14.         $hash {$data[0]} [$i-1]+=$data[$i];
  15.     }
  16.     $hash{$data[0]}[2]++;
  17. }
  18. my $sumofsq=0;
  19.     foreach my $key (keys %hash) {
  20.         print "Number, Sum, Mean, Stdev  for BN & SH  $key : ";
  21.         foreach my $i (0..$#{$hash{$key}}-1) {
  22.         if ($hash{$key}[$i] == 0 ) {
  23.                 my $count = sprintf $hash{$key}[2];            
  24.                 print "$count \t 0 \t 0 \t 0 \t";
  25.             }
  26.             else {
  27.         my $count1 = sprintf $hash{$key}[2];
  28.         my $sum = sprintf "%.2f", $hash{$key}[$i];    
  29.         my $avg = sprintf  "%.2f", $hash{$key}[$i] / $hash{$key}[2];
  30.               my $sumofsq += ($hash{$key}[$i] - ($hash{$key}[$i] / $hash{$key}[2]))**2;
  31.         my $stdev = sprintf "%.2f", sqrt($sumofsq / ($hash{$key}[2]));
  32.             print "$count1 \t $sum \t $avg \t $stdev \t";
  33.             }
  34.         }
  35.     print "\n";
  36. }
  37.  
  38. close (INPUT);
  39. exit;
  40.  
The output is
Number, Sum, Mean, Stdev for BN & SHR ENSRNOG00000000417 : 26 17660.00 679.23 3330.20 26 19242.00 740.08 3628.53
Number, Sum, Mean, Stdev for BN & SH ENSRNOG00000001487 : 14 5044.00 360.29 1251.78 14 5838.00 417.00 1448.82
Number, Sum, Mean, Stdev for BN & SH ENSRNOG00000001488 : 6 9978.00 1663.00 3394.58 6 17430.00 2905.00 5929.81
Number, Sum, Mean, Stdev for BN & SH ENSRNOG00000001488 : 6 9978.00 1663.00 3394.58 6 17430.00 2905.00 5929.81


So can you help me to find out the problem of standard deviation?

For example the output should be like this for
Number, Sum, Mean, Stdev for BN & SH ENSRNOG00000000417 : 26 17660.00 679.23 1295.36 26 19242.00 740.08 1427.99
Sep 17 '09 #1
6 3470
toolic
70 Expert
Since I could not determine what you problem was by looking at your code,
and because there is a discrepancy between your posted data and
your actual output (I only see 25 lines for gene 417, not 26), I decided
to code something else up myself. I also used very simple data,
which I just copied from the wikipedia standard deviation page.

Expand|Select|Wrap|Line Numbers
  1. use warnings;
  2. use strict;
  3. use List::Util qw(sum);
  4.  
  5. my %data;
  6. while (<DATA>) {
  7.     my ($gene, $bn, $sh) = split;
  8.     push @{ $data{$gene}{bn} }, $bn;
  9.     push @{ $data{$gene}{sh} }, $sh;
  10. }
  11.  
  12. for my $gene (keys %data) {
  13.     print "gene=$gene\n";
  14.     for my $type (qw(bn sh)) {
  15.         my @vals = @{ $data{$gene}{$type} };
  16.         my $tot = sum(@vals);
  17.         my $avg = $tot / @vals;
  18.         my $sos = 0;
  19.         for (@vals) {
  20.             $sos += ($_ - $avg)**2;
  21.         }
  22.         my $sd = sqrt($sos/@vals);
  23.         print "type=$type tot=$tot avg=$avg sd=$sd\n";
  24.     }
  25. }
  26. __DATA__
  27. foo 2 0
  28. foo 4 0
  29. foo 4 0
  30. foo 4 0
  31. foo 5 0
  32. foo 5 0
  33. foo 7 0
  34. foo 9 0
  35.  
This prints out:

Expand|Select|Wrap|Line Numbers
  1. gene=foo
  2. type=bn tot=40 avg=5 sd=2
  3. type=sh tot=0 avg=0 sd=0
  4.  
Perhaps you could use this simplified version of the code
to build upon.
Sep 18 '09 #2
Thanks for the help. It was my mistake that i missed a line in the gene 417. But your code is correct.. Its working fine.. Can you help me on this problem also..

The data in this format
_DATA__
foo 2 0 ens1
foo 4 0 ens1
foo 4 0 ens1
foo 4 0 ens1
foo 5 0 ens1
foo 5 0 ens1
foo 7 0 ens1
foo 9 0 ens1
fo1 1 3 ens2
fo1 4 2 ens2
fo3 2 2 ens3
fo3 3 2 ens3

i want to keep the 4th column geneid also with this output..
like

gene=foo
type=bn tot=40 avg=5 sd=2 geneid =ens1
type=sh tot=0 avg=0 sd=0 geneid =ens1

gene=fo1
type=bn tot=5 avg=2.5 sd=0.5 geneid=ens2
type=sh tot=4 avg=2 sd=0 geneid=ens2

gene=fo2
type=bn tot=5 avg=2.5 sd=1.5 geneid=ens3
type=sh tot=5 avg=2.5 sd=0.5 geneid=ens3

How i can include the geneid also with the output?
Sep 18 '09 #3
@toolic

Sorry there is mistake in the sd calculation. The value should be 2.14
can you please check this..
Sep 18 '09 #4
Sorry there is mistake in the sd calculation. The value should be 2.14
can you please check this..
Sep 18 '09 #5
Hi toolic,
I found the mistake in SD calculation.. I had assigned $count = @vals;
Then $sd = sqrt($sos/ ($count -1))
Now its correct the SD...
Thanks..
Sep 18 '09 #6
toolic
70 Expert
i want to keep the 4th column geneid also with this output..
like

gene=foo
type=bn tot=40 avg=5 sd=2 geneid =ens1
Assuming there is only one geneid per gene,
one way to add 'geneid' to your data structure is:
Expand|Select|Wrap|Line Numbers
  1. while (<DATA>) {
  2.     my ($gene, $bn, $sh, $geneid) = split;
  3.     push @{ $data{$gene}{bn} }, $bn;
  4.     push @{ $data{$gene}{sh} }, $sh;
  5.     $data{$gene}{sh}{geneid} = $geneid;
  6. }
  7.  
A handy reference for Perl data structures is:
perldsc

Sorry there is mistake in the sd calculation. The value should be 2.14
can you please check this..
The Wiki Standard_deviation Basic_example shows a result of 2. When I check this by hand, I get 2. When I run my Perl code, I get 2. Perhaps you are interested in a different formula from the wiki page. It's your code; you can use whatever formula you want.
Sep 18 '09 #7

Sign in to post your reply or Sign up for a free account.

Similar topics

0
by: T.Venkatesh | last post by:
Hi friends, I just got stuck up in JDBC,JSP. My problem is iam unable to display the output of Standard deviation , variance sum, of a column in a JSP page. But it complies- but no display of...
4
by: muser | last post by:
Can anyone run this program through their compiler or if they can see a logical error please point it out. I have my tutor working on it at the moment but I would rather a less ambigious response...
12
by: theoderich | last post by:
Following is possible with gcc and g++: #include <math.h> double sin(double) { return 1; } int main()
3
by: Scott | last post by:
I know in some other languages there is a simple standard deviation function (sdev(1,2,3,4,5,etc...)). But I'm unable to find a suitable alternative for vb.net I'm assuming it's there...
2
by: Quentin | last post by:
How would I calculate standard deviation of a csv file? Any code examples would be great! Thank you!
10
by: Verbal Kint | last post by:
DEAR ALL, I just have a very short question. In the FAQ list question 13.20 the following text is mentioned: "These methods all generate numbers with mean 0 and standard deviation 1. (To adjust...
9
by: lajohn63 | last post by:
I'm trying to calculate a standard deviation for each record in a database...the record has 12 fields which contain consumption for each month of the year. I want to calculate the standard deviation...
0
by: aboxylica | last post by:
do v have a function to calculate variance, average and standard deviation in python??
6
by: kumarboston | last post by:
Hi All, I am trying to get an average value for my data, here is my data file DATA FILE EP1934.PDB 250 250 11.27 EP1934.PDB 251 251 12.7332 EP1934.PDB 252 252 6.38341 EP1934.PDB 253 253...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.