Hi All,
I am trying to get an average value from my data. In my data file,
the first column is the PDB ID, the second and third are residue positions, and the fourth is the distance.
What I am trying to do is calculate the average distance for each residue position, along with the standard deviation (SD).
For example, for residue position 250, the program should select all the distance values at residue number 250, calculate their average, and then calculate the SD.
Finally, it should print the residue number, the average value, and the SD.
I have written some code, but it is not able to select the specified residue and do the calculations:
#!/usr/bin/perl
use strict;
use warnings;

my (%hash,$respos1,$respos2,$dist,$val,$line,@temp);
my ($count,$dis) = 0;

open (FH,"caca.dat") or die "Check the file";
while (<FH>)
{
    $line = $_;
    chomp $_;
    @temp = split (/\s/,$line);
    $respos1 = $temp[1];
    $respos2 = $temp[2];
    $dist = $temp[3];
    $hash{$respos1} = $dist;
}

for ($respos1=250;$respos1<=274;$respos1++)
{
    if ($respos1 == $respos2)
    {
        $dis = $dis + $dist;
        $count++;
    }
}

Since the average is not being calculated correctly, I have not tried the SD part yet.
Any pointers would be helpful.
Thanks
Kumar
Not sure if I got this right, but it should help, or can be fixed easily enough (I think):
#!/usr/bin/perl
use strict;
use warnings;

my %hash;

# ID r1 r2 sd
#EP1935.PDB 267 267 9.48097

open (FH,"caca.dat") or die "Check the file";
while (my $line = <FH>){
    chomp $line;
    my ($r1,$sd) = (split (/\s/,$line))[1,3];
    $hash{$r1}{'sd'} += $sd;
    $hash{$r1}{'divisor'}++;
}
close FH;
foreach my $key (sort {$a <=> $b} keys %hash) {
    my $avg = sprintf "%.3f" , $hash{$key}{'sd'} / $hash{$key}{'divisor'};
    print "The average SD for $key is $avg\n";
}

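If it helps, the same one-pass idea can probably be extended to give the SD as well, by keeping a running sum of squares next to the running sum. This is only a sketch: the sub name stats_by_residue and the sample rows are made up for illustration, and it assumes whitespace-separated columns like the sample line in the comment above. It uses the sample (n - 1) formula and reports an SD of 0 for a residue with a single value, which is a choice of this sketch rather than anything the thread specifies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads rows of "id r1 r2 distance" from a filehandle
# and returns { residue => [mean, sd] }, keyed on the second column.
sub stats_by_residue {
    my ($fh) = @_;
    my %acc;
    while (my $line = <$fh>) {
        my ($r1, $dist) = (split ' ', $line)[1, 3];
        next unless defined $dist;          # skip blank or short lines
        $acc{$r1}{n}++;
        $acc{$r1}{sum}   += $dist;
        $acc{$r1}{sumsq} += $dist ** 2;
    }
    my %stats;
    for my $res (keys %acc) {
        my ($n, $sum, $sumsq) = @{ $acc{$res} }{qw(n sum sumsq)};
        my $mean = $sum / $n;
        # Sample variance: (sum(x^2) - n*mean^2) / (n - 1)
        my $var = $n > 1 ? ($sumsq - $n * $mean ** 2) / ($n - 1) : 0;
        $var = 0 if $var < 0;               # guard against rounding error
        $stats{$res} = [ $mean, sqrt $var ];
    }
    return \%stats;
}

# Made-up sample rows, read through an in-memory filehandle.
my $sample = "A.PDB 250 250 2.0\nB.PDB 250 250 4.0\nC.PDB 251 251 5.0\n";
open my $fh, '<', \$sample or die "Cannot open sample data: $!";
my $stats = stats_by_residue($fh);
close $fh;

for my $res (sort { $a <=> $b } keys %$stats) {
    printf "%s\t%.3f\t%.3f\n", $res, @{ $stats->{$res} };
}
```

To run it on the real file, open caca.dat instead of the in-memory string and pass that filehandle in. One caveat: the sum-of-squares formula can lose precision when the values are large compared with their spread, in which case a two-pass calculation (mean first, then squared deviations) is the safer option.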
Hi All,
Thanks for the reply. I tried to calculate the average and the SD, but something is wrong and I am not sure what.
Here is my code:
#!/usr/bin/perl

use strict;
use warnings;

my (@pos,@cadist,$mean,%hash,$respos1,$respos2,$cadist,$val,$line,@temp,@respos2);
my ($cnt,$dis,$sum) = 0;

open (FH,"caca.dat") or die "Check the file";
while (<FH>)
{
    $line = $_;chomp $_;
    @temp = split (/\s/,$line);
    $respos1 = $temp[1];
    $respos2 = $temp[2];
    $cadist = $temp[3];
    for(my $i=250;$i<=274;$i++)
    {
        if ($i == $respos2)
        {
            push (@cadist,$cadist);
            push (@pos,$respos2);
            $sum +=$cadist;
            $cnt++;
        }
    }
}

$mean = $sum/$cnt;
@cadist = ();

my $summ = 0;
my $deviation;

foreach my $val(@cadist)
{
    my $abar = (($val-$mean)**2);
    $summ += $abar;
}
$deviation = sprintf "%.5f",sqrt($summ/($cnt-1));
print "$pos[0] $mean $deviation\n";

If I remove the for loop and simply put a fixed value in the if statement for comparison, everything works fine, but when I put in the condition for every residue position, the calculation goes wrong.
Thanks
Kumar
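Well, if everything else is correct in your code, this line needs to be removed:
@cadist = ();   (the line right after $mean = $sum/$cnt;)
It empties the array, discarding all the values it held, so the loop that sums the squared deviations has nothing to iterate over.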
Thanks for the reply,
I finally succeeded in calculating the values, but one thing still remains: because of the for loop in the code, the values for each residue are printed repeatedly until the end of the data, which makes the output redundant.
I am posting the code, which runs on the data file I posted earlier; the results can be seen by running it:
#!/usr/bin/perl

use strict;
use warnings;

my ($line,@temp,$respos1,$respos2,@respos1,@respos2,$cadis,@lstdist,$i,$j,@cadist,@result,$length,$ele,$val,$abar,$deviation);
my ($cnt,$dis,$sum,$summ,$mean) = 0;

open (FH,"caca.dat") or die "Check the file";
while (<FH>)
{
    $line = $_;chomp $_;
    @temp = split (/\s/,$line);
    $respos1 = $temp[1];
    $respos2 = $temp[2];
    $cadis = $temp[3];
    push(@respos1,$respos1);push(@respos2,$respos2);push(@cadist,$cadis);
}

for($i=0;$i<@respos1;$i++)
{
    @lstdist=();
    for($j=0;$j<@respos2;$j++)
    {
        if ($respos1[$i] == $respos2[$j])
        {
            push (@lstdist,$cadist[$j]);
        }
    }
    @result=&mean(@lstdist);
    print "$respos1[$i]\t$result[0]\t$result[1]\n";
}

sub mean
{
    (@lstdist)=@_;
    $length=scalar(@lstdist);
    $sum=0;$mean=0;$summ=0;
    foreach $ele(@lstdist)
    {
        $sum +=$ele;
    }
    $mean=$sum/$length;
    foreach $val(@lstdist)
    {
        $abar=0;
        $abar = (($val-$mean)**2);
        $summ += $abar;
    }
    $deviation = sqrt($summ/($length-1));
    return($mean,$deviation);
}

How can I print the values only once for each residue number?
Thanks
Kumar
You can store the results in a hash of arrays and print them outside the loop to avoid duplicate results:
my %result=(); # result hash

for($i=0;$i<@respos1;$i++)
{
    @lstdist=();
    for($j=0;$j<@respos2;$j++)
    {
        if ($respos1[$i] == $respos2[$j])
        {
            push(@lstdist,$cadist[$j]);
        }
    }
    @result=&mean(@lstdist);
    $result{$respos1[$i]} = [$result[0],$result[1]]; # create hash of arrays
}

foreach(sort keys %result) {
    print "$_\t$result{$_}[0]\t$result{$_}[1]\n"; #display result
}

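Another way to avoid the repeated output, sketched here and not run against the real caca.dat, is to group the distances by residue while reading the file. Each residue then appears exactly once as a hash key, and the nested loops are not needed. The names group_by_residue and mean_sd and the sample rows are made up for illustration. The sketch assumes the two residue columns are equal on every row, as in the sample line (267 267), so it groups on the second column. Sorting with <=> keeps the residues in numeric order, and a residue with a single value gets an undefined SD instead of a division by zero.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns { residue => [distances...] } from rows of
# "id r1 r2 distance". Assumes r1 == r2 on every row and groups on r1.
sub group_by_residue {
    my ($fh) = @_;
    my %dist_for;
    while (my $line = <$fh>) {
        my ($r1, $dist) = (split ' ', $line)[1, 3];
        next unless defined $dist;      # skip blank or short lines
        push @{ $dist_for{$r1} }, $dist;
    }
    return \%dist_for;
}

# Hypothetical helper: mean and sample SD (n - 1) of a list of numbers.
# The SD is undef when there are fewer than two values.
sub mean_sd {
    my @values = @_;
    my $n = @values;
    return (undef, undef) unless $n;
    my $sum = 0;
    $sum += $_ for @values;
    my $mean = $sum / $n;
    return ($mean, undef) if $n < 2;
    my $sq = 0;
    $sq += ($_ - $mean) ** 2 for @values;
    return ($mean, sqrt($sq / ($n - 1)));
}

# Made-up sample rows, read through an in-memory filehandle.
my $sample = "A.PDB 250 250 2.0\nB.PDB 250 250 4.0\nC.PDB 251 251 5.0\n";
open my $fh, '<', \$sample or die "Cannot open sample data: $!";
my $dist_for = group_by_residue($fh);
close $fh;

for my $res (sort { $a <=> $b } keys %$dist_for) {
    my ($mean, $sd) = mean_sd(@{ $dist_for->{$res} });
    print join("\t", $res, $mean, defined $sd ? $sd : 'NA'), "\n";
}
```

With the real data, open caca.dat and pass that filehandle to group_by_residue. If the two residue columns can differ, the grouping key would need to change to match whichever column the distances should be collected under.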
Thanks, Nithinpes and all, for the suggestions. The program now works perfectly.
Thanks
Kumar