Bytes IT Community

Counting in while loop splitting

P: 55
Hi All,
I have output data from the CHARMM program that I am trying to parse. The output contains three groups, "HEAD", "TAIL" and "WAT", and each time a group occurs I have to count its entries and print the values.
I have attached the output data of the program and my script file.
```text
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            1.0000
 MEMB POPC 41   C3             3.0000
 MEMB POPC 41   O31            2.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
 TIP3 TIP3 524  OH2            3.0000
 TIP3 TIP3 3687 OH2            2.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            3.0000
 TIP3 TIP3 5218 OH2            3.0000
 TIP3 TIP3 7177 OH2            1.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            2.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             7.0000
 MEMB POPC 41   O31            5.0000
 MEMB POPC 41   C31            3.0000
 MEMB POPC 41   O32            3.0000
 MEMB POPC 41   C32            1.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524  OH2            1.0000
 TIP3 TIP3 2474 OH2            1.0000
 TIP3 TIP3 3687 OH2            1.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            4.0000
 TIP3 TIP3 5196 OH2            1.0000
 TIP3 TIP3 5218 OH2            2.0000
 TIP3 TIP3 7177 OH2            2.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             2.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             5.0000
 MEMB POPC 41   O31            1.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            2.0000
```
Here is my Perl code:
```perl
#!/usr/bin/perl
use strict;
use warnings;

my $i = 0;
my ($cnt1, $cnt2, $cnt3) = 0;
my $temp = "temp.dat";

open (A,"<$temp");
while(my $line = <A>)
{
    if($line=~/ MEMB\s+POPC\s+\S+\s+\S+\s+(\S+)/)
    {
        if($1 > 0)
        {
            $cnt1++;
        }
    }
    elsif($line=~/ MEMB\s+POPC\s+\S+\s+\S+\s+(\S+)/)
    {
        if($1 > 0)
        {
            $cnt2++;
        }
    }
    elsif($line=~/ TIP3\s+TIP3\s+\S+\s+\S+\s+(\S+)/)
    {
        if($1 > 0)
        {
            $cnt3++;
        }
    }
    elsif($line=~/coor contact cut 4.5 sele HEAD/)
    {
        printf "%4d  %5d  %5d\n",$i,$cnt1,$cnt2,$cnt3;
        $cnt1=0;$cnt2=0;$cnt3=0;
        $i++;
    }
    else
    {
        next;
    }
}
printf "%4d  %5d  %5d\n",$i,$cnt1,$cnt2;$cnt3;
```
As you can see from the data and the code, I am trying to parse the contents of each group (HEAD, TAIL, WAT) and write out the counts. The problem is that I cannot get counts for the HEAD and TAIL portions of the data, because the regular expressions I am using are not correct. Any help on this will be appreciated.

Thanks
Feb 12 '10 #1
10 Replies


Expert Mod 100+
P: 589
The regexes aren't the problem; it's the logic.

If your data is as consistent as the example, you could go this route; if not, we'd need to make a minor change.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

$/ = "\n CHARMM>";

my %count;
while ( <DATA> ) {
    chomp;
    my ($head, @data) = split /\n/;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;
    $count{$key}++ for @data;
}

print Dumper \%count;

__DATA__
CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            1.0000
 MEMB POPC 41   C3             3.0000
 MEMB POPC 41   O31            2.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
 TIP3 TIP3 524  OH2            3.0000
 TIP3 TIP3 3687 OH2            2.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            3.0000
 TIP3 TIP3 5218 OH2            3.0000
 TIP3 TIP3 7177 OH2            1.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            2.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             7.0000
 MEMB POPC 41   O31            5.0000
 MEMB POPC 41   C31            3.0000
 MEMB POPC 41   O32            3.0000
 MEMB POPC 41   C32            1.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524  OH2            1.0000
 TIP3 TIP3 2474 OH2            1.0000
 TIP3 TIP3 3687 OH2            1.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            4.0000
 TIP3 TIP3 5196 OH2            1.0000
 TIP3 TIP3 5218 OH2            2.0000
 TIP3 TIP3 7177 OH2            2.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             2.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             5.0000
 MEMB POPC 41   O31            1.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            2.0000
```
Outputs:
```text
$VAR1 = {
          'TAIL' => 22,
          'WAT' => 15,
          'HEAD' => 2
        };
```
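The key step in this answer is `$/ = "\n CHARMM>";`. `$/` is Perl's input record separator, so each read returns one whole section (its header plus its data lines) instead of a single line, and `chomp` strips that separator rather than a newline. A minimal sketch, assuming a small in-memory sample (the `$sample` string and `@sections` array are hypothetical), that shows how many data lines each record carries. It localizes `$/` inside a block, a small change from the answer above, so that any later line-by-line reads keep the default separator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample trimmed from the CHARMM output posted above.
my $sample = <<'END';
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
END

# An in-memory filehandle behaves like a file opened for reading.
open my $fh, '<', \$sample or die "cannot open sample: $!";

my @sections;    # one [section name, number of data lines] pair per record
{
    # Scope the separator change so it does not leak into later reads.
    local $/ = "\n CHARMM>";
    while (my $record = <$fh>) {
        chomp $record;    # removes a trailing "\n CHARMM>" (if present), not "\n"
        # split drops trailing empty fields, so a final "\n" adds no line
        my ($header, @data) = split /\n/, $record;
        my ($key) = $header =~ /sele (HEAD|TAIL|WAT) end/;
        push @sections, [ $key, scalar @data ];
    }
}
close $fh;

printf "%-4s %d data line(s)\n", @$_ for @sections;
```

On this sample it reports 1, 2 and 1 data lines for HEAD, TAIL and WAT. The separator includes the leading space seen on the `CHARMM>` lines of this output; if a real file's headers are indented differently, the separator would have to change to match.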
Feb 12 '10 #2

P: 55
Thanks, Ron, for the reply.
I agree with you and tried your suggestion in my script. My data file is regular, and I am trying to print the HEAD, TAIL and WAT counts each time the program encounters a set of three. The final output should have 3 columns: the 1st for HEAD, the 2nd for TAIL and the 3rd for WAT. Parsing should be done in groups of three: the first time the program matches HEAD, TAIL and WAT, that becomes row 1; the next match becomes row 2, and so on. If a group has no entries, it should print zero there.

I hope I have stated this clearly.
Thanks once again.
Feb 12 '10 #3

Expert Mod 100+
P: 589
The code I posted will do that. The only things I left out were the 2 printf statements (the 1st one being inside a conditional block) and setting the initial values to 0.

Which part are you having trouble accomplishing?

The more I look at this, the more it looks like your homework assignment and not a real world problem that you need to solve.
Feb 12 '10 #4

P: 55
Hi RonB,
Thanks for the clarification. This is not a homework assignment; my output file from the CHARMM program is 4.8 GB, and I am extracting data from it for my research project.
I am having trouble printing the values for each row of data. When I tried printing inside the while loop, it printed cumulative sums of the counts.
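Printing inside the loop gives running totals unless the counters are reset each time a new group begins. As an alternative to changing `$/`, here is a minimal sketch that reads line by line (so memory use stays flat regardless of file size), remembers which section the current line belongs to, and saves one row whenever a new HEAD header starts. It assumes every group begins with a HEAD header and that the only `CHARMM>` headers are the three `sele ... end` selections shown above; like the original script, it counts only data lines whose last value is greater than zero. `$sample`, `@rows`, `%count` and `$section` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-group sample; a real run would open the CHARMM output
# file with a lexical filehandle instead of reading this string.
my $sample = <<'END';
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
END

open my $fh, '<', \$sample or die "cannot open sample: $!";

my @rows;       # one [HEAD, TAIL, WAT] triple per completed group
my %count;      # counts for the group currently being read
my $section;    # section that the current data line belongs to

while (my $line = <$fh>) {
    if ($line =~ /CHARMM>.*\bsele (HEAD|TAIL|WAT) end/) {
        $section = $1;
        if ($section eq 'HEAD') {
            # A new group starts: save the previous one (if any), then reset.
            push @rows, [ @count{qw(HEAD TAIL WAT)} ] if %count;
            %count = (HEAD => 0, TAIL => 0, WAT => 0);
        }
    }
    elsif (defined $section
        && $line =~ /^\s*(?:MEMB\s+POPC|TIP3\s+TIP3)\s+\S+\s+\S+\s+(\S+)/
        && $1 > 0)
    {
        $count{$section}++;
    }
}
push @rows, [ @count{qw(HEAD TAIL WAT)} ] if %count;    # the final group
close $fh;

printf "%4s  %5s  %5s\n", qw(HEAD TAIL WAT);
printf "%4d  %5d  %5d\n", @$_ for @rows;
```

Because each data line is attributed to the section named by the most recent header, there is no need for separate regexes per group; the HEAD and TAIL lines share the same `MEMB POPC` pattern without conflict.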

Thanks
Feb 12 '10 #5

Expert Mod 100+
P: 589
This assumes that 'WAT' is not the last group in the file, which is what your sample and code suggested.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

$/ = "\n CHARMM>";

printf "%4s  %5s  %5s\n", 'HEAD', 'TAIL', 'WAT';

my %count;
while ( <DATA> ) {
    chomp;
    my ($head, @data) = split /\n/;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;

    if ($key eq 'HEAD') {  # assign default value of 0 for each key

        # this can be done in the other if block,
        # but I think it makes more sense here
        $count{$_} = 0 for keys %count;
    }
    $count{$key}++ for @data;

    if ($key eq 'WAT') {
        printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
    }
}
printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};

__DATA__
CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            1.0000
 MEMB POPC 41   C3             3.0000
 MEMB POPC 41   O31            2.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
 TIP3 TIP3 524  OH2            3.0000
 TIP3 TIP3 3687 OH2            2.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            3.0000
 TIP3 TIP3 5218 OH2            3.0000
 TIP3 TIP3 7177 OH2            1.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            2.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             7.0000
 MEMB POPC 41   O31            5.0000
 MEMB POPC 41   C31            3.0000
 MEMB POPC 41   O32            3.0000
 MEMB POPC 41   C32            1.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524  OH2            1.0000
 TIP3 TIP3 2474 OH2            1.0000
 TIP3 TIP3 3687 OH2            1.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            4.0000
 TIP3 TIP3 5196 OH2            1.0000
 TIP3 TIP3 5218 OH2            2.0000
 TIP3 TIP3 7177 OH2            2.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             2.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             5.0000
 MEMB POPC 41   O31            1.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            2.0000
```
Outputs:
```text
C:\TEMP>kumarboston.pl
HEAD   TAIL    WAT
   1      7      7
   0      8      8
   1      7      0
```
Feb 12 '10 #6

P: 55
Hi RonB,
I tried running the script with your suggestions, but it prints all zero values.
I have attached the data file and the script file. The data file will always be in groups of three: HEAD, TAIL and WAT.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

open (DATA, "file.txt");

printf "%4s  %5s  %5s\n", 'HEAD', 'TAIL', 'WAT';

my %count;
while ( <DATA> ) {
    chomp;
    my ($head, @data) = split /\n/;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;

    if ($key eq 'HEAD') {  # assign default value of 0 for each key
        # this can be done in the other if block, but I think it makes more sense here
        $count{$_} = 0 for keys %count;
    }
    $count{$key}++ for @data;

    if ($key eq 'WAT') {
        printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
    }
}
printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
```
Thanks
Attached Files
File Type: txt file.txt (9.5 KB, 425 views)
Feb 12 '10 #7

Expert Mod 100+
P: 589
Compare the code you posted to what I posted and you'll see that you're missing a very important line.

DATA is one of Perl's special built-in filehandles and it's best not to use it when opening your file.

Use a lexical var for the filehandle and use the 3 arg form of open.

When opening a filehandle, you should always check the return code to verify that it was successful and take proper action if it failed.

```perl
my $file = "file.txt";
open my $data_FH, '<', $file or die "failed to open <$file> $!";
```
The file you attached has a 'WAT' section as its last block. Will that always be true in your actual complete data file? As it is right now, the last 2 rows of output will be duplicates.
Feb 12 '10 #8

P: 55
Yes, the last section will always be a WAT section, whether or not it contains any data.
Thanks
Feb 12 '10 #9

Expert Mod 100+
P: 589
In that case, remove that last printf statement, it's redundant.
Feb 12 '10 #10

Expert Mod 100+
P: 589
After thinking about it, the assignment of the default hash values should be moved into the 'WAT' if block, or we need to define the 3 hash keys prior to the while loop.
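Putting posts #8, #10 and #11 together (the missing `$/` line, a lexical filehandle opened with the three-argument `open` and an error check, no trailing `printf`, and all three hash keys defined before the loop with the reset moved into the WAT block), the script might look like the sketch below. It is a sketch rather than a tested drop-in: the `count_groups` helper, the `count_contacts.pl` name and the command-line usage are hypothetical, the header regex is tightened to `sele ... end`, and it assumes every group ends with a WAT section, as the original poster confirmed in reply #9.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the data lines of each HEAD/TAIL/WAT section and return one
# [HEAD, TAIL, WAT] row per group. A row is saved when a WAT section is
# read, so this assumes every group, including the last one, ends with
# a WAT section (hypothetical helper name).
sub count_groups {
    my ($fh) = @_;
    local $/ = "\n CHARMM>";    # read one whole section per record

    # Define all three keys before the loop so an empty section counts
    # as 0 instead of triggering "uninitialized value" warnings.
    my %count = (HEAD => 0, TAIL => 0, WAT => 0);
    my @rows;

    while (my $record = <$fh>) {
        chomp $record;    # drop the trailing "\n CHARMM>"
        my ($header, @data) = split /\n/, $record;
        next unless defined $header;    # skip an empty leading record
        my ($key) = $header =~ /sele (HEAD|TAIL|WAT) end/ or next;
        $count{$key} += @data;          # array in numeric context = line count

        if ($key eq 'WAT') {
            push @rows, [ @count{qw(HEAD TAIL WAT)} ];
            $count{$_} = 0 for keys %count;    # reset for the next group
        }
    }
    return @rows;
}

# Usage (hypothetical script name): perl count_contacts.pl file.txt
# Each file named on the command line is processed in turn.
for my $file (@ARGV) {
    open my $data_fh, '<', $file or die "failed to open <$file> $!";
    printf "%4s  %5s  %5s\n", 'HEAD', 'TAIL', 'WAT';
    printf "%4d  %5d  %5d\n", @$_ for count_groups($data_fh);
    close $data_fh;
}
```

Because `count_groups` takes any readable filehandle, it can be tried on a few pasted sections through an in-memory handle (`open my $fh, '<', \$text`) before running it on the full 4.8 GB file. Localizing `$/` inside the helper keeps the separator change from affecting other reads in the script.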
Feb 12 '10 #11
