Counting in while loop splitting

Hi All,
I have output data from the CHARMM program that I am trying to parse. There are three selections in the output, "HEAD", "TAIL" and "WAT", and for each one I have to count the number of occurrences each time it appears and print the values.
I have attached the output data of the program and my script file.

 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            1.0000
 MEMB POPC 41   C3             3.0000
 MEMB POPC 41   O31            2.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
 TIP3 TIP3 524  OH2            3.0000
 TIP3 TIP3 3687 OH2            2.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            3.0000
 TIP3 TIP3 5218 OH2            3.0000
 TIP3 TIP3 7177 OH2            1.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            2.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             7.0000
 MEMB POPC 41   O31            5.0000
 MEMB POPC 41   C31            3.0000
 MEMB POPC 41   O32            3.0000
 MEMB POPC 41   C32            1.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524  OH2            1.0000
 TIP3 TIP3 2474 OH2            1.0000
 TIP3 TIP3 3687 OH2            1.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            4.0000
 TIP3 TIP3 5196 OH2            1.0000
 TIP3 TIP3 5218 OH2            2.0000
 TIP3 TIP3 7177 OH2            2.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             2.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             5.0000
 MEMB POPC 41   O31            1.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            2.0000

Here is my Perl code:

#!/usr/bin/perl
use strict;
use warnings;

my $i = 0;
my ($cnt1, $cnt2, $cnt3) = 0;
my $temp = "temp.dat";

open (A,"<$temp");
while(my $line = <A>)
{
    if($line=~/ MEMB\s+POPC\s+\S+\s+\S+\s+(\S+)/)
    {
        if($1 > 0)
        {
            $cnt1++;
        }
    }
    elsif($line=~/ MEMB\s+POPC\s+\S+\s+\S+\s+(\S+)/)
    {
        if($1 > 0)
        {
            $cnt2++;
        }
    }
    elsif($line=~/ TIP3\s+TIP3\s+\S+\s+\S+\s+(\S+)/)
    {
        if($1 > 0)
        {
            $cnt3++;
        }
    }
    elsif($line=~/coor contact cut 4.5 sele HEAD/)
    {
        printf "%4d  %5d  %5d\n",$i,$cnt1,$cnt2,$cnt3;
        $cnt1=0;$cnt2=0;$cnt3=0;
        $i++;
    }
    else
    {
        next;
    }
}
printf "%4d  %5d  %5d\n",$i,$cnt1,$cnt2;$cnt3;

As you can see from the data and code, I am trying to parse the content of each group (HEAD, TAIL, WAT) and write out the counts. The problem I am facing is that I am not able to count the HEAD and TAIL portions of the data separately, as the regular expression I am using is not correct. Any help on this will be appreciated.

Thanks
Feb 12 '10 #1
RonB
589 Expert Mod 512MB
The regexes aren't the problem; it's the logic.

If your data is as consistent as the example, you could go this route. If not, we'd need to make a minor change.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

$/ = "\n CHARMM>";

my %count;
while ( <DATA> ) {
    chomp;
    my ($head, @data) = split /\n/;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;
    $count{$key}++ for @data;
}

print Dumper \%count;


__DATA__
CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            1.0000
 MEMB POPC 41   C3             3.0000
 MEMB POPC 41   O31            2.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
 TIP3 TIP3 524  OH2            3.0000
 TIP3 TIP3 3687 OH2            2.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            3.0000
 TIP3 TIP3 5218 OH2            3.0000
 TIP3 TIP3 7177 OH2            1.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            2.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             7.0000
 MEMB POPC 41   O31            5.0000
 MEMB POPC 41   C31            3.0000
 MEMB POPC 41   O32            3.0000
 MEMB POPC 41   C32            1.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524  OH2            1.0000
 TIP3 TIP3 2474 OH2            1.0000
 TIP3 TIP3 3687 OH2            1.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            4.0000
 TIP3 TIP3 5196 OH2            1.0000
 TIP3 TIP3 5218 OH2            2.0000
 TIP3 TIP3 7177 OH2            2.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             2.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             5.0000
 MEMB POPC 41   O31            1.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            2.0000

Outputs:

$VAR1 = {
          'TAIL' => 22,
          'WAT' => 15,
          'HEAD' => 2
        };
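
On "it's the logic": in the original script the first `if` and the first `elsif` test the same `MEMB POPC` pattern, so the second branch can never run and every membrane line is counted in `$cnt1`. A data line belongs to HEAD or TAIL only because of the `CHARMM>` header that comes before it. As an alternative to changing `$/`, a minimal line-by-line sketch that remembers the current section might look like the following. The helper name `count_by_section` and the trimmed sample are illustrative assumptions, and, like the script above, it counts every data line rather than testing the last column for `> 0`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original posts): tally data lines per
# section by remembering which "sele ..." header was seen most recently.
sub count_by_section {
    my ($fh) = @_;
    my %count = (HEAD => 0, TAIL => 0, WAT => 0);
    my $section;    # name of the block we are currently inside

    while (my $line = <$fh>) {
        if ($line =~ /CHARMM>.*?\bsele\s+(HEAD|TAIL|WAT)\b/) {
            $section = $1;          # header line: switch section
        }
        elsif (defined $section && $line =~ /\S/) {
            $count{$section}++;     # non-blank line belongs to that section
        }
    }
    return \%count;
}

# Trimmed sample, assumed to follow the same layout as the posted data.
my $sample = <<'END';
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
END

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $count = count_by_section($fh);
close $fh;

printf "%s=%d\n", $_, $count->{$_} for qw(HEAD TAIL WAT);
```

Both approaches read the file as a stream, so neither needs to hold a multi-gigabyte file in memory.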
Feb 12 '10 #2
Thanks Ron for the reply,
I agree with you and tried your suggestion in my script. My data file is regular, and I am trying to print the counts for HEAD, TAIL and WAT for each set of three blocks. The final output should have three columns: the first for HEAD, the second for TAIL and the third for WAT. The parsing should be done in groups of three, so the first time the program matches HEAD, TAIL and WAT gives row 1, the next match gives row 2, and so on. If a block has no entries, it should print zero there.

I hope what I am stating above is clear.
Thanks once again.
Feb 12 '10 #3
RonB
The code I posted will do that. The only thing I left out was the 2 printf statements (the 1st one being inside a conditional block) and setting the initial values to 0.

Which part are you having trouble accomplishing?

The more I look at this, the more it looks like your homework assignment and not a real world problem that you need to solve.
Feb 12 '10 #4
Hi RonB,
Thanks for the clarification. This is not a homework assignment; my output file from the CHARMM program is 4.8 GB, and I am extracting the data from it for my research project.
I am having trouble printing the values for each row of data. I tried to print inside the while loop, and it printed the cumulative sum of the counts.

Thanks
Feb 12 '10 #5
RonB
This assumes that 'WAT' is not the last group in the file, which is what your sample and code suggested.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

$/ = "\n CHARMM>";

printf "%4s  %5s  %5s\n", 'HEAD', 'TAIL', 'WAT';

my %count;
while ( <DATA> ) {
    chomp;
    my ($head, @data) = split /\n/;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;

    if ($key eq 'HEAD') {  # assign default value of 0 for each key

        # this can be done in the other if block,
        # but I think it makes more sense here
        $count{$_} = 0 for keys %count;
    }
    $count{$key}++ for @data;

    if ($key eq 'WAT') {
        printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
    }
}
printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};


__DATA__
CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            1.0000
 MEMB POPC 41   C3             3.0000
 MEMB POPC 41   O31            2.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
 TIP3 TIP3 524  OH2            3.0000
 TIP3 TIP3 3687 OH2            2.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            3.0000
 TIP3 TIP3 5218 OH2            3.0000
 TIP3 TIP3 7177 OH2            1.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            2.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             7.0000
 MEMB POPC 41   O31            5.0000
 MEMB POPC 41   C31            3.0000
 MEMB POPC 41   O32            3.0000
 MEMB POPC 41   C32            1.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524  OH2            1.0000
 TIP3 TIP3 2474 OH2            1.0000
 TIP3 TIP3 3687 OH2            1.0000
 TIP3 TIP3 3798 OH2            7.0000
 TIP3 TIP3 4038 OH2            4.0000
 TIP3 TIP3 5196 OH2            1.0000
 TIP3 TIP3 5218 OH2            2.0000
 TIP3 TIP3 7177 OH2            2.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             2.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 MEMB POPC 30   O22            3.0000
 MEMB POPC 30   C22            2.0000
 MEMB POPC 41   C3             5.0000
 MEMB POPC 41   O31            1.0000
 MEMB POPC 41   C31            2.0000
 MEMB POPC 41   O32            2.0000

Outputs:

C:\TEMP>kumarboston.pl
HEAD   TAIL    WAT
   1      7      7
   0      8      8
   1      7      0
Feb 12 '10 #6
Hi RonB,
I tried to run the script using your suggestions, but somehow it is printing all zero values.
I have attached the data file and the script file. The data file will always be in groups of three: HEAD, TAIL and WAT.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

open (DATA, "file.txt");

printf "%4s  %5s  %5s\n", 'HEAD', 'TAIL', 'WAT';

my %count;
while ( <DATA> ) {
    chomp;
    my ($head, @data) = split /\n/;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;

    if ($key eq 'HEAD') {  # assign default value of 0 for each key this can be done in the other if block, but I think it makes more sense here
        $count{$_} = 0 for keys %count;
    }
    $count{$key}++ for @data;

    if ($key eq 'WAT') {
        printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
    }
}
printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};

Thanks
Attached file: file.txt (9.5 KB)
Feb 12 '10 #7
RonB
Compare the code you posted to what I posted and you'll see that you're missing a very important line.

DATA is one of Perl's special built-in filehandles and it's best not to use it when opening your file.

Use a lexical var for the filehandle and use the 3 arg form of open.

When opening a filehandle, you should always check the return code to verify that it was successful and take proper action if it failed.

my $file = "file.txt";
open my $data_FH, '<', $file or die "failed to open <$file> $!";

The file you attached has a 'WAT' section as its last block. Will that always be true in your actual complete data file? As it is right now, the last 2 rows of output will be duplicates.
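
On the missing line: comparing the two scripts, the one in post #7 drops the `$/ = "\n CHARMM>";` assignment. With Perl's default separator each read returns a single line, so `@data` is always empty and nothing is ever counted, which would explain the all-zero output. A small sketch of the difference, using an in-memory copy of a trimmed sample (the helper name `read_records` is an illustrative assumption):

```perl
use strict;
use warnings;

# Trimmed sample, assumed to follow the same layout as the posted data.
my $sample = <<'END';
CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
END

# Hypothetical helper: read every record from the in-memory sample using
# the given input record separator.
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;          # limit the change to this sub
    open my $fh, '<', \$sample or die "failed to open in-memory sample: $!";
    my @records = <$fh>;
    close $fh;
    chomp @records;                 # chomp removes the current value of $/
    return @records;
}

my @by_line  = read_records("\n");            # Perl's default separator
my @by_block = read_records("\n CHARMM>");    # the line missing in post #7

printf "default separator: %d records\n", scalar @by_line;
printf "block separator:   %d records\n", scalar @by_block;

# With the block separator, a record splits into a header plus data lines.
my ($head, @data) = split /\n/, $by_block[1];
printf "second block has %d data lines\n", scalar @data;
```

Using `local $/` inside a sub or block, rather than a bare global assignment, keeps the change from affecting any other file reads in the same program.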
Feb 12 '10 #8
Yes, the last section will always be a WAT section, whether there is data in it or not.
Thanks
Feb 12 '10 #9
RonB
In that case, remove that last printf statement, it's redundant.
Feb 12 '10 #10
RonB
After thinking about it, the assignment of the default hash values should be moved into the 'WAT' if block, or we need to define the 3 hash keys prior to the while loop.
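
Pulling the thread's corrections together, one possible version is sketched below. It defines the three hash keys before the loop, emits a row and resets the counts in the 'WAT' branch, and drops the trailing printf. It assumes every group ends with a WAT block, as confirmed in post #9. The helper name `count_groups` and the trimmed in-memory sample are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [HEAD, TAIL, WAT] row per group of three
# blocks, assuming every group ends with a WAT block.
sub count_groups {
    my ($fh) = @_;
    local $/ = "\n CHARMM>";                       # one record per block
    my %count = (HEAD => 0, TAIL => 0, WAT => 0);  # keys defined up front
    my @rows;

    while (my $record = <$fh>) {
        chomp $record;
        my ($head, @data) = split /\n/, $record;
        next unless defined $head;
        my ($key) = $head =~ /sele\s+(HEAD|TAIL|WAT)\b/;
        next unless defined $key;                  # skip unexpected records

        $count{$key} += grep { /\S/ } @data;       # count non-blank lines

        if ($key eq 'WAT') {                       # WAT closes the group
            push @rows, [ @count{qw(HEAD TAIL WAT)} ];
            $count{$_} = 0 for keys %count;        # reset for the next group
        }
    }
    return @rows;
}

# Trimmed sample with an empty HEAD block and an empty final WAT block.
my $sample = <<'END';
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41   O3             1.0000
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            2.0000
 MEMB POPC 30   O22            3.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257  OH2            4.0000
 CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30   C21            1.0000
 CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
END

open my $fh, '<', \$sample or die "failed to open in-memory sample: $!";
my @rows = count_groups($fh);
close $fh;

printf "%4s  %5s  %5s\n", 'HEAD', 'TAIL', 'WAT';
printf "%4d  %5d  %5d\n", @$_ for @rows;
```

For the real data, the in-memory open would be replaced by the three-argument open on the actual file shown in post #8. For a 4.8 GB input it may be preferable to print each row inside the 'WAT' branch instead of collecting rows in an array.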
Feb 12 '10 #11