Hi All,
I have output from the CHARMM program that I am trying to parse. The output contains three selections, HEAD, TAIL and WAT, and for each one I have to count the number of occurrences each time it appears and print the values.
I have attached the program's output data and my script file.
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41 O3 1.0000
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 2.0000
 MEMB POPC 30 O22 3.0000
 MEMB POPC 30 C22 1.0000
 MEMB POPC 41 C3 3.0000
 MEMB POPC 41 O31 2.0000
 MEMB POPC 41 C31 2.0000
 MEMB POPC 41 O32 3.0000
 CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257 OH2 4.0000
 TIP3 TIP3 524 OH2 3.0000
 TIP3 TIP3 3687 OH2 2.0000
 TIP3 TIP3 3798 OH2 7.0000
 TIP3 TIP3 4038 OH2 3.0000
 TIP3 TIP3 5218 OH2 3.0000
 TIP3 TIP3 7177 OH2 1.0000
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 1.0000
 MEMB POPC 30 O22 2.0000
 MEMB POPC 30 C22 2.0000
 MEMB POPC 41 C3 7.0000
 MEMB POPC 41 O31 5.0000
 MEMB POPC 41 C31 3.0000
 MEMB POPC 41 O32 3.0000
 MEMB POPC 41 C32 1.0000
 CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524 OH2 1.0000
 TIP3 TIP3 2474 OH2 1.0000
 TIP3 TIP3 3687 OH2 1.0000
 TIP3 TIP3 3798 OH2 7.0000
 TIP3 TIP3 4038 OH2 4.0000
 TIP3 TIP3 5196 OH2 1.0000
 TIP3 TIP3 5218 OH2 2.0000
 TIP3 TIP3 7177 OH2 2.0000
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41 O3 2.0000
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 1.0000
 MEMB POPC 30 O22 3.0000
 MEMB POPC 30 C22 2.0000
 MEMB POPC 41 C3 5.0000
 MEMB POPC 41 O31 1.0000
 MEMB POPC 41 C31 2.0000
 MEMB POPC 41 O32 2.0000
Here is my Perl code:
#!/usr/bin/perl
use strict;
use warnings;

my $i = 0;
my ($cnt1, $cnt2, $cnt3) = 0;
my $temp = "temp.dat";

open (A,"<$temp");
while(my $line = <A>)
{
    if($line=~/ MEMB\s+POPC\s+\S+\s+\S+\s+(\S+)/)
    {
        if($1 > 0)
        {
            $cnt1++;
        }
    }
    elsif($line=~/ MEMB\s+POPC\s+\S+\s+\S+\s+(\S+)/)
    {
        if($1 > 0)
        {
            $cnt2++;
        }
    }
    elsif($line=~/ TIP3\s+TIP3\s+\S+\s+\S+\s+(\S+)/)
    {
        if($1 > 0)
        {
            $cnt3++;
        }
    }
    elsif($line=~/coor contact cut 4.5 sele HEAD/)
    {
        printf "%4d %5d %5d\n",$i,$cnt1,$cnt2,$cnt3;
        $cnt1=0;$cnt2=0;$cnt3=0;
        $i++;
    }
    else
    {
        next;
    }
}
printf "%4d %5d %5d\n",$i,$cnt1,$cnt2;$cnt3;
As you can see from the data and code, I am trying to count the entries in each group (HEAD, TAIL, WAT) and write them out. The problem is that I cannot get counts for the HEAD and TAIL portions of the data; I think the regular expression I am using is not correct. Any help on this will be appreciated.
Thanks
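Two things in the script above work against it. The first two branches test the same MEMB/POPC pattern, so the elsif that increments $cnt2 can never be reached, and nothing records which selection (HEAD, TAIL or WAT) the current lines belong to. Separately, my ($cnt1, $cnt2, $cnt3) = 0; initializes only $cnt1. A minimal sketch of one way to handle this while still reading line by line (which suits a very large file) is to remember the selection named by the most recent CHARMM> command. The helper name count_contacts, the data-line pattern and the abridged sample are assumptions for illustration, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads CHARMM output line by line
# and returns one [HEAD, TAIL, WAT] row of counts per group of three commands.
sub count_contacts {
    my ($fh) = @_;
    my @rows;
    my %count;
    my $section;    # selection named by the most recent CHARMM> command

    while ( my $line = <$fh> ) {
        if ( $line =~ /CHARMM>\s+coor contact.*?sele\s+(HEAD|TAIL|WAT)\b/ ) {
            $section = $1;
            # Assumption: a HEAD command always opens a new group of three.
            if ( $section eq 'HEAD' ) {
                push @rows, [ @count{qw(HEAD TAIL WAT)} ] if %count;
                %count = ( HEAD => 0, TAIL => 0, WAT => 0 );
            }
        }
        elsif ( defined $section
            && $line =~ /^\s*\S+\s+\S+\s+\d+\s+\S+\s+(\d+(?:\.\d+)?)\s*$/ )
        {
            # Data line: segment, residue name, residue number, atom, value.
            $count{$section}++ if $1 > 0;
        }
    }
    push @rows, [ @count{qw(HEAD TAIL WAT)} ] if %count;    # last group
    return @rows;
}

# Abridged sample in the layout shown above (leading space on every line).
my $sample = <<'END';
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41 O3 1.0000
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 2.0000
 MEMB POPC 30 O22 3.0000
 CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 1.0000
 CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524 OH2 1.0000
END

# An in-memory filehandle stands in for the real output file here.
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
printf "%4d %5d %5d\n", @$_ for count_contacts($fh);
```

Because only one line is held in memory at a time, the same loop should work unchanged on a real filehandle opened on the full output file.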
RonB 589
Expert Mod 512MB
The regexes aren't the problem; it's the logic.
If your data is as consistent as the example, you could go this route. If not, we'd need to make a minor change.
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

# Read one block per CHARMM> prompt instead of one line at a time
$/ = "\n CHARMM>";

my %count;
while ( <DATA> ) {
    chomp;
    my ($head, @data) = split /\n/;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;
    $count{$key}++ for @data;
}

print Dumper \%count;

__DATA__
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41 O3 1.0000
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 2.0000
 MEMB POPC 30 O22 3.0000
 MEMB POPC 30 C22 1.0000
 MEMB POPC 41 C3 3.0000
 MEMB POPC 41 O31 2.0000
 MEMB POPC 41 C31 2.0000
 MEMB POPC 41 O32 3.0000
 CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257 OH2 4.0000
 TIP3 TIP3 524 OH2 3.0000
 TIP3 TIP3 3687 OH2 2.0000
 TIP3 TIP3 3798 OH2 7.0000
 TIP3 TIP3 4038 OH2 3.0000
 TIP3 TIP3 5218 OH2 3.0000
 TIP3 TIP3 7177 OH2 1.0000
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 1.0000
 MEMB POPC 30 O22 2.0000
 MEMB POPC 30 C22 2.0000
 MEMB POPC 41 C3 7.0000
 MEMB POPC 41 O31 5.0000
 MEMB POPC 41 C31 3.0000
 MEMB POPC 41 O32 3.0000
 MEMB POPC 41 C32 1.0000
 CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524 OH2 1.0000
 TIP3 TIP3 2474 OH2 1.0000
 TIP3 TIP3 3687 OH2 1.0000
 TIP3 TIP3 3798 OH2 7.0000
 TIP3 TIP3 4038 OH2 4.0000
 TIP3 TIP3 5196 OH2 1.0000
 TIP3 TIP3 5218 OH2 2.0000
 TIP3 TIP3 7177 OH2 2.0000
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41 O3 2.0000
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 1.0000
 MEMB POPC 30 O22 3.0000
 MEMB POPC 30 C22 2.0000
 MEMB POPC 41 C3 5.0000
 MEMB POPC 41 O31 1.0000
 MEMB POPC 41 C31 2.0000
 MEMB POPC 41 O32 2.0000
Outputs:
$VAR1 = {
          'TAIL' => 22,
          'WAT' => 15,
          'HEAD' => 2
        };
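The line doing the work in the script above is the assignment to $/, Perl's input record separator: with it set to "\n CHARMM>", each read returns a whole command block rather than a single line, and chomp strips that same string from the end of the block. A small sketch of the effect, assuming shortened, made-up command lines and an in-memory filehandle in place of the real file:

```perl
use strict;
use warnings;

# Illustrative input only: two short blocks in the same layout as the data above.
my $text = " CHARMM> sele HEAD end\n MEMB POPC 41 O3 1.0000\n"
         . " CHARMM> sele WAT end\n TIP3 TIP3 257 OH2 4.0000\n TIP3 TIP3 524 OH2 3.0000\n";

my @records;
{
    local $/ = "\n CHARMM>";    # records end at the next prompt, not at every newline
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    while ( my $record = <$fh> ) {
        chomp $record;          # chomp removes the current $/, i.e. "\n CHARMM>"
        my ( $head, @data ) = split /\n/, $record;
        push @records, { head => $head, lines => scalar @data };
    }
}

printf "%-25s %d data line(s)\n", $_->{head}, $_->{lines} for @records;
```

Note that every record after the first starts just after the prompt, because the " CHARMM>" text itself is consumed as part of the separator. That is why the script matches only on HEAD, TAIL or WAT in the first line of each record.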
Thanks Ron for the reply,
I agree with you and tried your suggestion in my script. My data file is regular, and I am trying to print the counts for HEAD, TAIL and WAT each time the program encounters them as a set of three. The final output should have three columns: the first for HEAD, the second for TAIL and the third for WAT. The parsing should be done in groups of three: the first time the program matches HEAD, TAIL and WAT gives row 1, the next match gives row 2, and so on. If a section has no entries, it should print zero there.
I hope what I am describing is clear.
Thanks once again.
RonB 589
Expert Mod 512MB
The code I posted will do that. The only thing I left out was the 2 printf statements (the 1st one being inside a conditional block) and setting the initial values to 0.
Which part are you having trouble accomplishing?
The more I look at this, the more it looks like your homework assignment and not a real world problem that you need to solve.
Hi RonB,
Thanks for the clarification. This is not a homework assignment; the output file from the CHARMM program is 4.8 GB, and I am extracting the data from it for my research project.
I am having trouble printing the values for each row of data. I tried printing inside the while loop, and it printed the cumulative sum of the counts.
Thanks
RonB 589
Expert Mod 512MB
This assumes that 'WAT' is not the last group in the file, which is what your sample and code suggested.
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

$/ = "\n CHARMM>";

printf "%4s %5s %5s\n", 'HEAD', 'TAIL', 'WAT';

my %count;
while ( <DATA> ) {
    chomp;
    my ($head, @data) = split /\n/;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;

    if ($key eq 'HEAD') { # assign default value of 0 for each key

        # this can be done in the other if block,
        # but I think it makes more sense here
        $count{$_} = 0 for keys %count;
    }
    $count{$key}++ for @data;

    if ($key eq 'WAT') {
        printf "%4d %5d %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
    }
}
printf "%4d %5d %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};

__DATA__
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41 O3 1.0000
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 2.0000
 MEMB POPC 30 O22 3.0000
 MEMB POPC 30 C22 1.0000
 MEMB POPC 41 C3 3.0000
 MEMB POPC 41 O31 2.0000
 MEMB POPC 41 C31 2.0000
 MEMB POPC 41 O32 3.0000
 CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 257 OH2 4.0000
 TIP3 TIP3 524 OH2 3.0000
 TIP3 TIP3 3687 OH2 2.0000
 TIP3 TIP3 3798 OH2 7.0000
 TIP3 TIP3 4038 OH2 3.0000
 TIP3 TIP3 5218 OH2 3.0000
 TIP3 TIP3 7177 OH2 1.0000
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 1.0000
 MEMB POPC 30 O22 2.0000
 MEMB POPC 30 C22 2.0000
 MEMB POPC 41 C3 7.0000
 MEMB POPC 41 O31 5.0000
 MEMB POPC 41 C31 3.0000
 MEMB POPC 41 O32 3.0000
 MEMB POPC 41 C32 1.0000
 CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
 TIP3 TIP3 524 OH2 1.0000
 TIP3 TIP3 2474 OH2 1.0000
 TIP3 TIP3 3687 OH2 1.0000
 TIP3 TIP3 3798 OH2 7.0000
 TIP3 TIP3 4038 OH2 4.0000
 TIP3 TIP3 5196 OH2 1.0000
 TIP3 TIP3 5218 OH2 2.0000
 TIP3 TIP3 7177 OH2 2.0000
 CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
 MEMB POPC 41 O3 2.0000
 CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
 MEMB POPC 30 C21 1.0000
 MEMB POPC 30 O22 3.0000
 MEMB POPC 30 C22 2.0000
 MEMB POPC 41 C3 5.0000
 MEMB POPC 41 O31 1.0000
 MEMB POPC 41 C31 2.0000
 MEMB POPC 41 O32 2.0000
Outputs:
C:\TEMP>kumarboston.pl
HEAD  TAIL   WAT
   1     7     7
   0     8     8
   1     7     0
Hi RonB,
I tried to run the script using your suggestions, but somehow it prints all zero values.
I have attached the data file and the script file. The data file will always be in groups of three: HEAD, TAIL and WAT.
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

open (DATA, "file.txt");

printf "%4s %5s %5s\n", 'HEAD', 'TAIL', 'WAT';

my %count;
while ( <DATA> ) {
    chomp;
    my ($head, @data) = split /\n/;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;

    if ($key eq 'HEAD') { # assign default value of 0 for each key this can be done in the other if block, but I think it makes more sense here
        $count{$_} = 0 for keys %count;
    }
    $count{$key}++ for @data;

    if ($key eq 'WAT') {
        printf "%4d %5d %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
    }
}
printf "%4d %5d %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};

Thanks
RonB 589
Expert Mod 512MB
Compare the code you posted to what I posted and you'll see that you're missing a very important line. DATA is one of Perl's special built-in filehandles and it's best not to use it when opening your file.
Use a lexical var for the filehandle and use the 3 arg form of open.
When opening a filehandle, you should always check the return code to verify that it was successful and take proper action if it failed.
my $file = "file.txt";
open my $data_FH, '<', $file or die "failed to open <$file> $!";
The file you attached has a 'WAT' section as its last block. Will that always be true in your actual complete data file? As it is right now, the last 2 rows of output will be duplicates.
Yes, the last section will always be a WAT section, whether it contains data or not.
Thanks
RonB 589
Expert Mod 512MB
In that case, remove that last printf statement, it's redundant.
RonB 589
Expert Mod 512MB
After thinking about it, the assignment of the default hash values should be moved into the 'WAT' if block, or we need to define the 3 hash keys prior to the while loop.
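Pulling the suggestions in this thread together, the following is a sketch rather than a tested production script. It assumes every group ends with a WAT block, that each line of the file starts with a single space (as the "\n CHARMM>" separator implies), and that every non-blank line after a command line is one contact. The sub name group_counts and passing the file name on the command line are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the loop from this thread; returns one
# formatted "HEAD TAIL WAT" row per completed group.
sub group_counts {
    my ($fh) = @_;
    local $/ = "\n CHARMM>";                          # one record per CHARMM> command
    my %count = ( HEAD => 0, TAIL => 0, WAT => 0 );   # keys defined before the loop
    my @rows;

    while ( my $record = <$fh> ) {
        chomp $record;
        my ( $head, @data ) = split /\n/, $record;
        next unless defined $head;

        # Skip any record whose command line names none of the three selections.
        my ($key) = $head =~ /sele\s+(HEAD|TAIL|WAT)\b/ or next;

        $count{$key} += grep { /\S/ } @data;          # count non-blank data lines

        if ( $key eq 'WAT' ) {                        # WAT closes each group
            push @rows, sprintf "%4d %5d %5d", @count{qw(HEAD TAIL WAT)};
            $count{$_} = 0 for keys %count;           # reset for the next group
        }
    }
    return @rows;
}

# Usage (hypothetical): perl script.pl file.txt
if (@ARGV) {
    my $file = $ARGV[0];
    open my $data_fh, '<', $file or die "failed to open <$file> $!";
    printf "%4s %5s %5s\n", 'HEAD', 'TAIL', 'WAT';
    print "$_\n" for group_counts($data_fh);
    close $data_fh;
}
```

Resetting the counts inside the WAT block means no trailing printf is needed after the loop, and defining the three keys up front avoids uninitialized-value warnings when the first HEAD block is empty.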