
parsing text

P: 76
Hi,
How can I get the second value in the 4th column (i.e. "NP_047184.1") when the value above it in the same column ("AAD12597.1") is my query? As you can see, rows that share the same number in the second column belong together, but they carry different ids in the 4th column. I have one of the ids; how can I get the other one?
Thanks a lot.

Here is what my input file (testfile) looks like (space-delimited):

    9    1246500        -             AAD12597.1        3282737
    9    1246500    Provisional        NP_047184.1         10954455
    9    1246501        -             AAD12599.1        3282739
    9    1246501    Provisional        NP_047186.1         10954457

My code so far:

    my $infile='./testfile';
    open(FH,$infile);
    while(<FH>){
        if($_ =~ /^\d+\s+(\d+)\s+\-\s+($search)\./) {
            next;
        }
        if($_ =~ /^\d+\s+(\d+)\s+\w+\s+(\S+)\./) {
            $id=$1;
            print $id;
            exit;
        }
    }

Mar 5 '08 #1
7 Replies


KevinADC
Expert 2.5K+
P: 4,059
One possible way:

    my $infile = './testfile';
    open(FH, $infile) or die "cannot open $infile: $!";
    while (<FH>) {
        # found the line holding the search term ($search must be set beforehand)
        if ( /^\d+\s+(\d+)\s+\-\s+($search)\./ ) {
            my $next_line = <FH>;    # look only at the line right after it
            if (defined $next_line and $next_line =~ /^\d+\s+(\d+)\s+\w+\s+(\S+)\./ ) {
                my $id = $1;
                print $id;
                close(FH);
                exit;
            }
        }
    }

Mar 6 '08 #2

P: 76
Thank you so much, Kevin. That works fine when the data I want is on the immediately following line, but what do I do when it is on the third line, as in the example below?

    9   1246501        -            AAD12597.1        3282737
    9   1246501        -            AAD12599.1        3282739
    9   1246501    Provisional     NP_047184.1       10954455

I'm sorry, I should have told you that I'm looking for the data in the 4th column (NP_047184.1) on the line that contains the word "Provisional", given that my query is the first id in the same column (AAD12597.1).

Cheers!
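
The requirement described here (rows share the second-column number, and the wanted row has "Provisional" in the third column) can also be met without relying on line order. The following is only a minimal sketch under assumptions: the helper name `provisional_for` is hypothetical, the rows are passed in as a list rather than read from testfile, and the `//=` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: remember which group (column 2) each id belongs to
# and which id in each group is marked "Provisional", then look the query up.
sub provisional_for {
    my ($query, @lines) = @_;
    my (%group_of, %provisional_in);
    for my $line (@lines) {
        # split ' ' splits on runs of whitespace and ignores leading spaces
        my (undef, $group, $status, $acc) = split ' ', $line;
        next unless defined $acc;              # skip blank or short lines
        $group_of{$acc} = $group;
        # keep the first Provisional id seen for each group
        $provisional_in{$group} //= $acc if $status eq 'Provisional';
    }
    my $group = $group_of{$query};
    return defined $group ? $provisional_in{$group} : undef;
}

# Sample rows copied from the post above
my @sample = (
    '9   1246501        -            AAD12597.1        3282737',
    '9   1246501        -            AAD12599.1        3282739',
    '9   1246501    Provisional     NP_047184.1       10954455',
);

my $hit = provisional_for('AAD12597.1', @sample);
print defined $hit ? "$hit\n" : "no Provisional entry found\n";
```

Because the whole input is indexed first, it does not matter how many lines separate the query from the "Provisional" row. The trade-off is that every row is held in memory, which may not suit a very large file.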

Mar 6 '08 #3

KevinADC
Expert 2.5K+
P: 4,059
It's difficult to hit a moving target. Are you sure this is the only other requirement: the search term is found, and then x lines later the word "Provisional" marks the line you want to extract data from? Or will the target move again?
Mar 6 '08 #4

P: 76
Hi Kevin,
Yes, the word "Provisional" could be on the line immediately after the query (search) term, or x lines later. So is this something that can't be done?
Help please, if you can. Thanks a lot.

Mar 6 '08 #5

nithinpes
Expert 100+
P: 410
From your initial code, where you call exit after printing the desired number, I assume you want only the first occurrence and do not need to continue the search. This code works whether the 'Provisional' line comes immediately after the line with your search string or 'n' lines after it.

    my $infile = 'testfile.txt';
    my $search = 'AAD12597.1';
    open(FH, $infile) or die "failed to open: $!";
    while (<FH>) {
        # \Q...\E keeps the "." in $search from matching any character
        if ( /^\d+\s+(\d+)\s+\-\s+(\Q$search\E)/ ) {
            # read ahead until a "Provisional" line; stop at end of file
            my $next_line = <FH>;
            while (defined $next_line
                   and $next_line !~ /^\d+\s+\d+\s+Provisional\s+\S+\./) {
                $next_line = <FH>;
            }
            # $1 is the second-column number; $2 holds the 4th-column id
            # without its version suffix
            if (defined $next_line
                and $next_line =~ /^\d+\s+(\d+)\s+\w+\s+(\S+)\./) {
                my $id = $1;
                print "$id\n";
                close(FH);
                exit;
            }
        }
    }

Mar 6 '08 #6

KevinADC
Expert 2.5K+
P: 4,059
It is easily possible, but is there always an occurrence of "Provisional" within the set of lines you are searching? If so, nithinpes's code looks like it will work.
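
If a group might have no "Provisional" line, a read-ahead loop could run on into the next group and report that group's entry instead. One possible guard, shown only as a sketch, is to stop scanning as soon as the second column changes. The helper name `find_provisional` and the in-memory sample data (a variation on the posted rows in which the first group has no "Provisional" line) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: find the query id, then scan forward only while
# column 2 (the group number) stays the same as on the query line.
sub find_provisional {
    my ($fh, $query) = @_;
    my $group;                                   # set once the query line is seen
    while (my $line = <$fh>) {
        my (undef, $grp, $status, $acc) = split ' ', $line;
        next unless defined $acc;                # skip blank or short lines
        if (!defined $group) {
            $group = $grp if $acc eq $query;     # remember the query's group
            next;
        }
        return if $grp ne $group;                # left the group: nothing found
        return $acc if $status eq 'Provisional';
    }
    return;                                      # query missing or end of file
}

# Assumed sample data: group 1246500 has no Provisional line
my $data = join "\n",
    '9 1246500 - AAD12597.1 3282737',
    '9 1246501 - AAD12599.1 3282739',
    '9 1246501 Provisional NP_047186.1 10954457',
    '';

for my $query ('AAD12597.1', 'AAD12599.1') {
    # open the string as an in-memory file so each search starts at the top
    open my $fh, '<', \$data or die "cannot open in-memory data: $!";
    my $hit = find_provisional($fh, $query);
    print "$query => ", (defined $hit ? $hit : 'none in this group'), "\n";
    close $fh;
}
```

This sketch assumes the rows of a group are adjacent and that the "Provisional" row comes after the query row, as in the posted samples.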
Mar 6 '08 #7

P: 76
Thank you so much to both of you for your help. My script works fine now.
Cheers!
^ ^*
Mar 7 '08 #8
