parsing text

76
Hi,
How can I get the second value in the 4th column (i.e. "NP_047184.1") when the value above it in the same column ("AAD12597.1") is my query? As you can see, rows that belong together share the same number in the second column but have different IDs in the 4th column. I have one of the IDs; how can I get the other one?
Thanks a lot.

Here is what my input file (testfile) looks like (space-delimited):
    9    1246500        -             AAD12597.1        3282737
    9    1246500    Provisional        NP_047184.1         10954455
    9    1246501        -             AAD12599.1        3282739
    9    1246501    Provisional        NP_047186.1         10954457

    my $infile='./testfile';
    open(FH,$infile);
    while(<FH>){
        if($_ =~ /^\d+\s+(\d+)\s+\-\s+($search)\./) {
            next;
        }
        if($_ =~ /^\d+\s+(\d+)\s+\w+\s+(\S+)\./) {
            $id=$1;
            print $id;
            exit;
        }
    }
Mar 5 '08 #1
7 1120
KevinADC
4,059 Expert 2GB
One possible way:

    my $infile='./testfile';
    open(FH,$infile);
    while(<FH>){
       if ( /^\d+\s+(\d+)\s+\-\s+($search)\./ ) {
          # The query line matched, so read the line that follows it.
          my $next_line = <FH>;
          # $1 is the column-2 number; $2 is the column-4 ID up to its last dot.
          if ($next_line =~ /^\d+\s+(\d+)\s+\w+\s+(\S+)\./ ) {
             my $id=$1;
             print $id;
             close (FH);
             exit;
          }
       }
    }
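One detail worth checking in a pattern like this is that `$search` is interpolated into the regex as-is, so any `.` inside the ID matches any character. A minimal sketch of the difference, assuming the full ID (with its `.1` suffix) is used as the search term; the look-alike line is made up for illustration and is not from the input file:

```perl
use strict;
use warnings;

my $search    = 'AAD12597.1';
# Made-up line: same as the real one except "x" where the dot should be.
my $lookalike = "9    1246500        -             AAD12597x1        3282737";

# Unquoted: the "." in $search is a regex wildcard, so "AAD12597x1" matches.
my $loose  = $lookalike =~ /\s$search\s/     ? 1 : 0;
# Quoted with \Q...\E: the "." must be a literal dot, so there is no match.
my $strict = $lookalike =~ /\s\Q$search\E\s/ ? 1 : 0;

print "unquoted: $loose, quoted: $strict\n";   # prints "unquoted: 1, quoted: 0"
```

With real IDs the unquoted form will usually still find the right line, but quoting costs nothing and rules out accidental matches.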
Mar 6 '08 #2
idorjee
76
Thank you so much, Kevin. That works fine when the data I want is on the immediate next line, but what do I do when it is on the 3rd line, as in the example below?
    9   1246501        -            AAD12597.1        3282737
    9   1246501        -            AAD12599.1        3282739
    9   1246501    Provisional     NP_047184.1       10954455
I'm sorry, I should have told you that I'm looking for the 4th-column value (NP_047184.1) on the line that contains the word "Provisional", given that my query (AAD12597.1) is the first value in that same column.

Cheers!

Mar 6 '08 #3
KevinADC
4,059 Expert 2GB
It's difficult to hit a moving target. Are you sure this is the only other requirement: the search term is found, and then x lines later the word "Provisional" marks the line you want to extract data from? Or will the target move again?
Mar 6 '08 #4
idorjee
76
Hi Kevin,
Yes, the word "Provisional" could be on the line immediately after the one where the query term (search term) is found, or it could be x lines later. So is this something I can't possibly do?
Help please, if you can. Thanks a lot.

Mar 6 '08 #5
nithinpes
410 Expert 256MB
From your initial code, where you call exit after printing the desired number, I am assuming you want only the first occurrence and do not need to continue the search. This code will work whether the 'Provisional' line comes immediately after your search string's line or 'n' lines after it.

    my $infile='testfile.txt';
    my $search='AAD12597.1';
    open(FH,$infile) or die "failed to open:$!";
    while(<FH>){
       # \Q...\E makes the "." in $search match a literal dot.
       if ( /^\d+\s+(\d+)\s+\-\s+(\Q$search\E)/ ) {
          # Read ahead to the next 'Provisional' line; stop at end of file
          # so the loop cannot spin forever when there is none.
          while (defined(my $next_line = <FH>)) {
             next unless $next_line =~ /^\d+\s+(\d+)\s+Provisional\s+(\S+)\./;
             my $id=$1;   # column-2 number; $2 holds the ID up to its last dot
             print "$id\n";
             close (FH);
             exit;
          }
       }
    }
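For comparison, here is a minimal sketch of the same read-ahead idea written with `split` instead of regex captures, so that the full 4th-column ID (including its `.1` suffix) is what gets returned. The helper name `provisional_after` is hypothetical, and the sample data is read from an in-memory string rather than from testfile so the snippet runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the 4th-column ID of the
# first "Provisional" line that appears after the line holding $search.
sub provisional_after {
    my ($fh, $search) = @_;
    my $found = 0;                        # becomes true once the query line is seen
    while (my $line = <$fh>) {
        my @col = split ' ', $line;       # split on runs of whitespace
        next unless @col >= 4;            # skip blank or short lines
        if (!$found) {
            $found = 1 if $col[3] eq $search;   # plain string comparison
            next;
        }
        return $col[3] if $col[2] eq 'Provisional';
    }
    return;    # query not found, or no Provisional line after it
}

# Sample rows copied from the second example in the thread.
my $data = <<'END';
9   1246501        -            AAD12597.1        3282737
9   1246501        -            AAD12599.1        3282739
9   1246501    Provisional     NP_047184.1       10954455
END

open my $fh, '<', \$data or die "cannot open in-memory data: $!";
my $id = provisional_after($fh, 'AAD12597.1');
close $fh;
print defined $id ? "$id\n" : "not found\n";   # prints "NP_047184.1"
```

Comparing columns with `eq` avoids having to escape the dot in the ID, and returning instead of calling exit lets the caller decide what to do when nothing is found.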
Mar 6 '08 #6
KevinADC
4,059 Expert 2GB
It is easily possible, but is there always an occurrence of "Provisional" within the set of lines you are searching? If so, nithinpes's code looks like it will work.
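If a group might have no "Provisional" line at all, one option is to stop scanning as soon as the second column changes, so the search never runs on into the next group. A minimal sketch, assuming the second column identifies the group as described in the original question; the helper name `provisional_in_group` is hypothetical, and the sample data is a made-up variation of the input in which the first group has no "Provisional" line:

```perl
use strict;
use warnings;

# Hypothetical helper: find the "Provisional" ID that follows $search within
# the same column-2 group; return nothing if the group ends first.
sub provisional_in_group {
    my ($fh, $search) = @_;
    my $group;                            # column-2 value of the query line, once seen
    while (my $line = <$fh>) {
        my @col = split ' ', $line;       # split on runs of whitespace
        next unless @col >= 4;            # skip blank or short lines
        if (!defined $group) {
            $group = $col[1] if $col[3] eq $search;
            next;
        }
        last if $col[1] ne $group;        # left the query's group: give up
        return $col[3] if $col[2] eq 'Provisional';
    }
    return;
}

# Made-up variation of the input: group 1246500 has no Provisional line.
my $data = <<'END';
9    1246500        -             AAD12597.1        3282737
9    1246501        -             AAD12599.1        3282739
9    1246501    Provisional        NP_047186.1         10954457
END

for my $query ('AAD12597.1', 'AAD12599.1') {
    open my $fh, '<', \$data or die "cannot open in-memory data: $!";
    my $id = provisional_in_group($fh, $query);
    close $fh;
    print "$query: ", (defined $id ? $id : 'no Provisional line in its group'), "\n";
}
```

Without the group check, the first query here would pick up NP_047186.1 from the following group, which is probably not what is wanted.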
Mar 6 '08 #7
idorjee
76
Thank you so much to both of you for your help. My script works fine now.
Cheers!
^ ^*
Mar 7 '08 #8
