
Grabbing lines from a file

P: 11
Hi, my original file looks like this:
>P1;208,D-
208,D-5 - (MOUSE) mouse
DVLMTQTPPSLPVSLGDQASISCRSSQTIVHSDGNTYLEW
YLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYYCFQGSHVPPTFGGGTKLEIKR*
VQLLEESGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWFATISDGGSHTYYPDSVKGRFTISRDNAKNNLY
LQMSCLRSEDTAMYYCTRDSLDFYGMDYWGQGTSVTVSS*
>P1;MN14C1
MN14C11.6 - (MOUSE) mouse
DILMTQTPLSLPVSLGDQASISCRSSQSLVHSNGNTYLHW
YLQKPGQSPKLLIYTVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYFCSQSTHLPPTFGGGTKLDIKR*
DVHLLVSGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWVATISDGGAHTYYPDSVKGRFTISRDNAKNNLY
LHMNSLKSEDTAMYYCARDPLEYYGMDYWGQGTAVTV*
>P1;scFv40
scFv40 - (MOUSE) mouse
DVQIIQTTASLSASVGETVTITCRASEHIYSYLAWYQQKQ
GKSPQLLVYSAKTLAEGVPSRFSGSGSGTQFSLKINSLQP
EDFGSYYCQHHYDTPRTFGGGTKLEIRRA*
VDQVQQPGAELVRSGASVKMSCKASGYTFTSYNMHWVKQT
PGQGLEWIGYIYPGNGGTIYNQKFKGKATLTADTSSSTAN
MQISSLTSEDSAVYFCARGDYRNDPFDFWGQGTTLTVSS*

After that I created a new file with Perl, and it looks like this:
>P1|208,D
DVLMTQTPPSLPVSLGDQASISCRSSQTIVHSDGNTYLEW
YLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYYCFQGSHVPPTFGGGTKLEIKR*
VQLLEESGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWFATISDGGSHTYYPDSVKGRFTISRDNAKNNLY
LQMSCLRSEDTAMYYCTRDSLDFYGMDYWGQGTSVTVSS*
>P1|MN14C1
DILMTQTPLSLPVSLGDQASISCRSSQSLVHSNGNTYLHW
YLQKPGQSPKLLIYTVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYFCSQSTHLPPTFGGGTKLDIKR*
DVHLLVSGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWVATISDGGAHTYYPDSVKGRFTISRDNAKNNLY
LHMNSLKSEDTAMYYCARDPLEYYGMDYWGQGTAVTV*
>P1|scFv40
DVQIIQTTASLSASVGETVTITCRASEHIYSYLAWYQQKQ
GKSPQLLVYSAKTLAEGVPSRFSGSGSGTQFSLKINSLQP
EDFGSYYCQHHYDTPRTFGGGTKLEIRRA*
VDQVQQPGAELVRSGASVKMSCKASGYTFTSYNMHWVKQT
PGQGLEWIGYIYPGNGGTIYNQKFKGKATLTADTSSSTAN
MQISSLTSEDSAVYFCARGDYRNDPFDFWGQGTTLTVSS*


So I want to grab lines in two ways.

The first is:
>P1|208,D
DVLMTQTPPSLPVSLGDQASISCRSSQTIVHSDGNTYLEW
YLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYYCFQGSHVPPTFGGGTKLEIKR*


The second is:
>P1|208,D
*
VQLLEESGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWFATISDGGSHTYYPDSVKGRFTISRDNAKNNLY
LQMSCLRSEDTAMYYCTRDSLDFYGMDYWGQGTSVTVSS*

I was thinking about using a regular expression, but it's not working. Can anyone please help me?

Thanks
Pallab
Jun 17 '10 #1
5 Replies


Expert
P: 70
use warnings;
use strict;

my $start;       # most recent header line (still ends in a newline)
my $flag = 0;    # set at a ">" header, cleared at the next line containing "*"
while (<DATA>) {
    print;
    if (/^>/) {
        $start = $_;
        $flag = 1;
    }
    elsif (/\*/) {
        # First "*" after a header: repeat the header, then a lone "*"
        print "$start*\n" if $flag;
        $flag = 0;
    }
}

__DATA__
>P1|208,D
DVLMTQTPPSLPVSLGDQASISCRSSQTIVHSDGNTYLEW
YLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYYCFQGSHVPPTFGGGTKLEIKR*
VQLLEESGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWFATISDGGSHTYYPDSVKGRFTISRDNAKNNLY
LQMSCLRSEDTAMYYCTRDSLDFYGMDYWGQGTSVTVSS*
>P1|MN14C1
DILMTQTPLSLPVSLGDQASISCRSSQSLVHSNGNTYLHW
YLQKPGQSPKLLIYTVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYFCSQSTHLPPTFGGGTKLDIKR*
DVHLLVSGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWVATISDGGAHTYYPDSVKGRFTISRDNAKNNLY
LHMNSLKSEDTAMYYCARDPLEYYGMDYWGQGTAVTV*
>P1|scFv40
DVQIIQTTASLSASVGETVTITCRASEHIYSYLAWYQQKQ
GKSPQLLVYSAKTLAEGVPSRFSGSGSGTQFSLKINSLQP
EDFGSYYCQHHYDTPRTFGGGTKLEIRRA*
VDQVQQPGAELVRSGASVKMSCKASGYTFTSYNMHWVKQT
PGQGLEWIGYIYPGNGGTIYNQKFKGKATLTADTSSSTAN
MQISSLTSEDSAVYFCARGDYRNDPFDFWGQGTTLTVSS*
Prints out:

>P1|208,D
DVLMTQTPPSLPVSLGDQASISCRSSQTIVHSDGNTYLEW
YLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYYCFQGSHVPPTFGGGTKLEIKR*
>P1|208,D
*
VQLLEESGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWFATISDGGSHTYYPDSVKGRFTISRDNAKNNLY
LQMSCLRSEDTAMYYCTRDSLDFYGMDYWGQGTSVTVSS*
>P1|MN14C1
DILMTQTPLSLPVSLGDQASISCRSSQSLVHSNGNTYLHW
YLQKPGQSPKLLIYTVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYFCSQSTHLPPTFGGGTKLDIKR*
>P1|MN14C1
*
DVHLLVSGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWVATISDGGAHTYYPDSVKGRFTISRDNAKNNLY
LHMNSLKSEDTAMYYCARDPLEYYGMDYWGQGTAVTV*
>P1|scFv40
DVQIIQTTASLSASVGETVTITCRASEHIYSYLAWYQQKQ
GKSPQLLVYSAKTLAEGVPSRFSGSGSGTQFSLKINSLQP
EDFGSYYCQHHYDTPRTFGGGTKLEIRRA*
>P1|scFv40
*
VDQVQQPGAELVRSGASVKMSCKASGYTFTSYNMHWVKQT
PGQGLEWIGYIYPGNGGTIYNQKFKGKATLTADTSSSTAN
MQISSLTSEDSAVYFCARGDYRNDPFDFWGQGTTLTVSS*
Jun 17 '10 #2

P: 11
@toolic
Thanks for your answer
Jun 17 '10 #3

P: 11
@toolic
Hi,
can you please explain what the variable $flag does?
The other thing is: what if I want to print only this

>P1|208,D
DVLMTQTPPSLPVSLGDQASISCRSSQTIVHSDGNTYLEW
YLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYYCFQGSHVPPTFGGGTKLEIKR*

>P1|MN14C1
DILMTQTPLSLPVSLGDQASISCRSSQSLVHSNGNTYLHW
YLQKPGQSPKLLIYTVSNRFSGVPDRFSGSGSGTDFTLKI
SRVEAEDLGLYFCSQSTHLPPTFGGGTKLDIKR*

>P1|scFv40
DVQIIQTTASLSASVGETVTITCRASEHIYSYLAWYQQKQ
GKSPQLLVYSAKTLAEGVPSRFSGSGSGTQFSLKINSLQP
EDFGSYYCQHHYDTPRTFGGGTKLEIRRA*


Or what if I want to print only this format

>P1|208,D
*
VQLLEESGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWFATISDGGSHTYYPDSVKGRFTISRDNAKNNLY
LQMSCLRSEDTAMYYCTRDSLDFYGMDYWGQGTSVTVSS*

>P1|MN14C1
*
DVHLLVSGGGLVKPGGSLKLSCAASGFTFSDYYMFWVRQT
PEKRLEWVATISDGGAHTYYPDSVKGRFTISRDNAKNNLY
LHMNSLKSEDTAMYYCARDPLEYYGMDYWGQGTAVTV*

>P1|scFv40
*
VDQVQQPGAELVRSGASVKMSCKASGYTFTSYNMHWVKQT
PGQGLEWIGYIYPGNGGTIYNQKFKGKATLTADTSSSTAN
MQISSLTSEDSAVYFCARGDYRNDPFDFWGQGTTLTVSS*

So for these, what do I have to change in the previous program?

Thanks
Pallab
Jun 17 '10 #4

Expert
P: 70
$flag is used as a state variable.
It is set when a line begins with ">".
It is cleared when a line contains "*" (in this data, always at the end of a line).
I admit that it is not the best name for a variable,
but you can change it to be more meaningful for you.

I don't understand the rest of your question.
Jun 17 '10 #5
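
To make the state-variable description above concrete, here is a minimal sketch of the same logic with a more descriptive name. The function name mark_second_parts and the variable $in_first_part are made up for illustration and are not from the original script; the sketch also assumes the input lines have already been chomped.

```perl
use strict;
use warnings;

# Hypothetical helper: same logic as the script in reply #2, but the
# state variable is named for what it tracks. Expects chomped lines.
sub mark_second_parts {
    my @lines = @_;
    my @out;
    my $header        = '';
    my $in_first_part = 0;    # true between a ">" header and the first "*"
    for my $line (@lines) {
        push @out, $line;
        if ($line =~ /^>/) {
            $header        = $line;
            $in_first_part = 1;
        }
        elsif ($line =~ /\*/) {
            # First "*" after a header: repeat the header, then a lone "*"
            push @out, $header, '*' if $in_first_part;
            $in_first_part = 0;
        }
    }
    return @out;
}

# Short made-up record, just to show the shape of the result
my @demo = ('>P1|BD-8', 'FHFH', 'HFJH*', 'GFHF', 'HJFH*');
print "$_\n" for mark_second_parts(@demo);
```

Because the variable is only true while reading the first part, the second "*" of each record does not trigger another copy of the header.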

P: 11
@toolic
Thanks for your feedback. Actually, every sequence has two parts, and each part ends with a *. So that means:
>P1|BD-8
FHFHFJHFJFDJJHFJF
HFJHFJKFJKFJKFJKF
*
GFHFHJFJFJFJKFJKF
HJFHJFJFJKFJKFJKF
*

So part 1 of the sequence lies between the > line and the first *,

and part 2 lies between the first * and the second *.

My first job is to grab part 1 of every sequence and print it,
and my second job is to grab part 2 of every sequence and print it with the header line.

Thanks
Pallab
Jun 17 '10 #6
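
One possible way to do the two jobs described above is to collect each record's header, part 1 and part 2 separately, then print whichever format is needed. This is only a sketch under stated assumptions: the name split_records is hypothetical, the lines are assumed to be chomped already, and every record is assumed to be a ">" line followed by two runs of lines that each end at a line containing "*".

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [header, \@part1, \@part2] per record.
# Assumes chomped lines and exactly two "*"-terminated parts per header.
sub split_records {
    my @lines = @_;
    my @records;
    my $part = 0;    # 0 = outside a record, 1 = first part, 2 = second part
    for my $line (@lines) {
        if ($line =~ /^>/) {
            push @records, [ $line, [], [] ];
            $part = 1;
        }
        elsif ($part == 1 || $part == 2) {
            push @{ $records[-1][$part] }, $line;
            # A line containing "*" closes the current part
            if ($line =~ /\*/) {
                $part = ($part == 1) ? 2 : 0;
            }
        }
    }
    return @records;
}

# Short made-up record, just to show both output formats
my @demo = ('>P1|BD-8', 'FHFHFJ', 'HFJHFJ*', 'GFHFHJ', 'HJFHJF*');
for my $rec (split_records(@demo)) {
    my ($header, $part1, $part2) = @$rec;
    print "$_\n" for $header, @$part1;         # first format: header + part 1
    print "$_\n" for $header, '*', @$part2;    # second format: header, "*", part 2
}
```

To get only one of the two formats, keep just the matching print line in the loop. Lines that appear before the first ">" header are ignored, because $part is still 0 at that point.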
