Parse a FASTA text file

I am new to Perl and would like to parse a nucleotide or sequence file. Any help would be appreciated.

>gi|116666853|pdb|2BZX|A unnamed protein product [Homo sapiens]
PVFAKAIQKRVPCAYDKTALALEVGDIVKVTRMNINGQWEGEVNGRKGLF PFTHVKIFDPQNPDENE

>gi|110591044|pdb|2CN8|A unnamed protein product [Homo sapiens]
GPLGSHMSVYPKALRDEYIMSKTLGSGACGEVKLAFERKTCKKVAIKIIS KRKFAIGSAREADPALNVET
EIEILKKLNHPCIIKIKNFFDAEDYYIVLELMEGGELFDKVVGNKRLKEA TCKLYFYQMLLAVQYLHENG
IIHRDLKPENVLLSSQEEDCLIKITDFGHSKILGETSLMRTLCGTPTYLA PEVLVSVGTAGYNRAVDCWS
LGVILFICLSGYPPFSEHRTQVSLKDQITSGKYNFIPEVWAEVSEKALDL VKKLLVVDPKARFTTEEALR
HPWLQDEDMKRKFQDLLSEENESTALPQVLAQPSTSRKRPREGEAEGAE


The text file I would like to parse has its data in the format above. The expected output is:

116666853
A unnamed protein product
>gi|116666853|pdb|2BZX|A unnamed protein product [Homo sapiens]
PVFAKAIQKRVPCAYDKTALALEVGDIVKVTRMNINGQWEGEVNGRKGLF PFTHVKIFDPQNPDENE

110591044
A unnamed protein product
>gi|110591044|pdb|2CN8|A unnamed protein product [Homo sapiens]
GPLGSHMSVYPKALRDEYIMSKTLGSGACGEVKLAFERKTCKKVAIKIIS KRKFAIGSAREADPALNVET
EIEILKKLNHPCIIKIKNFFDAEDYYIVLELMEGGELFDKVVGNKRLKEA TCKLYFYQMLLAVQYLHENG
IIHRDLKPENVLLSSQEEDCLIKITDFGHSKILGETSLMRTLCGTPTYLA PEVLVSVGTAGYNRAVDCWS
LGVILFICLSGYPPFSEHRTQVSLKDQITSGKYNFIPEVWAEVSEKALDL VKKLLVVDPKARFTTEEALR
HPWLQDEDMKRKFQDLLSEENESTALPQVLAQPSTSRKRPREGEAEGAE
Jun 12 '07 #1
Could you be more specific about the FASTA files? For example:

Will the sequence data (the part in capital letters) be on a single line, or on multiple lines as in the second sequence?

Will the number of fields separated by "|" be the same in every sequence (for example, 5), as in the sequences in the example file?

Can you post the code you have tried so far?
Jun 20 '07 #2
KevinADC
There are many DNA/FASTA modules on CPAN, as well as BioPerl, that are available for this type of work.
Jun 20 '07 #3
miller
cpan search "FASTA"
cpan search "DNA"

- Miller
Jun 20 '07 #4
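
For anyone who wants to see the idea in plain Perl before reaching for a module, here is a minimal sketch of the parsing described in the first post. It is not taken from any CPAN module or from BioPerl. It assumes every header has the five "|"-separated fields shown in the sample (">gi", the number, the database, the accession, and the description), that the organism appears in trailing square brackets, and that sequence data may span several lines. The names parse_fasta and format_record are made up for this example.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle.
# ASSUMPTION: headers look like
#   >gi|<number>|<db>|<accession>|<description> [<organism>]
# as in the sample data; other header layouts get empty gi/desc fields.
sub parse_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;        # strip Unix or Windows line endings
        next if $line =~ /^\s*$/;    # skip blank lines between records
        if ($line =~ /^>/) {
            # Limit the split to 5 fields so a "|" inside the description survives.
            my @f = split /\|/, $line, 5;
            my $desc = $f[4] // '';
            $desc =~ s/\s*\[[^\]]*\]\s*$//;    # drop the trailing "[Homo sapiens]"
            push @records, {
                gi     => $f[1] // '',
                desc   => $desc,
                header => $line,
                seq    => [],
            };
        }
        elsif (@records) {
            # Sequence line: kept verbatim, one array entry per input line.
            push @{ $records[-1]{seq} }, $line;
        }
    }
    return @records;
}

# Lay one record out as in the expected output: number, description,
# original header, then the sequence lines.
sub format_record {
    my ($r) = @_;
    return join "\n", $r->{gi}, $r->{desc}, $r->{header}, @{ $r->{seq} };
}

# Demo on the first sample record, read through an in-memory filehandle.
# For a real file, open it instead, e.g.
#   open my $fh, '<', $path or die "Cannot open $path: $!";
my $sample = <<'END';
>gi|116666853|pdb|2BZX|A unnamed protein product [Homo sapiens]
PVFAKAIQKRVPCAYDKTALALEVGDIVKVTRMNINGQWEGEVNGRKGLF PFTHVKIFDPQNPDENE
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @records = parse_fasta($fh);
close $fh;

# Records are separated by one blank line.
print join("\n\n", map { format_record($_) } @records), "\n";
```

Run as is, this prints the first block of the expected output from the original post. If the headers in a real file do not always have five fields, the split would need adjusting, which is why the questions in reply #2 matter.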
