
Reading from a file

I'm trying to read data from a text file, but I don't know how to do it. The data is in FASTA format (a format used in molecular biology to store protein/DNA sequences), which is very simple:
    >Header1
    Sequence1

    >Header2
    Sequence2
    .
    .
    .
    >HeaderN
    SequenceN

The ">" is always present and marks an identifier line (where we usually write the name/ID of the sequence below it). The line or lines following the header are the sequence itself, and sequences have different lengths.

So, my question is: which instructions should I use to read the file and copy all the sequences, each one into its own list?
Any ideas? Thanks in advance.
Apr 22 '12 #1
1 Reply

bvdet
The following code will read a text file and create a dictionary of headers and sequences.
    # Map each header to the list of sequence lines that follow it.
    dd = {}
    current_header = None
    with open("fasta1.txt") as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                current_header = line[1:].strip()
            elif current_header and line:  # skip blank lines
                dd.setdefault(current_header, []).append(line)
Each header maps to a list of lines rather than a single string, so a sequence that spans multiple lines is stored in full.
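If you would rather have each sequence as one string instead of a list of lines, the pieces can be joined once the file has been read. Here is a minimal sketch of that idea. The function name `parse_fasta` and the sample data are made up for illustration, and it assumes that every header is unique (a repeated header would overwrite the earlier entry).

```python
def parse_fasta(lines):
    """Map each header (without the leading '>') to its full sequence string."""
    sequences = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            header = line[1:].strip()
            sequences[header] = []  # assumption: headers are unique
        elif header is not None:
            sequences[header].append(line)
    # Join the collected lines so each sequence becomes a single string.
    return {name: "".join(parts) for name, parts in sequences.items()}


# Hypothetical sample data; any iterable of lines works, including an open file.
sample = [">Header1\n", "ACGT\n", "TTGA\n", "\n", ">Header2\n", "GGCC\n"]
records = parse_fasta(sample)
print(records)  # {'Header1': 'ACGTTTGA', 'Header2': 'GGCC'}
```

Because the function takes any iterable of lines, you can pass it a file object opened with `with open("fasta1.txt") as f:`. `list(records.values())` then gives you just the sequences, in file order.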
Apr 22 '12 #2
