How to use .split

I need to write a Python script to convert two sequences to interleaved FASTA format with 10 characters per line.
For example, if the input is

>human
ACACCGGTACCAGATATGATATACCGAGA
>mouse
ACCAGAGGGGGTTTTAAACCACAGCG

(saved as dna.txt), the output should be
>human
ACACCGGTAC
CAGATATGAT
ATACCGAGA
>mouse
ACCAGAGGGG
GTTTTAAACC
ACAGCG


I have no idea what to do.
This is all I have so far:
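    def read_data(filename):
        # Open the file passed in (the original ignored this argument)
        # and strip the trailing newline from every line. The "with"
        # block closes the file, so no explicit close() is needed.
        with open(filename, "r") as myfile:
            return [line.rstrip("\n") for line in myfile]

    # data only exists once the function has been called.
    data = read_data("p:/dna.txt")
    seq1 = data[1]
    seq2 = data[3]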
Nov 7 '10 #1


bvdet
Maybe this will help:
    >>> import textwrap
    >>> textwrap.wrap("ACCAGAGGGGGTTTTAAACCACAGCG", 10)
    ['ACCAGAGGGG', 'GTTTTAAACC', 'ACAGCG']
    >>> data = [">human", "ACACCGGTACCAGATATGATATACCGAGA", ">mouse", "ACCAGAGGGGGTTTTAAACCACAGCG"]
    >>> for item in data:
    ...     print("\n".join(textwrap.wrap(item, 10)))
    ...
    >human
    ACACCGGTAC
    CAGATATGAT
    ATACCGAGA
    >mouse
    ACCAGAGGGG
    GTTTTAAACC
    ACAGCG
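To tie the reading step and the wrapping step together, here is a minimal sketch rather than a definitive solution. The function names read_fasta and format_fasta are my own, not from any library, and I am assuming every header line starts with ">" and that a sequence may be spread over several input lines. Headers are kept separate from the sequences so that a header containing spaces or more than 10 characters is never wrapped.

```python
import textwrap


def read_fasta(lines):
    """Group FASTA lines into (header, sequence) pairs.

    Hypothetical helper: assumes header lines start with ">" and that
    all following lines up to the next header belong to that sequence.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append([line, ""])
        elif records:
            records[-1][1] += line  # join a sequence split over several lines
    return [(header, seq) for header, seq in records]


def format_fasta(records, width=10):
    """Return the records as text with `width` sequence characters per line."""
    out = []
    for header, seq in records:
        out.append(header)  # headers are emitted unchanged
        out.extend(textwrap.wrap(seq, width))
    return "\n".join(out)


# The sample from the question, given here as a list of lines.
sample = [
    ">human",
    "ACACCGGTACCAGATATGATATACCGAGA",
    ">mouse",
    "ACCAGAGGGGGTTTTAAACCACAGCG",
]
print(format_fasta(read_fasta(sample)))
```

To run it on the real file, pass an open file object instead of the list, for example read_fasta(open("dna.txt")), since iterating over a file yields its lines.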
Nov 8 '10 #2
