Bytes IT Community

Print column numbers that are 100% conserved

P: 13
How can I write a Python program that prints the column numbers in a FASTA-format multiple alignment that are 100% conserved? I'm really having trouble getting a grip on this concept. For example, in the multiple alignment below:

>human
ACC
>mouse
ACC
>cat
TCC
>dog
ACA

Column 2 is 100% conserved, but columns 1 and 3 are not.
Nov 12 '10 #1
8 Replies


bvdet
Expert Mod 2.5K+
P: 2,851
What is the output supposed to look like? I don't know what a Fasta format multiple alignment is. :(
Nov 12 '10 #2

P: 13
The FASTA-format multiple alignment is just a name, I believe. I do not think it is anything significant.
Nov 12 '10 #3

bvdet
Expert Mod 2.5K+
P: 2,851
What is the output supposed to look like?
Nov 12 '10 #4

P: 13
output: column 2 = 100%
Nov 14 '10 #5

bvdet
Expert Mod 2.5K+
P: 2,851
What is column 2? 100% of what? Please be specific.
Nov 15 '10 #6

P: 13
A 100% conserved column is
one that has the exact same nucleotide in every sequence. For example, if the
user enters 1 and the multiple alignment below is given as input

>human
ACC
>mouse
ACC
>cat
TCC
>dog
ACA

then the output should be "No". But if the user enters 2 then the output
should be "Yes".
Nov 17 '10 #7

bvdet
Expert Mod 2.5K+
P: 2,851
Following is a test script that shows how it can be done using set().
import random

data = '''>human
ACC
>mouse
ACC
>cat
TCC
>dog
ACA'''

def conserved(col, seq):
    # A column is conserved when every sequence has the same character there.
    return len(set(item[col] for item in seq)) == 1

# Header lines sit at even indices, so the sequences are the odd-indexed lines.
dataList = data.split("\n")
sequences = [list(dataList[i]) for i in range(1, len(dataList), 2)]

# Pick a random zero-based column to test.
column = random.choice([0, 1, 2])

result = conserved(column, sequences)
print("Column %s %s conserved" % (column, "IS" if result else "IS NOT"))
The three columns are 0, 1 and 2, which is consistent with list indexing, so column 1 here corresponds to column 2 in the original question.
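To print every conserved column at once, using the 1-based numbering from the original question, something like the following rough sketch might work. The names read_fasta and conserved_columns are made up for illustration. It assumes all sequences in the alignment have the same length (zip() silently stops at the shortest one) and it treats gap characters such as "-" like any other character.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            records.append([line[1:], ""])
        elif records:
            # Sequence lines are appended to the most recent record.
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def conserved_columns(sequences):
    """Return the 1-based numbers of columns where all sequences agree."""
    return [
        number
        for number, column in enumerate(zip(*sequences), start=1)
        if len(set(column)) == 1
    ]


alignment = """>human
ACC
>mouse
ACC
>cat
TCC
>dog
ACA"""

sequences = [seq for _, seq in read_fasta(alignment)]
for number in conserved_columns(sequences):
    print("column %d = 100%%" % number)
```

For the sample alignment this prints only column 2. zip(*sequences) turns the rows into columns, so each column can be tested with the same set() idea as above.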
Nov 17 '10 #8

Expert 100+
P: 621
Shouldn't it be true for 1 or 2, and false for 3 or 4?
Nov 17 '10 #9
