Print column numbers that are 100% conserved

CCGG26
13
How can I write a Python program that prints the column numbers in a FASTA-format multiple alignment that are 100% conserved? I'm really having trouble getting a grip on this concept. For example, in the multiple alignment below:

>human
ACC
>mouse
ACC
>cat
TCC
>dog
ACA

Column 2 is 100% conserved, but columns 1 and 3 are not.
Nov 12 '10 #1
bvdet
2,851 Expert Mod 2GB
What is the output supposed to look like? I don't know what a Fasta format multiple alignment is. :(
Nov 12 '10 #2
CCGG26
13
"FASTA-format multiple alignment" is just a name, I believe. I do not think it is anything significant.
Nov 12 '10 #3
bvdet
2,851 Expert Mod 2GB
What is the output supposed to look like?
Nov 12 '10 #4
CCGG26
13
output: column 2 = 100%
Nov 14 '10 #5
bvdet
2,851 Expert Mod 2GB
What is column 2? 100% of what? Please be specific.
Nov 15 '10 #6
CCGG26
13
A 100% conserved column is one that has the exact same nucleotide in every sequence. For example, if the user enters 1 and the multiple alignment below is given as input

>human
ACC
>mouse
ACC
>cat
TCC
>dog
ACA

then the output should be "No". But if the user enters 2 then the output
should be "Yes".
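
The behaviour described above could be sketched roughly as follows. This is only a minimal illustration, assuming the alignment arrives as one text string, that all sequences have the same length, and that columns are numbered from 1 as in the example. The names parse_fasta and column_answer are made up for this sketch and do not come from any library.

```python
def parse_fasta(text):
    """Return the sequences from FASTA text, joining wrapped sequence lines."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            sequences.append("")      # a header line starts a new record
        elif sequences:
            sequences[-1] += line     # sequence data may span several lines
    return sequences


def column_answer(sequences, column):
    """Return "Yes" if the 1-based column holds one nucleotide in every sequence."""
    # Assumption: column is within range and all sequences are equally long.
    residues = {seq[column - 1] for seq in sequences}
    return "Yes" if len(residues) == 1 else "No"


alignment = """>human
ACC
>mouse
ACC
>cat
TCC
>dog
ACA"""

sequences = parse_fasta(alignment)
print(column_answer(sequences, 1))  # No
print(column_answer(sequences, 2))  # Yes
```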
Nov 17 '10 #7
bvdet
2,851 Expert Mod 2GB
The following test script shows how it can be done using set().

import random

# Sample alignment in FASTA format: header lines start with ">".
data = '''>human
ACC
>mouse
ACC
>cat
TCC
>dog
ACA'''

def conserved(col, seq):
    # A column is conserved when the set of its characters has one member.
    return len(set(item[col] for item in seq)) == 1

dataList = data.split("\n")
# Each sequence sits on the line right after its header (indices 1, 3, 5, ...).
sequences = [list(dataList[i]) for i in range(1, len(dataList), 2)]

# Pick a zero-based column at random to test.
column = random.choice([0, 1, 2])

result = conserved(column, sequences)
print("Column %s %s conserved" % (column, "IS" if result else "IS NOT"))

The three columns are numbered 0, 1 and 2, which is consistent with a list index, so the column you call 2 is index 1 here.
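
To print every conserved column at once, as the original question asks, one possible approach is to walk the alignment column by column with zip(). This is a sketch only, assuming the sequences have already been read into a list of equal-length strings. The function name conserved_columns is hypothetical, and the output format copies the "column 2 = 100%" line requested earlier in the thread.

```python
def conserved_columns(sequences):
    """Return the 1-based numbers of columns where every sequence agrees."""
    # zip(*sequences) yields one tuple per alignment column.
    # Assumption: equal-length sequences (zip stops at the shortest one).
    return [number
            for number, column in enumerate(zip(*sequences), start=1)
            if len(set(column)) == 1]


sequences = ["ACC", "ACC", "TCC", "ACA"]
for number in conserved_columns(sequences):
    print("column %d = 100%%" % number)
```

Numbering from 1 here is a choice made to match the question; dropping start=1 gives list-style indices instead.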
Nov 17 '10 #8
dwblas
626 Expert 512MB
Shouldn't it be true for 1 or 2, and false for 3 or 4?
Nov 17 '10 #9
