Bytes IT Community

How to get the number of A's in each column?

P: 13
Hey guys, how can you write a program that prints the number of A's in each column of a multiple sequence alignment? For example, for the multiple alignment below

>human
ACCT
>mouse
ACCT
>cat
TCCT
>dog
ACAT

the output should be

3 0 1 0
Nov 18 '10 #1
7 Replies


bvdet
Expert Mod 2.5K+
P: 2,851
We are not here to write programs for you. At least show us some effort on your part to write the code.
Nov 18 '10 #2

P: 13
    with open("e:/dna14.txt", "r") as myfile:
        data = myfile.readlines()  # the with block closes the file automatically

    # Strip the trailing newline from every line
    for i in range(len(data)):
        data[i] = data[i].rstrip("\n")

    column_number = int(input("Please enter a column number: "))

    ch1 = data[1][column_number]  # character of the first sequence in that column
    Acount = 0
Nov 18 '10 #3

bvdet
Expert Mod 2.5K+
P: 2,851
You can create a new list from data that contains every other list item (each item is a str) in data. It should look like this: ['ACCT', 'ACCT', 'TCCT', 'ACAT']
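One way to take every other item is a slice with a step of 2. This is only a sketch: it assumes the header and sequence lines strictly alternate, as in the example alignment, and the `data` list here is typed in by hand to stand in for the stripped lines read from the file.

```python
# Stand-in for the stripped lines read from the file (assumed layout:
# a ">name" header line followed by exactly one sequence line)
data = [">human", "ACCT", ">mouse", "ACCT", ">cat", "TCCT", ">dog", "ACAT"]

# Start at index 1 and take every second item: these are the sequence lines
sequences = data[1::2]
print(sequences)  # ['ACCT', 'ACCT', 'TCCT', 'ACAT']
```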

Then you can use zip to create a list of tuples. Each tuple would contain the respective column.
    >>> sequences = ['ACCT', 'ACCT', 'TCCT', 'ACAT']
    >>> list(zip(*sequences))  # list() is needed in Python 3, where zip returns an iterator
    [('A', 'A', 'T', 'A'), ('C', 'C', 'C', 'C'), ('C', 'C', 'C', 'A'), ('T', 'T', 'T', 'T')]
Then:
    >>> letter = 'A'
    >>> " ".join([str(item.count(letter)) for item in zip(*sequences)])
    '3 0 1 0'
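Put together as a small function, the idea might look like the sketch below. The function name `count_per_column` and its `letter` parameter are made up for illustration. The star in `zip(*sequences)` unpacks the list, so the call is the same as `zip('ACCT', 'ACCT', 'TCCT', 'ACAT')`, and zip then groups the first characters together, the second characters together, and so on.

```python
def count_per_column(sequences, letter="A"):
    """Return how many times `letter` occurs in each alignment column."""
    # zip(*sequences) yields one tuple per column, holding that column's
    # character from every sequence; tuple.count() does the counting.
    return [column.count(letter) for column in zip(*sequences)]

sequences = ["ACCT", "ACCT", "TCCT", "ACAT"]
counts = count_per_column(sequences)
print(" ".join(str(n) for n in counts))  # 3 0 1 0
```

Note that zip stops at the shortest sequence, so this assumes all rows of the alignment have the same length.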
Nov 19 '10 #4

P: 13
OK, I'm having trouble understanding zip(*sequences). I've never used that before at my level of programming and was wondering if there is an alternative method.
Nov 19 '10 #5

bvdet
Expert Mod 2.5K+
P: 2,851
It can also be accomplished with a list comprehension.
    >>> sequences
    ['ACCT', 'ACCT', 'TCCT', 'ACAT', 'CAAT']
    >>> [[item[i] for item in sequences] for i in range(len(sequences[0]))]
    [['A', 'A', 'T', 'A', 'C'], ['C', 'C', 'C', 'C', 'A'], ['C', 'C', 'C', 'A', 'A'], ['T', 'T', 'T', 'T', 'T']]
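If comprehensions are also unfamiliar, the same counting can be spelled out with two plain for loops. This is a sketch under the assumption that every sequence is as long as the first one; the name `count_per_column_loops` is hypothetical.

```python
def count_per_column_loops(sequences, letter="A"):
    """Count `letter` in each column using only basic loops."""
    counts = []
    # Outer loop: one pass per column position
    for i in range(len(sequences[0])):
        total = 0
        # Inner loop: look at position i in every sequence
        for seq in sequences:
            if seq[i] == letter:
                total += 1
        counts.append(total)
    return counts

sequences = ["ACCT", "ACCT", "TCCT", "ACAT", "CAAT"]
print(count_per_column_loops(sequences))  # [3, 1, 2, 0]
```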
Nov 19 '10 #6

P: 13
Great. Can this approach work for any set of sequences, or just this specific problem? If the sequences were of any length, how would you combine them without having to type out each sequence?
Nov 19 '10 #7

bvdet
Expert Mod 2.5K+
P: 2,851
You would not have to type in anything if you had a disk file to read from. The key is to parse the file as it is read. In this case I might do this:
    with open(file_name) as f:
        # Keep only the odd-indexed lines (1, 3, 5, ...), which hold the sequences
        sequences = [line.strip() for i, line in enumerate(f) if i % 2 == 1]
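Picking lines by position only works when each sequence sits on exactly one line. A possible variation, not the approach shown above, keys on the ">" header lines instead, so a sequence wrapped over several lines is joined back together. The helper name `read_sequences` is made up, and `io.StringIO` stands in for a real file here so the sketch runs on its own.

```python
import io

def read_sequences(lines):
    """Collect sequences from FASTA-style lines, keyed on '>' header lines."""
    sequences = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                 # skip blank lines
        if line.startswith(">"):
            sequences.append("")     # a header starts a new, empty sequence
        elif sequences:
            sequences[-1] += line    # add this line to the current sequence
    return sequences

# An in-memory stand-in for an open file; "human" is split over two lines
sample = io.StringIO(">human\nAC\nCT\n>mouse\nACCT\n")
print(read_sequences(sample))  # ['ACCT', 'ACCT']
```

With a real file, the same function could be called as `read_sequences(f)` inside a `with open(file_name) as f:` block.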
Nov 19 '10 #8
