
# How to get the number of A's in each column?

 P: 13 Hey guys, how can you write a program that prints the number of A's in each column of a multiple sequence alignment? For example, for the multiple alignment below

```
>human
ACCT
>mouse
ACCT
>cat
TCCT
>dog
ACAT
```

the output should be 3 0 1 0

Nov 18 '10 #1
7 Replies

 Expert Mod 2.5K+ P: 2,851 We are not here to write programs for you. At least show us some effort on your part to write the code. Nov 18 '10 #2

 P: 13

```python
with open("e:/dna14.txt", "r") as myfile:
    data = myfile.readlines()  # the with block closes the file on exit

# Strip the trailing newline from every line
for i in range(len(data)):
    data[i] = data[i].rstrip("\n")

column_number = int(input("Please enter a column number: "))

ch1 = data[1][column_number]
Acount = 0
```

Nov 18 '10 #3

 Expert Mod 2.5K+ P: 2,851 You can create a new list from data (call it `sequences`) that contains every other list item (each item is a str) in data. It should look like this: `['ACCT', 'ACCT', 'TCCT', 'ACAT']` Then you can use zip to create a list of tuples. Each tuple would contain the respective column.

```python
>>> list(zip(*sequences))
[('A', 'A', 'T', 'A'), ('C', 'C', 'C', 'C'), ('C', 'C', 'C', 'A'), ('T', 'T', 'T', 'T')]
```

Then, with `letter` set to the character you want to count:

```python
>>> letter = 'A'
>>> " ".join([str(list(item).count(letter)) for item in zip(*sequences)])
'3 0 1 0'
```

Nov 19 '10 #4
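For readers who have not met the `*` in `zip(*sequences)` before, here is a minimal sketch of what it does, assuming the four-sequence list from the post above. The names `columns`, `explicit`, and `counts` are illustrative only and do not come from the thread.

```python
sequences = ['ACCT', 'ACCT', 'TCCT', 'ACAT']

# The * unpacks the list into separate arguments, so zip(*sequences)
# is the same call as zip('ACCT', 'ACCT', 'TCCT', 'ACAT').
columns = list(zip(*sequences))
explicit = list(zip('ACCT', 'ACCT', 'TCCT', 'ACAT'))

# Each tuple in columns is one column of the alignment.
# tuple.count returns how many times 'A' occurs in that column.
counts = [column.count('A') for column in columns]

print(columns == explicit)                 # True
print(" ".join(str(n) for n in counts))    # 3 0 1 0
```

Note that `zip` stops at the shortest sequence, so this sketch assumes every row of the alignment has the same length.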

 P: 13 OK, I'm having trouble understanding `zip(*sequences)`. I've never used that before at my level of programming, and I was wondering if there is an alternative method.

Nov 19 '10 #5

 Expert Mod 2.5K+ P: 2,851 It can also be accomplished with a list comprehension.

```python
>>> sequences
['ACCT', 'ACCT', 'TCCT', 'ACAT', 'CAAT']
>>> [[item[i] for item in sequences] for i in range(len(sequences[0]))]
[['A', 'A', 'T', 'A', 'C'], ['C', 'C', 'C', 'C', 'A'], ['C', 'C', 'C', 'A', 'A'], ['T', 'T', 'T', 'T', 'T']]
```

Nov 19 '10 #6
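If the list comprehension is also hard to read, the same idea can be written with two ordinary `for` loops. This is only a sketch: the function name `count_letter_per_column` is made up for illustration, and it assumes all sequences have the same length as the first one.

```python
def count_letter_per_column(sequences, letter="A"):
    """Return a list with the number of times letter occurs in each column."""
    counts = []
    for i in range(len(sequences[0])):   # one pass per column index
        total = 0
        for seq in sequences:            # look at position i of every sequence
            if seq[i] == letter:
                total += 1
        counts.append(total)
    return counts


sequences = ['ACCT', 'ACCT', 'TCCT', 'ACAT', 'CAAT']
print(count_letter_per_column(sequences))   # [3, 1, 2, 0]
```

The outer loop walks the columns and the inner loop walks the rows, which is exactly what the nested list comprehension does in one line.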

 P: 13 Great. Can this approach work for any set of sequences, or just this specific problem? If the sequences were of any length, how would you combine them without having to type out each sequence?

Nov 19 '10 #7

 Expert Mod 2.5K+ P: 2,851 You would not have to type in anything if you had a disk file to read from. The key is to parse the file as it is read. In this case I might do this:

```python
# file_name is the path to the alignment file
with open(file_name) as f:
    # Headers are on the even-indexed lines (0, 2, 4, ...),
    # sequences on the odd-indexed lines (1, 3, 5, ...)
    sequences = [line.strip() for i, line in enumerate(f) if i % 2 == 1]
```

Nov 19 '10 #8
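Putting the pieces of the thread together, a rough end-to-end sketch might look like the following. It swaps the line-index test for a check on the leading `>` of header lines, which is a different technique from the one in the post above. The helper names `read_sequences` and `count_per_column` are hypothetical, and `io.StringIO` stands in for an open file so the example runs without a file on disk. It assumes each sequence sits on a single line.

```python
import io


def read_sequences(lines):
    """Collect sequence lines, skipping blank lines and '>' header lines."""
    sequences = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            sequences.append(line)
    return sequences


def count_per_column(sequences, letter="A"):
    """Count letter in every column of equally long sequences."""
    return [column.count(letter) for column in zip(*sequences)]


# Stand-in for an open file; with a real file, pass the file object instead.
sample = io.StringIO(">human\nACCT\n>mouse\nACCT\n>cat\nTCCT\n>dog\nACAT\n")
sequences = read_sequences(sample)
print(" ".join(str(n) for n in count_per_column(sequences)))   # 3 0 1 0
```

Because nothing in the code depends on how many sequences there are or how long they are, the same functions should work for larger alignments, as long as every sequence has the same length.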