The problem I'm solving is to take a sequence like 'ATSGS' and make all
the DNA sequences it represents. The A, T, and G are fine but the S
represents C or G. I want to take this input:
[ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]
and make the list:
[ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]
The code below is what I have so far: 'alphabet' is a dictionary that
designates the set oif base pairs that each letter represents (for
example for S above it gives C and G). I call these ambiguous base
pairs because they could be more then one. Thus the function name
'unambiguate'. It makes a list of sequences with only A T C and Gs and
none of the ambiguous base pair designations.
The function 'unambiguate_bp' takes a sequence and a base pair in it
and returns a set of sequences with that base pair replaced by each of
it's unambiguous possibilities.
The function unambiguate_seq takes a sequence and runs unambiguate_bp
on each base pair in the sequence. Each time it does a base pair it
replaces the set of things it's working on with the output from the
unambiguate_bp. It's a bit confusing. I'd like it to be clearer.
Is there a better way to do this?
--
David Siedband
generation-xml.com
def unambiguate_bp(seq, bp):
seq_set = []
for i in alphabet[seq[bp]]:
seq_set.append(seq[:bp]+i+seq[bp+1:])
return seq_set
def unambiguate_seq(seq):
result = [seq]
for i in range(len(seq)):
result_tmp=[]
for j in result:
result_tmp = result_tmp + unambiguate_bp(j,i)
result = result_tmp
return result
alphabet = {
'A' : ['A'],
'T' : ['T'],
'C' : ['C'],
'G' : ['G'],
'W' : ['A','T'],
'M' : ['A','C'],
'R' : ['A','G'],
'Y' : ['T','C'],
'K' : ['T','G'],
'S' : ['C','G'],
'H' : ['A','T','C'],
'D' : ['A','T','G'],
'V': ['A','G','C'],
'B' : ['C','T','G'],
'N' : ['A','T','C','G']
} 4 1809
David Siedband wrote: [...] Is there a better way to do this? [...]
Take a look at Biopython: http://biopython.org/
Your problem may be solved there already.
David Siedband wrote: The problem I'm solving is to take a sequence like 'ATSGS' and make all the DNA sequences it represents. The A, T, and G are fine but the S represents C or G. I want to take this input:
[ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]
and make the list:
[ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]
[...]
The code you provide only addresses the first part of your problem, and so
does mine: def disambiguate(seq, alphabet):
.... return[list(alphabet.get(c, c)) for c in seq]
.... alphabet = {
.... "W": "AT",
.... "S": "CG"
.... } disambiguate("ATSGS", alphabet)
[['A'], ['T'], ['C', 'G'], ['G'], ['C', 'G']]
Note that "identity entries" (e. g. mapping "A" to "A") in the alphabet
dictionary are no longer necessary. The list() call in disambiguate() is
most likely superfluous, but I put it in to meet your spec accurately.
Now on to the next step :-)
Peter
alphabet = {
'A': 'A',
'T': 'T',
'C': 'C',
'G': 'G',
'W': 'AT',
'M': 'AC',
'R': 'AG',
'Y': 'TC',
'K': 'TG',
'S': 'CG',
'H': 'ATC',
'D': 'ATG',
'V': 'AGC',
'B': 'CTG',
'N': 'ATCG'
}
expand = lambda t: reduce(lambda r, s: [x+y for x in r for y in
alphabet[s]], t, [''])
print expand('ATSGS')
--------------
['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
Hung Jung Lu wrote: ... expand = lambda t: reduce(lambda r, s: [x+y for x in r for y in alphabet[s]], t, ['']) print expand('ATSGS')
Or, for a more verbose version:
multis = dict(W='AT', M='AC', R='AG', Y='TC', K='TG', S='CG',
H='ATC', D='ATG', V='AGC', B='CTG', N='ATCG')
def expanded(string, expansions=multis):
result = ''
for pos, char in enumerate(string):
if char in multis:
break
else:
yield string
raise StopIteration
parts = multis[char]
prelude, string = string[:pos], string[pos+1:]
for expansion in expanded(string, multis):
for middle in parts:
yield prelude + middle + expansion
--Scott David Daniels Sc***********@Acm.Org This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Alex Bell |
last post by:
There have been several postings recently dealing with bugs in
Internet Explorer. Can anyone point me to a review of these bugs and
how to work around them?
Regards, Alex
|
by: Oliver Douglas |
last post by:
adapted from:
http://www.gotmono.com/docs/gnome/bindings/gtk-sharp/buttonwidget.html
// togglebut.cs
using Gtk;
using GtkSharp;
using System;
|
by: John Baker |
last post by:
Hi:
I have two systems, one a W98 and the other XP Home. I have Access 2000 installed on both,
and have run into a difference in the way the two behave.
I have a table on which I wish to reset...
|
by: Otto Blomqvist |
last post by:
Hello!
I have a small database (10MB gz dump). When I do a pg_dump -l -s (to
list the schema) of the original database it takes below 1 second. But
when I do dump of a copy of the database...
|
by: emmba |
last post by:
The premise for my assignment is to write variations of the read_line
functions. The variations are:
a) skips leading whitespace before storing input
b) stops reading at first whitespace...
|
by: Jordan S. |
last post by:
Using .NET 2.0 (C#) I'm writing a small app that will have a "Person" class
that exposes FirstName and LastName properties. At runtime, a "People"
collection will be populated with a few thousand...
|
by: Showjumper |
last post by:
Is one better than the other with handling xml files?
Ashok
|
by: Vcskiran |
last post by:
Hi Friends,
Can anyone help me y the textbox height is reduced or changed in the aspx page, if i response.write in the .vb page and how can i avoid this
anyone plzz plzz help.
thanks
KČ
|
by: Charles Arthur |
last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: nemocccc |
last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers,...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new...
| |