By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
464,688 Members | 1,212 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 464,688 IT Pros & Developers. It's quick & easy.

How to find a overlap between 2 sequences, and make the function return the overlap

P: 4
I need to make a function that determines the overlap between 2 sequences, and then return the overlap.
The overlap is a coherent sequence that is in the left end of the first sequence, and in the right end of the second sequence.

my sequences are: s1= "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATC TC"
s2= "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCC CTAGC"

my function should be named def getOverlap(left, right)

and should return ‘GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC’
if s1 is the left sequence and s2 the right one.
Jan 2 '13 #1
Share this Question
Share on Google+
6 Replies

bvdet
Expert Mod 2.5K+
P: 2,851
Have you developed an algorithm? Have you tried anything yet?
Jan 2 '13 #2

P: 4
I have tried this:
Expand|Select|Wrap|Line Numbers
  1. left = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
  2. right = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
  3.  
  4.  
  5. def getOverlap(left,right):
  6.     if left == right[::-1]:
  7.         return ""
  8.     else:
  9.         for i in left:
  10.             for i in right:
  11.                 if left[len(left)-i]==(right[::-1])[len(right)-i]:
  12.                     if False:
  13.                         continue
  14.                     return right[:len(left)-i]



But something is very wrong :)
Jan 2 '13 #3

P: 4
When i turn the s1, and s2 around, the answer is supposed to be 'c', but i get the same answer.
Jan 2 '13 #4

bvdet
Expert Mod 2.5K+
P: 2,851
It appears to me you have the sequences switched. The overlap you describe is at the left end of s2 and right end of s1. You can use string method find to find the overlap.

A possible algorithm:
  1. Initialize an empty string and assign to an identifier, let's say "X"
  2. Iterate on the left string, character by character
  3. Assign "X"+next character to an identifier, let's sat "temp"
  4. If the current value of "temp" is found in the the second string, assign the value of "X" to "temp".
  5. If the current value of "temp" is not found in the the second string, return "X"
This assumes the "left" string overlap will always start at the beginning of the string.
Jan 2 '13 #5

bvdet
Expert Mod 2.5K+
P: 2,851
To add to the qualification above - The algorithm I described does not require the overlap to be at the right end of the "right" string.
Expand|Select|Wrap|Line Numbers
  1. def getOverlap(left, right):
  2.     overlap = ""
  3.     for c in left:
  4.         temp  = overlap+c
  5.         if right.find(temp) >= 0:
  6.             overlap = temp
  7.         else:
  8.             return overlap
  9.     return    
  10.  
  11. s1 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
  12. s2 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
  13.  
  14. print getOverlap(s1,s2)
  15. print getOverlap(s2,s1)
Produces:
Expand|Select|Wrap|Line Numbers
  1. >>> GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
  2. CG
  3. >>> 
Jan 2 '13 #6

bvdet
Expert Mod 2.5K+
P: 2,851
You should use the slice operator to determine the overlap, something like this:
Expand|Select|Wrap|Line Numbers
  1. s1 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
  2. s2 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
  3.  
  4. def getOverlap(left, right):
  5.     if left == right[-len(left):]:
  6.         return left
  7.     i = -1
  8.     while True:
  9.         temp = left[0:i]
  10.         if temp == right[-len(temp):]:
  11.             return temp
  12.         elif temp == "":
  13.             return ""
  14.         i -= 1
  15.  
  16. print getOverlap(s1,s2)
  17. print getOverlap(s2,s1)
  18.  
Output:
Expand|Select|Wrap|Line Numbers
  1. >>> GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
  2. C
  3. >>> 
Jan 2 '13 #7

Post your reply

Sign in to post your reply or Sign up for a free account.