470,872 Members | 1,807 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 470,872 developers. It's quick & easy.

How to find a overlap between 2 sequences, and make the function return the overlap

I need to make a function that determines the overlap between 2 sequences, and then return the overlap.
The overlap is a coherent sequence that is in the left end of the first sequence, and in the right end of the second sequence.

my sequences are: s1= "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATC TC"
s2= "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCC CTAGC"

my function should be named def getOverlap(left, right)

and should return ‘GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC’
if s1 is the left sequence and s2 the right one.
Jan 2 '13 #1
6 6240
bvdet
2,851 Expert Mod 2GB
Have you developed an algorithm? Have you tried anything yet?
Jan 2 '13 #2
I have tried this:
Expand|Select|Wrap|Line Numbers
  1. left = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
  2. right = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
  3.  
  4.  
  5. def getOverlap(left,right):
  6.     if left == right[::-1]:
  7.         return ""
  8.     else:
  9.         for i in left:
  10.             for i in right:
  11.                 if left[len(left)-i]==(right[::-1])[len(right)-i]:
  12.                     if False:
  13.                         continue
  14.                     return right[:len(left)-i]



But something is very wrong :)
Jan 2 '13 #3
When i turn the s1, and s2 around, the answer is supposed to be 'c', but i get the same answer.
Jan 2 '13 #4
bvdet
2,851 Expert Mod 2GB
It appears to me you have the sequences switched. The overlap you describe is at the left end of s2 and right end of s1. You can use string method find to find the overlap.

A possible algorithm:
  1. Initialize an empty string and assign to an identifier, let's say "X"
  2. Iterate on the left string, character by character
  3. Assign "X"+next character to an identifier, let's sat "temp"
  4. If the current value of "temp" is found in the the second string, assign the value of "X" to "temp".
  5. If the current value of "temp" is not found in the the second string, return "X"
This assumes the "left" string overlap will always start at the beginning of the string.
Jan 2 '13 #5
bvdet
2,851 Expert Mod 2GB
To add to the qualification above - The algorithm I described does not require the overlap to be at the right end of the "right" string.
Expand|Select|Wrap|Line Numbers
  1. def getOverlap(left, right):
  2.     overlap = ""
  3.     for c in left:
  4.         temp  = overlap+c
  5.         if right.find(temp) >= 0:
  6.             overlap = temp
  7.         else:
  8.             return overlap
  9.     return    
  10.  
  11. s1 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
  12. s2 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
  13.  
  14. print getOverlap(s1,s2)
  15. print getOverlap(s2,s1)
Produces:
Expand|Select|Wrap|Line Numbers
  1. >>> GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
  2. CG
  3. >>> 
Jan 2 '13 #6
bvdet
2,851 Expert Mod 2GB
You should use the slice operator to determine the overlap, something like this:
Expand|Select|Wrap|Line Numbers
  1. s1 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
  2. s2 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
  3.  
  4. def getOverlap(left, right):
  5.     if left == right[-len(left):]:
  6.         return left
  7.     i = -1
  8.     while True:
  9.         temp = left[0:i]
  10.         if temp == right[-len(temp):]:
  11.             return temp
  12.         elif temp == "":
  13.             return ""
  14.         i -= 1
  15.  
  16. print getOverlap(s1,s2)
  17. print getOverlap(s2,s1)
  18.  
Output:
Expand|Select|Wrap|Line Numbers
  1. >>> GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
  2. C
  3. >>> 
Jan 2 '13 #7

Post your reply

Sign in to post your reply or Sign up for a free account.

Similar topics

5 posts views Thread by david | last post: by
2 posts views Thread by erictham | last post: by
4 posts views Thread by wongjoekmeu | last post: by
8 posts views Thread by Ravindranath Gummadidala | last post: by
6 posts views Thread by John | last post: by
4 posts views Thread by barcaroller | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.