Hi,
I have a list of numbers each with a +/ margin of error. I need to
identify which ones overlab each other.
For example:
55 +/ 3
20 +/ 2
17 +/ 4
60 +/ 3
#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]
In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]
What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.
Thanks!
Erik 10 1970
erikcw <er***********@gmail.comwrites:
What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.
This sounds like a homework problem, so the first thing I'll suggest
is that you figure out exactly what it means for two of those
intervals to overlap. That should let you write a simple program that
gets the right answer, but that can run slowly if the number of lists
gets large. The next thing to do after that is figure out how to
speed it up, if necessary. But do the first part first.
On Jan 31, 4:12*pm, erikcw <erikwickst...@gmail.comwrote:
Hi,
I have a list of numbers each with a +/ margin of error. *I need to
identify which ones overlab each other.
For example:
55 +/ 3
20 +/ 2
17 +/ 4
60 +/ 3
#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]
In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]
What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.
This is definitely the best way:
=======================
lst = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]
from itertools import chain
def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
inside = {}
for x, i in sorted(bounds):
if inside.pop(i, None) is None:
for j, y in inside.iteritems():
if y != x: yield i, j
inside[i] = x
==============================
>>list(overlaps(lst))
[(1, 2), (3, 0)]

Arnaud
On Jan 31, 8:12 am, erikcw <erikwickst...@gmail.comwrote:
Hi,
I have a list of numbers each with a +/ margin of error. I need to
identify which ones overlab each other.
For example:
55 +/ 3
20 +/ 2
17 +/ 4
60 +/ 3
#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]
In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]
What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.
One way would be to use sets and check for intersection:
for idx, s in enumerate(mysets):
for next_idx, next_s in enumerate(mysets[idx+1:]):
if s.intersection(next_s):
print "mylist[%d] and mylist[%d] intersect" % (
idx, idx + next_idx + 1 )

Hope this helps,
Steve
Arnaud Delobelle wrote:
>
This is definitely the best way:
from itertools import chain
def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
inside = {}
for x, i in sorted(bounds):
if inside.pop(i, None) is None:
for j, y in inside.iteritems():
if y != x: yield i, j
inside[i] = x
Why not just
def overlaps(lst):
for i,x in enumerate(lst):
for j,y in enumerate(lst):
if i != j and x[1] y[2] x[2]:
yield i,j
It's still n^2. Or am I missing something?
Cheers,
thomas
On Jan 31, 8:12 am, erikcw <erikwickst...@gmail.comwrote:
Hi,
I have a list of numbers each with a +/ margin of error. I need to
identify which ones overlab each other.
For example:
55 +/ 3
20 +/ 2
17 +/ 4
60 +/ 3
#base, max, min
list = [
(55, 58, 52),
(20, 22, 18),
(17, 21, 13),
(60, 63, 57),
]
In this example the range of list[0] overlaps the range of list[3] AND
list[1] overlaps list[2]
What is the best way to in python to identify the list items that
overlap and the items that don't overlap with any other.
Note you just need the leftend and rightend of each interval. Mean
is redundant information. Once you sort the interval, you can just go
from left to right, retaining only necessary information.
Below method is O(n log n) + O (nk) where k is the average overlaps
per interval.
On most average case, first term dominates and hence its O(n log n);
worst case is ofcourse O(n^2) (a simple case is all n intervals
overlapping with each other)
def strip_sort (a, b):
if a[0] < b[0]:
return 1
if a[0] b[0]:
return 1
# To allow overlaps on a point, return L first then R
# If overlap on a point must not be allowed, return 1 below
if a[0] == 'L': return 1
return 0
def overlaps (strips_given):
# split each interval into two items. basically decorate with 'L'
for leftend of the interval, 'R' for right end of the interval
s2 = [((s[0], 'L', i) , (s[1], 'R', i)) for i,s in
enumerate(strips_given)]
strips = []
for s in s2: # flatten out the tuples
strips.append(s[0])
strips.append(s[1])
clique = set() # set of nodes where each overlap with everyone
else in the set
edges = [] # we are constructing a graph on N nodes where edge
i,j implies i and j overlap
# below is O(N log N) where is N is number of intervals
strips.sort(cmp=strip_sort)
for s in strips:
node = s[2]
if s[1] == 'L':
clique.add(node)
if s[1] == 'R':
# below is O(k) where k is clique size (overlaps per
interval)
new_edges = [(node, i) for i in clique if i != node]
edges += new_edges
clique.remove(node)
return edges
def main():
lst = [(52, 58), (18, 22), (13, 21), (57, 63)]
print overlaps(lst)
Output:
[(2, 1), (0, 3)]
Karthik
>
Thanks!
Erik
On Feb 1, 8:17*pm, Karthik Gurusamy <kar1...@gmail.comwrote:
[...]
def strip_sort (a, b):
* * if a[0] < b[0]:
* * * * return 1
* * if a[0] b[0]:
* * * * return 1
* * if a[0] == 'L': return 1
* * return 0
def overlaps (strips_given):
* * s2 = [((s[0], 'L', i) , (s[1], 'R', i)) for i,s in
enumerate(strips_given)]
* * strips = []
* * for s in s2:
* * * * strips.append(s[0])
* * * * strips.append(s[1])
* * clique = set()
* * edges = []
* * strips.sort(cmp=strip_sort)
* * for s in strips:
* * * * node = s[2]
* * * * if s[1] == 'L':
* * * * * * clique.add(node)
* * * * if s[1] == 'R':
* * * * * * new_edges = [(node, i) for i in clique if i !=node]
* * * * * * edges += new_edges
* * * * * * clique.remove(node)
* * return edges
Interesting. This is a long version of the algorithm I posted
earlier.

Arnaud
On Feb 1, 8:34*pm, "Neil Cerutti" <mr.ceru...@gmail.comwrote:
On Feb 1, 2008 3:16 PM, Arnaud Delobelle <arno...@googlemail.comwrote:
The total number of iterations is 1+2+...+n = n(n+1)/2 (when n is the
number of intervals) so this has quadratic behaviour regardless of
input.[...]
But it breaks out of the inner loop as soon as ranges on the right
cannot overlap. The loop is O(N) for all input if I define N as
"number of overlaps", a pretty reasonable definitionit's madness to
expect to report N overlaps with less than N complexity.
Unless I'm making a silly mistake. Again.
No you're not mistaken, I am. I didn't see the 'else: break' bit
which of course makes all the difference. Sorry...
On the point of complexity though, it is only O(N) is N dominates
nlogn (n being the length of input).

Arnaud
Arnaud Delobelle wrote:
(...)
>
from itertools import chain
def overlaps(lst):
bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
imho, this is a uselessly painful equivalent of
bounds = ((x[k],i) for i,x in enumerate(lst) for k in (1,2))
Cheers, BB
Boris Borcic wrote:
Arnaud Delobelle wrote:
(...)
>> from itertools import chain
def overlaps(lst): bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
imho, this is a uselessly painful equivalent of
bounds = ((x[k],i) for i,x in enumerate(lst) for k in (1,2))
even more readable :
bounds = ((bound(x),i) for bound in (min,max) for i,x in enumerate(lst))
On Feb 8, 1:48*pm, Boris Borcic <bbor...@gmail.comwrote:
Arnaud Delobelle wrote:
(...)
from itertools import chain
def overlaps(lst):
* * bounds = chain(*(((x[1],i), (x[2], i)) for i,x in enumerate(lst)))
imho, this is a uselessly painful equivalent of
* * * *bounds = ((x[k],i) for i,x in enumerate(lst) for k in (1,2))
Cheers, BB
I did mention that it was awkward (but at the time I just wrote what
came to mind)  thank you for providing a much improved version.
It doesn't deter from the fact that the algorithm is of optimal
complexity (modulo sorting of the list).

Arnaud This discussion thread is closed Replies have been disabled for this discussion. Similar topics
7 posts
views
Thread by Leif KBrooks 
last post: by

5 posts
views
Thread by BlackFireNova 
last post: by

3 posts
views
Thread by Stewart Allen 
last post: by

28 posts
views
Thread by Elfour 
last post: by
 
13 posts
views
Thread by Peter Oliphant 
last post: by

26 posts
views
Thread by bilgekhan 
last post: by

3 posts
views
Thread by djcamo 
last post: by

33 posts
views
Thread by Andreas Prilop 
last post: by
          