445,885 Members | 1,474 Online Need help? Post your question and get tips & solutions from a community of 445,885 IT Pros & Developers. It's quick & easy.

# Merging overlapping spans/ranges

11 Replies

 P: n/a This is the problem of finding the connected components inside an interval graph. You can implement the algorithms yourself, of you can use my graph data structure here: http://sourceforge.net/projects/pynetwork/ The graph methods: createIntervalgGraph And: connectedComponents can probably solve your problem quite fast, algorithmically, and with few lines of code. If you need more help, then ask for it and I'll give the little code needed. Bear hugs, Bearophile Jul 19 '05 #2

 P: n/a The linear method: You create an array - one bool per minute. For one day 24 * 60 entries is enough. Spans (Start, End) are in minutes from midnight. Set array slots in range(Start, End) to True for each input span. Scan the array and find metaspans - contiguous sequences of False. Jul 19 '05 #4

 P: n/a Max M wrote: I am writing a "find-free-time" function for a calendar. There are a lot of time spans with start end times, some overlapping, some not. To find the free time spans, I first need to convert the events into a list of non overlapping time spans "meta-spans". "Almost" linear method (includes .sort() which is afaik N*logN): [Set all priority to 1, or you will get changes in priorities as well.] def merge(p): """ p is a seq of (start, priority, end) returns list of merged events: (start1, pri), (end1, 0), (start2, pri), (end2, 0), ... """ all=[] # list of x, h, up, id for id,(l,h,r) in enumerate(p): all.append((l,h,1,id)) all.append((r,h,0,id)) all.sort() sl = [] # skyline solution slx = slh = 0 # current values vis = {} # active {id:h} for x,h,up,id in all: if up: vis[id]=h if h>slh: sl.append((x,h)) slh=h else: del vis[id] assert h<=slh if h==slh: v = vis.values() if v: h = max(v) else: h = 0 sl.append((x,h)) slh=h slx=x # merge same time events s=dict(sl) sl=s.keys() sl.sort() return [(k,s[k]) for k in sl] Jul 19 '05 #5

 P: n/a el*******@hotmail.com wrote: The linear method: You create an array - one bool per minute. For one day 24 * 60 entries is enough. Spans (Start, End) are in minutes from midnight. Set array slots in range(Start, End) to True for each input span. Scan the array and find metaspans - contiguous sequences of False. Yeah, this would basically have been my suggestion. One implementation: py> def merge(spans, nbins): .... bins = *nbins .... for start, end in spans: .... for bin in xrange(start, end): .... bins[bin] += 1 .... def key((i, count)): .... return count != 0 .... index_groups = itertools.groupby(enumerate(bins), key=key) .... for isnonzero, indices in index_groups: .... if isnonzero: .... indices = [i for (i, count) in indices] .... yield (indices, indices[-1]) .... py> spans = [(0,3), (4,7), (2,5), (9,14), (12,17)] py> list(merge(spans, 20)) [(0, 6), (9, 16)] Obviously, this needs some modification to handle datetime objects. STeVe Jul 19 '05 #6

 P: n/a On Tue, 10 May 2005 15:14:47 +0200, Max M wrote: I am writing a "find-free-time" function for a calendar. There are a lotof time spans with start end times, some overlapping, some not.To find the free time spans, I first need to convert the events into alist of non overlapping time spans "meta-spans".This nice ascii graph should show what I mean.1) ---2) ---3) ---4) -----5) ----- ------- -------- # meta spansI can then iterate through the meta-spans and find non-busy times.I have written the class below, but it is rather O^2, so I wondered ifanybody has an idea for a better approach? Maybe (not tested beyond what you see ;-) def mergespans(spans): ... start = end = None ... for s,e in sorted(spans): ... if start is None: start, end = s,e; continue ... if s <= end: end = e; continue ... yield start, end ... start,end = s,e ... if start is not None: yield start, end ... spans = [(0,3), (4,7), (2,5), (9,14), (12,17)] list(mergespans(spans)) [(0, 7), (9, 17)] Regards, Bengt Richter Jul 19 '05 #7

 P: n/a On Tue, 10 May 2005 15:14:47 +0200, Max M wrote: I am writing a "find-free-time" function for a calendar. There are a lotof time spans with start end times, some overlapping, some not.To find the free time spans, I first need to convert the events into alist of non overlapping time spans "meta-spans".This nice ascii graph should show what I mean.1) ---2) ---3) ---4) -----5) ----- ------- -------- # meta spansI can then iterate through the meta-spans and find non-busy times.I have written the class below, but it is rather O^2, so I wondered ifanybody has an idea for a better approach? spans = [(0,3), (4,7), (2,5), (9,14), (12,17)] Non-recommended one-liner: reduce(lambda L,se: (L and se<=L[-1] and (L.append((L.pop(), se)) or L)) or L.append(se) or L ,sorted(spans), []) [(0, 7), (9, 17)] Regards, Bengt Richter Jul 19 '05 #8

 P: n/a Bengt Richter wrote: On Tue, 10 May 2005 15:14:47 +0200, Max M wrote:I am writing a "find-free-time" function for a calendar. There are a lotof time spans with start end times, some overlapping, some not.To find the free time spans, I first need to convert the events into alist of non overlapping time spans "meta-spans".This nice ascii graph should show what I mean.1) ---2) ---3) ---4) -----5) ------------ -------- # meta spansI can then iterate through the meta-spans and find non-busy times.I have written the class below, but it is rather O^2, so I wondered ifanybody has an idea for a better approach? Maybe (not tested beyond what you see ;-) It is with some trepidation that I write this message; after hanging around in c.l.p for the better part of a year, I have come to expect Bengt's messages to be right-on and error-free. However this time... :) >>> def mergespans(spans): ... start = end = None ... for s,e in sorted(spans): ... if start is None: start, end = s,e; continue ... if s <= end: end = e; continue ... yield start, end ... start,end = s,e ... if start is not None: yield start, end ... >>> spans = [(0,3), (4,7), (2,5), (9,14), (12,17)] >>> list(mergespans(spans)) [(0, 7), (9, 17)] There can be a problem in mergespans if one span fits inside another: spans = [(0,5), (4,7), (2,3), (9,14), (12,17)] list(mergespans(spans)) [(0, 3), (4, 7), (9, 17)] Here is a revised version (not tested beyond what you see ;-) def mergespans(spans): ... start = end = None ... for s,e in sorted(spans): ... if start is None: ... start, end = s,e ... continue ... if s <= end: ... if end < e: ... end = e ... continue ... yield start, end ... start,end = s,e ... if start is not None: ... yield start, end ... list(mergespans([(0,5), (4,7), (2,3), (9,14), (12,17)])) [(0, 7), (9, 17)] list(mergespans([(0,3), (4,7), (2,5), (9,14), (12,17)])) [(0, 7), (9, 17)] Can anyone find any other errors in this? ;-) With humble regards, Jim Sizelove Jul 19 '05 #9

 P: n/a Should work fine as far as I can see. Of course, thats not very 'pythonic' - I should probably give at least 10 different unit tests that it passes ;) Its gratifying to know I'm not the only one who needed that final yield. Jim Sizelove wrote: Bengt Richter wrote: On Tue, 10 May 2005 15:14:47 +0200, Max M wrote:I am writing a "find-free-time" function for a calendar. There are a lotof time spans with start end times, some overlapping, some not.To find the free time spans, I first need to convert the events into alist of non overlapping time spans "meta-spans".This nice ascii graph should show what I mean.1) ---2) ---3) ---4) -----5) ----- >------- -------- # meta spansI can then iterate through the meta-spans and find non-busy times.I have written the class below, but it is rather O^2, so I wondered ifanybody has an idea for a better approach? Maybe (not tested beyond what you see ;-) It is with some trepidation that I write this message; after hanging around in c.l.p for the better part of a year, I have come to expect Bengt's messages to be right-on and error-free. However this time... :) >>> def mergespans(spans): ... start = end = None ... for s,e in sorted(spans): ... if start is None: start, end = s,e; continue ... if s <= end: end = e; continue ... yield start, end ... start,end = s,e ... if start is not None: yield start, end ... >>> spans = [(0,3), (4,7), (2,5), (9,14), (12,17)] >>> list(mergespans(spans)) [(0, 7), (9, 17)] There can be a problem in mergespans if one span fits inside another: >>> spans = [(0,5), (4,7), (2,3), (9,14), (12,17)] >>> list(mergespans(spans)) [(0, 3), (4, 7), (9, 17)] Here is a revised version (not tested beyond what you see ;-) >>> def mergespans(spans): ... start = end = None ... for s,e in sorted(spans): ... if start is None: ... start, end = s,e ... continue ... if s <= end: ... if end < e: ... end = e ... continue ... yield start, end ... start,end = s,e ... if start is not None: ... yield start, end ... >>> list(mergespans([(0,5), (4,7), (2,3), (9,14), (12,17)])) [(0, 7), (9, 17)] >>> list(mergespans([(0,3), (4,7), (2,5), (9,14), (12,17)])) [(0, 7), (9, 17)] Can anyone find any other errors in this? ;-) With humble regards, Jim Sizelove Jul 19 '05 #10

 P: n/a Jim Sizelove wrote: Wow! c.l.py is allmost like an AI program generator. But I guess it helps to ask questions about problems that programmers find interresting :-) The "linear" approach is pretty simple to code and understand. I am just afraid what happens when many users tries to book that 14 day conference in the calendar sometimes this year. 365 * 24 * 60 = 525,600 booleans in the array Well ok a few MBytes ain't too bad on a modern machine. It would even be possible to cut the resolution down to 15 minutes. Yielding 35,040 bools, but the solution is still O^2 with regards to size of "search window" and number of events. Bengts/Jims and Jordans solutions seems to be relatively similar. I will use one of those instead of my own code. Thanks! -- hilsen/regards Max M, Denmark http://www.mxm.dk/ IT's Mad Science Jul 19 '05 #11

 P: n/a On Tue, 10 May 2005 23:53:43 GMT, Jim Sizelove wrote:Bengt Richter wrote: [...] Maybe (not tested beyond what you see ;-) I had that feeling ... somehow the word "max" was trying to get my attention, but I posted without figuring out why ;-/It is with some trepidation that I write this message; after hangingaround in c.l.p for the better part of a year, I have come to expectBengt's messages to be right-on and error-free. Would that 'twere so ;-) However this time... :) >>> def mergespans(spans): ... start = end = None ... for s,e in sorted(spans): ... if start is None: start, end = s,e; continue ... if s <= end: end = e; continue ... yield start, end ... start,end = s,e ... if start is not None: yield start, end ... >>> spans = [(0,3), (4,7), (2,5), (9,14), (12,17)] >>> list(mergespans(spans)) [(0, 7), (9, 17)]There can be a problem in mergespans if one span fits inside another: Good call. I should have thought of it. >>> spans = [(0,5), (4,7), (2,3), (9,14), (12,17)] >>> list(mergespans(spans)) [(0, 3), (4, 7), (9, 17)]Here is a revised version (not tested beyond what you see ;-) >>> def mergespans(spans): ... start = end = None ... for s,e in sorted(spans): ... if start is None: ... start, end = s,e ... continue ... if s <= end: ... if end < e: ... end = e ... continue ... yield start, end ... start,end = s,e ... if start is not None: ... yield start, end ... >>> list(mergespans([(0,5), (4,7), (2,3), (9,14), (12,17)])) [(0, 7), (9, 17)] >>> list(mergespans([(0,3), (4,7), (2,5), (9,14), (12,17)])) [(0, 7), (9, 17)]Can anyone find any other errors in this? ;-) Not off hand. OTOH, I really didn't like the aesthetics of all that crufty logic in my original. Maybe (again, barely tested ;-) def mergespans(spans): ... it = iter(sorted(spans)) ... start, end = it.next() ... for s,e in it: ... if s <= end: ... end = max(end, e) # was what "max" was trying to tell me, I think ... else: ... yield start, end ... start,end = s,e ... yield start, end ... list(mergespans([(0,5), (4,7), (2,3), (9,14), (12,17)])) [(0, 7), (9, 17)] list(mergespans([(0,3), (4,7), (2,5), (9,14), (12,17)])) [(0, 7), (9, 17)] With humble regards, Me too. UIAM we're all still human here (even my favorite 'bots, I think ;-) Regards, Bengt Richter Jul 19 '05 #12

### This discussion thread is closed

Replies have been disabled for this discussion. 