469,612 Members | 2,246 Online

# grouping a flat list of number by range

hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

J.

Jun 1 '06 #1
22 1548
jo******@yahoo.fr wrote:
i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

Don't hold back your code. Would it work to replace each occurrence of

result_list.append([start, stop])

with

yield [start, stop]

?

Peter
Jun 1 '06 #2
In article <11*********************@c74g2000cwc.googlegroups. com>,
<jo******@yahoo.fr> wrote:
hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?
There are probably better ways, but this works

best regards,

J.

class IterInterval(object):
"""Create an iterator which, given a list of integers,
for each run of consecutive integers i...j, yields a two
element list [i, j + 1]

Singleton lists [i] return [i, i + 1]
Empty lists return None
"""
def __init__(self, seq):
self.seq = seq
self.firstval = None
self.index = 0

def __iter__(self):
# firstval = the start of a run of consecutive integers
# lastval = the last value found in the run
# nextval = the most recent value taken from the input list
if not self.firstval:
# set up the first iteration
if self.index >= len(self.seq):
# empty list, return
raise StopIteration
self.firstval = lastval = int(self.seq[self.index])
self.index += 1
while True:
if self.index >= len(self.seq):
# list exhausted, output the last value read
yield [self.firstval, lastval + 1]
raise StopIteration
nextval = int(self.seq[self.index])
self.index += 1
if nextval == lastval + 1:
lastval = nextval
continue
else:
# end of run - output the run, reset for next call
yield [self.firstval, lastval + 1]
self.firstval = lastval = nextval
continue

if __name__ == '__main__':
for l in [[3, 6, 7, 8, 12, 13, 15], [2], []]:
print l, "=>", [lst for lst in IterInterval(l)]

/usr/home/jes% python interval.py
[3, 6, 7, 8, 12, 13, 15] => [[3, 4], [6, 9], [12, 14], [15, 16]]
[3] => [[3, 4]]
[] => []

--
Jim Segrave (je*@jes-2.demon.nl)

Jun 1 '06 #3
jo******@yahoo.fr wrote:
i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

Sure:

def group_intervals(it):
it = iter(it)
val = it.next()
run = [val, val+1]
for val in it:
if val == run[1]:
run[1] += 1
else:
yield run
run = [val, val+1]
yield run

--Ben

Jun 1 '06 #4
jo******@yahoo.fr wrote:
hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

J.

I did a proceedural version, then re-read your post and did a generator
based version ;-)

=== interv1 ===
inlist = [3, 6, 7, 8, 12, 13, 15]
tmp = []
for i,val in enumerate(inlist): .... if i==0:
.... tmp.append(val)
.... elif val != valinc:
.... tmp += [valinc, val]
.... valinc = val+1
.... tmp.append(valinc)
tmp [3, 4, 6, 9, 12, 14, 15, 16] tmp[0::2] [3, 6, 12, 15] tmp[1::2] [4, 9, 14, 16] zip(tmp[0::2], tmp[1::2]) [(3, 4), (6, 9), (12, 14), (15, 16)]
=== END interv1 ===

=== interv2 === def interv2(inlist): .... for i,val in enumerate(inlist):
.... if i==0:
.... tmp = val
.... elif val != valinc:
.... yield [tmp, valinc]
.... tmp = val
.... valinc = val+1
.... yield [tmp, valinc]
.... list(interv2(inlist))

[[3, 4], [6, 9], [12, 14], [15, 16]]

=== END interv2 ===

Jun 1 '06 #5
I did a little re-arranging of the generator version:

def interv3(inlist):
tmp = inlist[0]
valinc = tmp+1
for val in inlist[1:]:
if val != valinc:
yield [tmp, valinc];
tmp = val
valinc = val+1
yield [tmp, valinc]

Jun 1 '06 #6
In article <11**********************@c74g2000cwc.googlegroups .com>,
=== interv2 ===
def interv2(inlist):... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]
... tmp = val
... valinc = val+1
... yield [tmp, valinc]
... list(interv2(inlist))

[[3, 4], [6, 9], [12, 14], [15, 16]]

=== END interv2 ===

This doesn't actually run, changing it to make it do so:

def interv2(inlist):
tmp = valinc = 0
for i,val in enumerate(inlist):
if i==0:
tmp = val
valinc = val + 1
elif val != valinc:
yield [tmp, valinc]
tmp = val
valinc = val+1
yield [tmp, valinc]

it now works, but returns [0, 0] when passed an empty list, when it
should return nothing at all

--
Jim Segrave (je*@jes-2.demon.nl)

Jun 1 '06 #7

Jim Segrave wrote:
In article <11**********************@c74g2000cwc.googlegroups .com>,
=== interv2 ===
> def interv2(inlist):

... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]
... tmp = val
... valinc = val+1
... yield [tmp, valinc]
...
> list(interv2(inlist))

[[3, 4], [6, 9], [12, 14], [15, 16]]

=== END interv2 ===

This doesn't actually run, changing it to make it do so:

def interv2(inlist):
tmp = valinc = 0
for i,val in enumerate(inlist):
if i==0:
tmp = val
valinc = val + 1
elif val != valinc:
yield [tmp, valinc]
tmp = val
valinc = val+1
yield [tmp, valinc]

it now works, but returns [0, 0] when passed an empty list, when it
should return nothing at all
--
Jim Segrave (je*@jes-2.demon.nl)

Jim, I had tabs/spaces indent problems when cut-n-pasting.

What I ran was more like the version below, but i did a quick
separation of the line that has the ';' in it and goofed.
def interv2(inlist): .... for i,val in enumerate(inlist):
.... if i==0:
.... tmp = val
.... elif val != valinc:
.... yield [tmp, valinc]; tmp = val
.... valinc = val+1
.... yield [tmp, valinc]
.... list(interv2(inlist))

[[3, 4], [6, 9], [12, 14], [15, 16]]

Jun 1 '06 #8
In article <11**********************@u72g2000cwu.googlegroups .com>,
I did a little re-arranging of the generator version:

def interv3(inlist):
tmp = inlist[0]
valinc = tmp+1
for val in inlist[1:]:
if val != valinc:
yield [tmp, valinc];
tmp = val
valinc = val+1
yield [tmp, valinc]

Still fails when passed an empty list, the initial assignment to tmp
is an IndexError

--
Jim Segrave (je*@jes-2.demon.nl)

Jun 1 '06 #9
In article <11**********************@f6g2000cwb.googlegroups. com>,

What I ran was more like the version below, but i did a quick
separation of the line that has the ';' in it and goofed.
def interv2(inlist):... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]; tmp = val
... valinc = val+1
... yield [tmp, valinc]
... list(interv2(inlist))

[[3, 4], [6, 9], [12, 14], [15, 16]]

Fails on an empty list, as tmp is not defined when it hits the yield
--
Jim Segrave (je*@jes-2.demon.nl)

Jun 1 '06 #10
On 2/06/2006 8:36 AM, Paddy wrote:
jo******@yahoo.fr wrote:
hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

J.

I did a proceedural version, then re-read your post and did a generator
based version ;-)

=== interv1 ===
inlist = [3, 6, 7, 8, 12, 13, 15]
tmp = []
for i,val in enumerate(inlist): ... if i==0:
... tmp.append(val)
... elif val != valinc:
... tmp += [valinc, val]
... valinc = val+1
... tmp.append(valinc)
tmp [3, 4, 6, 9, 12, 14, 15, 16] tmp[0::2] [3, 6, 12, 15] tmp[1::2] [4, 9, 14, 16] zip(tmp[0::2], tmp[1::2]) [(3, 4), (6, 9), (12, 14), (15, 16)]

=== END interv1 ===

=== interv2 === def interv2(inlist): ... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]
... tmp = val
... valinc = val+1
... yield [tmp, valinc]
... list(interv2(inlist))

[[3, 4], [6, 9], [12, 14], [15, 16]]

=== END interv2 ===

Oh the siren call of every new feature in the language!
enumerate() just to get a first-time test, and then botch it??

Read the following; the replacement version uses a simple old-fashioned
inelegant flag, works with an empty sequence, and doesn't depend on the
input being sliceable or indexable.

HTH,
John

C:\junk>type interval.py
def interv2(inlist):
for i,val in enumerate(inlist):
if i==0:
tmp = val
elif val != valinc:
yield [tmp, valinc]
tmp = val
valinc = val+1
yield [tmp, valinc]

def interv3(inseq):
first = True
for val in inseq:
if first:
tmp = val
first = False
elif val != valinc:
yield [tmp, valinc]
tmp = val
valinc = val + 1
if not first:
yield [tmp, valinc]

tests = [
[3, 4, 6, 9, 12, 14, 15, 16],
[3, 3, 3],
[],
]

for func in (interv3, interv2):
for test in tests:
print "%s(%s) ->" % (func.__name__, test)
result = list(func(test))
print "\t%r" % result

C:\junk>interval.py
interv3([3, 4, 6, 9, 12, 14, 15, 16]) ->
[[3, 5], [6, 7], [9, 10], [12, 13], [14, 17]]
interv3([3, 3, 3]) ->
[[3, 4], [3, 4], [3, 4]]
interv3([]) ->
[]
interv2([3, 4, 6, 9, 12, 14, 15, 16]) ->
[[3, 5], [6, 7], [9, 10], [12, 13], [14, 17]]
interv2([3, 3, 3]) ->
[[3, 4], [3, 4], [3, 4]]
interv2([]) ->
Traceback (most recent call last):
File "C:\junk\interval.py", line 33, in ?
result = list(func(test))
File "C:\junk\interval.py", line 9, in interv2
yield [tmp, valinc]
UnboundLocalError: local variable 'tmp' referenced before assignment

C:\junk>
Jun 1 '06 #11
jo******@yahoo.fr wrote:
hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

a list comprehension/itertools version (this won't work with an empty
list):

from itertools import groupby

a = [3, 6, 7, 8, 12, 13, 15]

result = [[3, 4], [6,9], [12, 14], [15, 16]]

b = [ list(g)[0] for k,g in groupby(range(a[0],a[-1]+2), lambda x:
x in a)]

c = [ b[i:i+2] for i in range(0,len(a),2) ]

assert c == result
Gerard

Jun 2 '06 #12

Gerard Flanagan wrote:
jo******@yahoo.fr wrote:
hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

i was able to to it without generators/yield but i think it could be
better with them, may be do you an idea?

best regards,

a list comprehension/itertools version (this won't work with an empty
list):

from itertools import groupby

a = [3, 6, 7, 8, 12, 13, 15]

result = [[3, 4], [6,9], [12, 14], [15, 16]]

b = [ list(g)[0] for k,g in groupby(range(a[0],a[-1]+2), lambda x:
x in a)]

c = [ b[i:i+2] for i in range(0,len(a),2) ]

assert c == result
Gerard

change len(a) to len(b) in case a has duplicates like
[3,3,3,6,7,8,12,13,15]

Gerard

Jun 2 '06 #13

Jim Segrave wrote:
In article <11**********************@f6g2000cwb.googlegroups. com>,

What I ran was more like the version below, but i did a quick
separation of the line that has the ';' in it and goofed.
> def interv2(inlist):

... for i,val in enumerate(inlist):
... if i==0:
... tmp = val
... elif val != valinc:
... yield [tmp, valinc]; tmp = val
... valinc = val+1
... yield [tmp, valinc]
...
> list(interv2(inlist))

[[3, 4], [6, 9], [12, 14], [15, 16]]

Fails on an empty list, as tmp is not defined when it hits the yield
--
Jim Segrave (je*@jes-2.demon.nl)

Yep, I still, purposfully, decided not to put any guard checks in
because:
Its very easy to add.
It detracts from the algorithm.
It might never be called with an empty list in-situ.
It is correct for the original posters testcases.
Yah gotta leave-em something to do ;-)

But you-too are right. Don't just cut-n-paste newsgroup solutions. they
need testing for your particular use. I doubt anyone on C.L.P. would be
malicious, but your actual use of a function having a wider scope to
inputs than mentioned might be fairy common.

(Damn, I wanted to use 'caveat emptor', but it does not apply as
nothing is being bought. Oh well :-)

Jun 2 '06 #14

John Machin wrote:
On 2/06/2006 8:36 AM, Paddy wrote:
Oh the siren call of every new feature in the language!
enumerate() just to get a first-time test, and then botch it??

Read the following; the replacement version uses a simple old-fashioned
inelegant flag, works with an empty sequence, and doesn't depend on the
input being sliceable or indexable.

HTH,
John

Thanks, un-botchable John. What if the poster doesn't send a null list?
What if the correct result for a null list is "spam"; or an exception?
What if.... ... we toned down the language?

Jun 2 '06 #15
In article <11*********************@j55g2000cwa.googlegroups. com>,

John Machin wrote:
On 2/06/2006 8:36 AM, Paddy wrote:

Oh the siren call of every new feature in the language!
enumerate() just to get a first-time test, and then botch it??

Read the following; the replacement version uses a simple old-fashioned
inelegant flag, works with an empty sequence, and doesn't depend on the
input being sliceable or indexable.

HTH,
John

Thanks, un-botchable John. What if the poster doesn't send a null list?
What if the correct result for a null list is "spam"; or an exception?
What if.... ... we toned down the language?

I'd still rate Ben Cartwright's solution as the best. The one I posted
tried to do the same, but I didn't see the right way to iterate over
the supplied list inside the function, wheras he did. It's short,
simple and correct.

If you really wanted to return 'spam' or an exception in his solution,
it's a simple if statement at the start of the function.

--
Jim Segrave (je*@jes-2.demon.nl)

Jun 2 '06 #16
jo******@yahoo.fr wrote:
i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

Know your itertools. From the examples section[1]:

"""
# Find runs of consecutive numbers using groupby. The key to the
# solution is differencing with a range so that consecutive numbers all
# appear in same group.
data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
for k, g in groupby(enumerate(data), lambda (i,x):i-x): .... print map(operator.itemgetter(1), g)
....
[1]
[4, 5, 6]
[10]
[15, 16, 17, 18]
[22]
[25, 26, 27, 28]
"""
So I think something like this should work:
import itertools
def intervals(numbers): .... def key((i, n)):
.... return i - n
.... for key, group in itertools.groupby(enumerate(numbers), key):
.... group = list(group)
.... _, first_n = group[0]
.... _, last_n = group[-1]
.... yield first_n, last_n + 1
.... list(intervals([3, 6, 7, 8, 12, 13, 15]))

[(3, 4), (6, 9), (12, 14), (15, 16)]

If you really need lists instead of tuples, just put brackets around the
terms in the yield statement.

[1] http://docs.python.org/lib/itertools-example.html

STeVe
Jun 2 '06 #17

thanks to all !

my version was far less clever/long thant the one you posted, i'm going
to examine all these with much interest, and learn...

best regards.

Jun 3 '06 #18
jo******@yahoo.fr wrote:
hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

Just another 'itertools.groupby' variation. It considers the whole
range of numbers spanned by the dataset - eg. in your example,
range(3,17) - so possibly not very efficient if the range is large and
the data sparse within the range.

def get_intervals(data):
intervals = [ [], [] ]
for k,g in groupby(range(data[0],data[-1]+2), lambda x:x in data):
intervals[k].append( list(g)[0] ) #k is 0 or 1
return zip(intervals[1],intervals[0])

a = [3, 6, 7, 8, 12, 13, 15]

assert get_intervals(a) == [(3,4),(6, 9),(12,14),(15,16)]
Gerard

Jun 3 '06 #19
Gerard Flanagan wrote:
jo******@yahoo.fr wrote:
hello,

i'm looking for a way to have a list of number grouped by consecutive
interval, after a search, for example :

[3, 6, 7, 8, 12, 13, 15]

=>

[[3, 4], [6,9], [12, 14], [15, 16]]

(6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
[6:9], and so on)

Just another 'itertools.groupby' variation. It considers the whole
range of numbers spanned by the dataset - eg. in your example,
range(3,17) - so possibly not very efficient if the range is large and
the data sparse within the range.

def get_intervals(data):
intervals = [ [], [] ]
for k,g in groupby(range(data[0],data[-1]+2), lambda x:x in data):
intervals[k].append( list(g)[0] ) #k is 0 or 1
return zip(intervals[1],intervals[0])

If you're gonna go this route, you should definitely use a set to check
containment; ``x in data`` is going to be costly for long lists.
Defining your key something like the following would be better::

data_set = set(data)
def key(x):
return x in data_set

STeVe

Jun 3 '06 #20
Hello
... _, first_n = group[0]

what is the meaning of the underscore "_" ? is it a special var ? or
should it be readed as a way of unpacking a tuple in a non useful var ?
like

lost, first_n = group[0]

best regards.

Jun 4 '06 #21
jo******@yahoo.fr wrote:
... _, first_n = group[0]

what is the meaning of the underscore "_" ? is it a special var ? or
should it be readed as a way of unpacking a tuple in a non useful var ?
like

lost, first_n = group[0]

Yep, it's just another name. "lost" would have worked just as well.
It's pretty typical around here to see "_" used when a tuple is unpacked
but some of the values aren't used. So if you see something like::

for _, dirs, _ in os.walk(top):
...

it just means that the "..." code is only going to use "dirs" (not the
root path or files that are also available from the os.walk() iterator).

STeVe
Jun 4 '06 #22
Yup! '_' is just used as a dummy. Its a pretty common idiom. There's
nothing special about that variable.

jo******@yahoo.fr wrote:
Hello
... _, first_n = group[0]

what is the meaning of the underscore "_" ? is it a special var ? or
should it be readed as a way of unpacking a tuple in a non useful var ?
like

lost, first_n = group[0]

best regards.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEgyKIrgn0plK5qqURAoloAJ9eU5hXHkBbQjL178EbQc XpShhplwCgqNP2
RWz6aT3MHAT34VO55ndM0KY=
=xuv0
-----END PGP SIGNATURE-----

Jun 4 '06 #23

### This discussion thread is closed

Replies have been disabled for this discussion.