472,805 Members | 911 Online

# Break up list into groups

All,

I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
groups = []
local = []

for value in data:
if 0x80 & value:
if len(local) 0:
groups.append(local)

local = []
local.append(value)
else:
local.append(value)

if len(local) 0:
groups.append(local)

return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]

Thanks,

Dan McLeran

Jul 16 '07 #1
12 1532
On Mon, 2007-07-16 at 14:11 -0700, da********@yahoo.com wrote:
I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?

Here's what I've done and the expected output of the algorithm:

def splitIntoGroups(data):
groups = []
local = []

for value in data:
if 0x80 & value:
if len(local) 0:
groups.append(local)

local = []
local.append(value)
else:
local.append(value)

if len(local) 0:
groups.append(local)

return groups

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

print splitIntoGroups(l)

Desired result:

[[240, 1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12,
13], [240, 14], [241, 15]]
Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
>=240 starts a new group? If so:
groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if i >= 240:
current = []
groups.append(current)
current.append(i)
Jul 16 '07 #2
On Mon, 2007-07-16 at 16:31 -0500, marduk wrote:
Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
=240 starts a new group? If so:

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if i >= 240:
current = []
groups.append(current)
current.append(i)

Misunderstood... actually that should have read

groups = []
current = [] # probably not necessary, but as a safety
for i in l:
if 240 & i:
current = []
groups.append(current)
current.append(i)

Jul 16 '07 #3
This is a perfect place for a generator:

<code>
seq = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

def gengroups(seq):
group = []
for val in seq:
if val & 0x80 and group:
yield group
group = []
group.append(val)
yield group

if __name__ == "__main__":
print list(gengroups(seq))
</code>

The above assumes that the first value in the input sequence will have
0x80 set. Your implementation seems to makes the same assumption
though.

Also, just a note...
<code>
if len(local) 0:
...
</code>

is better written

<code>
if local:
...
</code>

Jul 16 '07 #4
"da********@yahoo.com" <da********@yahoo.comwrites:
I can't seem to find an answer to this question anywhere, but I'm
still looking. My problem is I have a list of values like this:

l = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6, 0xF1, 7, 8, 0xF2, 9, 10, 11, 12,
13, 0xF0, 14, 0xF1, 15]

A value with bit 0x80 set delineates the start of a new packet of
information. What I want to do is to group the packets so that 1, 2, 3
go with the 1st packet tagged 0xF0, 4 ,5, 6 go with the 2nd packet
tagged 0xF0, 7 & 8 go with the packet tagged 0xF1 and so on. The
length of the data associated with each tag can vary. I've already
written an algorithm to do this but I was wondering if some
combination of itertools functions could do the job faster?
See:

Jul 16 '07 #5
On Jul 16, 3:56 pm, marduk <mar...@nbk.hopto.orgwrote:
On Mon, 2007-07-16 at 16:31 -0500, marduk wrote:
Assuming you meant '0xF0' instead of '0x80'.... do you mean any value
>=240 starts a new group? If so:
No, I meant 0x80. 0x80 is only the most significant bit of the 0xF0
value. Every time this bit is set indicates the start of a new group.
Anyway, I like this algorithm better than mine, less code. I've
modified yours to make the fn a parameter:

def splitIntoGroups(data, fn):
groups = []
current = []
for i in data:
if fn(i):
current = []
groups.append(current)
current.append(i)

return groups

print splitIntoGroups(l, lambda x : x & 0x80)

Jul 16 '07 #6
Here's an ugly way I wouldn't recommended (using itertools groupby):
>
pyfrom itertools import groupby
pyalist = [0xF0, 1, 2, 3, 0xF0, 4, 5, 6,
... 0xF1, 7, 8, 0xF2, 9, 10, 11, 12, 13,
... 0xF0, 14, 0xF1, 15]
pydef doit(alist):
... i = (list(g) for k,g in groupby(alist, lambda x: 0xf0&x))
... return [k for k in [j + i.next() for j in i] if len(k)>1]
...
pydoit(alist)

[[240, 1, 2, 3],
[240, 4, 5, 6],
[241, 7, 8],
[242, 9, 10, 11, 12, 13],
[240, 14],
[241, 15]]

James
Wow that is ugly. I think I like the simpler way better.

Thanks

Jul 16 '07 #7

Matt McCredie wrote:
That certainly is fast, unfortunately it doesn't pass all of the tests.
I came up with those tests so I don't know how important they are to the
original poster. I modified it and came up with a generator and a
non-generator version based (roughly) on your algorithm, that are almost
as quick, and pass all of the tests. Some of the modifications were done
just to make it quicker, so it would be fair when comparing against the
other methods. I hard-coded the comparison instead of using a function
and created a function that directly generates and returns a list
instead of a generator. I would probably use the generator version in my
code, but wrapping `list' around a generator adds about 4us (on my
machine). Anyway, getgroups7 passes all of the tests I mentioned and it
was timed at 10.37usec/pass. The down side: the code doesn't seem nearly
as elegant.

Matt
In most cases you wouldn't wrap the generator version in a list(), but use
it directly as a loop iterator.
A little renaming of variables helps it be a bit more elegant I think ...

def getgroups8(seq):
groups = []
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
groups.append(seq[start:stop])
start = stop
groups.append(seq[start:])
return groups

This passes all the tests and runs about the same speed.
Cheers,
Ron

<code>
def gengroups7(seq):
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq[i]&0x80:
start = i
break
else:
return
for i in iseq:
if seq[i]&0x80:
yield seq[start:i]
start = i
yield seq[start:]
def getgroups7(seq):
groups = []
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq[i]&0x80:
start = i
break
else:
return groups
for i in iseq:
if seq[i]&0x80:
groups.append(seq[start:i])
start = i
groups.append(seq[start:])
return groups

</code>

Jul 18 '07 #8

Matt McCredie wrote:
That certainly is fast, unfortunately it doesn't pass all of the tests.
I came up with those tests so I don't know how important they are to the
original poster. I modified it and came up with a generator and a
non-generator version based (roughly) on your algorithm, that are almost
as quick, and pass all of the tests. Some of the modifications were done
just to make it quicker, so it would be fair when comparing against the
other methods. I hard-coded the comparison instead of using a function
and created a function that directly generates and returns a list
instead of a generator. I would probably use the generator version in my
code, but wrapping `list' around a generator adds about 4us (on my
machine). Anyway, getgroups7 passes all of the tests I mentioned and it
was timed at 10.37usec/pass. The down side: the code doesn't seem nearly
as elegant.

Matt
In most cases you wouldn't wrap the generator version in a list(), but use
it directly as a loop iterator.
A little renaming of variables helps it be a bit more elegant I think ...

def getgroups8(seq):
groups = []
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
groups.append(seq[start:stop])
start = stop
groups.append(seq[start:])
return groups

This passes all the tests and runs about the same speed.
Cheers,
Ron

<code>
def gengroups7(seq):
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq[i]&0x80:
start = i
break
else:
return
for i in iseq:
if seq[i]&0x80:
yield seq[start:i]
start = i
yield seq[start:]
def getgroups7(seq):
groups = []
iseq = iter(xrange(len(seq)))
start = 0
for i in iseq:
if seq[i]&0x80:
start = i
break
else:
return groups
for i in iseq:
if seq[i]&0x80:
groups.append(seq[start:i])
start = i
groups.append(seq[start:])
return groups

</code>

Jul 18 '07 #9
Hello Dan,

Yet another option (using itertools.groupby):

from itertools import groupby

class GrouperToggler:
def __init__(self):
self.group = 1

def __call__(self, value):
# New packet, toggle group
if value & 0x80:
self.group = 1 - self.group
return self.group

def group(items):
for group, items in groupby(items, GrouperToggler()):
# groupby return [key, group_iterator]
yield [item for item in items]

i = [
0xF0, 1, 2, 3,
0xF0, 4, 5, 6,
0xF1, 7, 8,
0xF2, 9, 10, 11, 12, 13,
0xF0, 14,
0xF1, 15
]

for g in group(i):
print g

HTH,
--
Miki <mi*********@gmail.com>
http://pythonwise.blogspot.com

Jul 18 '07 #10
A little renaming of variables helps it be a bit more elegant I think ...
>
def getgroups8(seq):
groups = []
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
groups.append(seq[start:stop])
start = stop
groups.append(seq[start:])
return groups
Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
groups = []
push = groups.append
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
push(seq[start:stop])
start = stop
push(seq[start:])
return groups

-Matt

Jul 19 '07 #11

Matimus wrote:
Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
groups = []
push = groups.append
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
push(seq[start:stop])
start = stop
push(seq[start:])
return groups

-Matt

Looks good to me. :-)

So a generator versions would be...

(not tested)

def getgroups(seq):
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
yield seq[start:stop]
start = stop
yield seq[start:]

(I also wanted to compare this to Georges solution, maybe later.)

Now if there is some way to generalize this so it can be used in a broader
range of situations without loosing too much of it's efficiency. Of course
then maybe group by would be better. <shrug>

Cheers,
Ron
Jul 20 '07 #12

Matimus wrote:
Excellent work! One more modification and I get below 10us/pass:

def getgroups(seq):
groups = []
push = groups.append
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
push(seq[start:stop])
start = stop
push(seq[start:])
return groups

-Matt

Looks good to me. :-)

So a generator versions would be...

(not tested)

def getgroups(seq):
iseq = iter(xrange(len(seq)))
for start in iseq:
if seq[start] & 0x80:
for stop in iseq:
if seq[stop] & 0x80:
yield seq[start:stop]
start = stop
yield seq[start:]

(I also wanted to compare this to Georges solution, maybe later.)

Now if there is some way to generalize this so it can be used in a broader
range of situations without loosing too much of it's efficiency. Of course
then maybe group by would be better. <shrug>

Cheers,
Ron

Jul 20 '07 #13

This thread has been closed and replies have been disabled. Please start a new discussion.