multirember&co

bearophileHUGS

Once in while I too have something to ask. This is a little problem
that comes from a Scheme Book (I have left this thread because this
post contains too much Python code for a Scheme newsgroup):
http://groups.google.com/group/comp....9f78eb4457d08/

The function multiremberandc o is hard (for me still) if done in that
little Scheme subset, but it's very easy with Python. It collects two
lists from a given one, at the end it applies the generic given fun
function to the two collected lists and returns its result:

def multiremberandc o1(el, seq, fun):
l1, l2 = [], []
for x in seq:
if x == el:
l2.append(el)
else:
l1.append(el)
return fun(l1, l2)

data = [1, 'a', 3, 'a', 4, 5, 6, 'a']
print multiremberandc o1('a', data, lambda l1,l2: (len(l1), len(l2)))

More compact:

def multiremberandc o2(el, seq, fun):
l1, l2 = [], []
for x in seq:
[l1, l2][x == el].append(el)
return fun(l1, l2)
A bit cleaner (but I don't like it much):

def multiremberandc o3(el, seq, fun):
l1, l2 = [], []
for x in seq:
(l2 if x == el else l1).append(el)
return fun(l1, l2)
For fun I have tried to make it lazy, if may be useful if seq is a
very long iterable. So I've used tee:

from itertools import ifilter, tee

def multiremberandc o4(el, iseq, fun):
iseq1, iseq2 = tee(iseq)
iterable1 = ifilter(lambda x: x == el, iseq1)
iterable2 = ifilter(lambda x: x != el, iseq2)
return fun(iterable1, iterable2)

def leniter(seq):
count = 0
for el in seq:
count += 1
return count

idata = iter(data)
print multiremberandc o4('a', idata, lambda l1,l2: (leniter(l1),
leniter(l2)))
But from the docs: >in general, if one iterator is going to use most
or all of the data before the other iterator, it is faster to use
list() instead of tee().<

So I have tried to create two iterables for the fun function scanning
seq once only, but I haven't succed so far (to do it I have tried to
use two coroutines with the enhanced generators, sending el to one or
to the other according to the value of x == el, this time I don't show
my failed versions), do you have suggestions?

(Note: in some situations it may be useful to create a "splitting"
function that given an iterable and a fitering predicate, returns two
lazy generators, of the items that satisfy the predicate and of the
items that don't satisfy it, so this exercise isn't totally useless).

Bye,
bearophile

Apr 17 '07 #1

Subscribe Reply

1335

James Stroud

be************@ lycos.com wrote:

Once in while I too have something to ask. This is a little problem
that comes from a Scheme Book (I have left this thread because this
post contains too much Python code for a Scheme newsgroup):
http://groups.google.com/group/comp....9f78eb4457d08/

The function multiremberandc o is hard (for me still) if done in that
little Scheme subset, but it's very easy with Python. It collects two
lists from a given one, at the end it applies the generic given fun
function to the two collected lists and returns its result:

def multiremberandc o1(el, seq, fun):
l1, l2 = [], []
for x in seq:
if x == el:
l2.append(el)
else:
l1.append(el)
return fun(l1, l2)

data = [1, 'a', 3, 'a', 4, 5, 6, 'a']
print multiremberandc o1('a', data, lambda l1,l2: (len(l1), len(l2)))

More compact:

def multiremberandc o2(el, seq, fun):
l1, l2 = [], []
for x in seq:
[l1, l2][x == el].append(el)
return fun(l1, l2)
A bit cleaner (but I don't like it much):

def multiremberandc o3(el, seq, fun):
l1, l2 = [], []
for x in seq:
(l2 if x == el else l1).append(el)
return fun(l1, l2)
For fun I have tried to make it lazy, if may be useful if seq is a
very long iterable. So I've used tee:

from itertools import ifilter, tee

def multiremberandc o4(el, iseq, fun):
iseq1, iseq2 = tee(iseq)
iterable1 = ifilter(lambda x: x == el, iseq1)
iterable2 = ifilter(lambda x: x != el, iseq2)
return fun(iterable1, iterable2)

def leniter(seq):
count = 0
for el in seq:
count += 1
return count

idata = iter(data)
print multiremberandc o4('a', idata, lambda l1,l2: (leniter(l1),
leniter(l2)))
But from the docs: >in general, if one iterator is going to use most
or all of the data before the other iterator, it is faster to use
list() instead of tee().<

So I have tried to create two iterables for the fun function scanning
seq once only, but I haven't succed so far (to do it I have tried to
use two coroutines with the enhanced generators, sending el to one or
to the other according to the value of x == el, this time I don't show
my failed versions), do you have suggestions?

(Note: in some situations it may be useful to create a "splitting"
function that given an iterable and a fitering predicate, returns two
lazy generators, of the items that satisfy the predicate and of the
items that don't satisfy it, so this exercise isn't totally useless).

Bye,
bearophile

I think it might be provable that two iterators are required for this
exercise. In this example, for example, leniter() will be called on one
list and then the other. How will it get an accurate count of one list
without iterating through the other?

I.e.:

data = [1, 'a', 3, 'a', 4, 5, 6, 'a']
^
To count this 'a', it must have gone through the
numbers, so the leniter() can not act independently
on the lists using a single iterator.

James

Apr 17 '07 #2

Paddy

On Apr 17, 1:14 am, bearophileH...@ lycos.com wrote:

Once in while I too have something to ask. This is a little problem
that comes from a Scheme Book (I have left this thread because this
post contains too much Python code for a Scheme newsgroup):http://groups.google.com/group/comp....thread/thread/...

The function multiremberandc o is hard (for me still) if done in that
little Scheme subset, but it's very easy with Python. It collects two
lists from a given one, at the end it applies the generic given fun
function to the two collected lists and returns its result:

def multiremberandc o1(el, seq, fun):
l1, l2 = [], []
for x in seq:
if x == el:
l2.append(el)
else:
l1.append(el)
return fun(l1, l2)

Hi bearophile,

Couldn't you also use count rather than the explicit loop to
get something like:

def multiremberandc o1(el, seq, fun):
l1, l2 = [], []
c = seq.count(e1)
l1 = [el] * c
l2 = [el] * (len(seq) - c)
return fun(l1, l2)

I don't have the book so can't comment on its suitability,
but there you go.

- Paddy.

Apr 17 '07 #3

Paul Rubin

be************@ lycos.com writes:

So I have tried to create two iterables for the fun function scanning
seq once only, but I haven't succed so far (to do it I have tried to
use two coroutines with the enhanced generators, sending el to one or
to the other according to the value of x == el, this time I don't show
my failed versions), do you have suggestions?

I think there is not any way short of using something like list or
tee, if you want both output iterators to be available simultaneously.
Say there are 17 occurrences of el in the input iterator. The only
way to know this is to scan through the whole iterator. But now all
of its elements have to be available again at the output.

Apr 17 '07 #4

attn.steven.kuo

On Apr 16, 5:14 pm, bearophileH...@ lycos.com wrote:

Once in while I too have something to ask. This is a little problem
that comes from a Scheme Book (I have left this thread because this
post contains too much Python code for a Scheme newsgroup):http://groups.google.com/group/comp....thread/thread/...

(snipped)

>
For fun I have tried to make it lazy, if may be useful if seq is a
very long iterable. So I've used tee:

from itertools import ifilter, tee

def multiremberandc o4(el, iseq, fun):
iseq1, iseq2 = tee(iseq)
iterable1 = ifilter(lambda x: x == el, iseq1)
iterable2 = ifilter(lambda x: x != el, iseq2)
return fun(iterable1, iterable2)

def leniter(seq):
count = 0
for el in seq:
count += 1
return count

idata = iter(data)
print multiremberandc o4('a', idata, lambda l1,l2: (leniter(l1),
leniter(l2)))

But from the docs: >in general, if one iterator is going to use most
or all of the data before the other iterator, it is faster to use
list() instead of tee().<

So I have tried to create two iterables for the fun function scanning
seq once only, but I haven't succed so far (to do it I have tried to
use two coroutines with the enhanced generators, sending el to one or
to the other according to the value of x == el, this time I don't show
my failed versions), do you have suggestions?

Scan once, two iterables:

class It(object):
def __init__(self, iseq, fun, fun2):
self.iseq = iseq
self.wanted = fun
self.queue = []
self.divert = fun2
def append(self, item):
self.queue.appe nd(item)
def __iter__(self):
while True:
if self.queue:
yield self.queue.pop( 0)
else:
try:
item = self.iseq.next( )
if self.wanted(ite m):
yield item
else:
self.divert(ite m)
except StopIteration:
raise

class TwoFromOne(obje ct):
def __init__(self, iseq, el):
self.i1 = It(iseq, lambda x: x == el, lambda y:
self.i2.append( y))
self.i2 = It(iseq, lambda x: x != e1, lambda y:
self.i1.append( y))
def getiters(self):
return self.i1, self.i2

def leniter(seq):
count = 0
for el in seq:
count += 1
return count

data = [1, 'a', 3, 'a', 4, 5, 6, 'a']
lazy_eye = TwoFromOne(iter (data), 'a')
it1, it2 = lazy_eye.getite rs()
print (lambda i1, i2: (leniter(i1), leniter(i2)))(i t1, it2)

--
Hope this helps,
Steven

Apr 17 '07 #5

Peter Otten

be************@ lycos.com wrote:

Once in while I too have something to ask. This is a little problem
that comes from a Scheme Book (I have left this thread because this
post contains too much Python code for a Scheme newsgroup):

For fun I have tried to make it lazy, if may be useful if seq is a
very long iterable. So I've used tee:

from itertools import ifilter, tee

def multiremberandc o4(el, iseq, fun):
iseq1, iseq2 = tee(iseq)
iterable1 = ifilter(lambda x: x == el, iseq1)
iterable2 = ifilter(lambda x: x != el, iseq2)
return fun(iterable1, iterable2)

def leniter(seq):
count = 0
for el in seq:
count += 1
return count

idata = iter(data)
print multiremberandc o4('a', idata, lambda l1,l2: (leniter(l1),
leniter(l2)))
But from the docs: >in general, if one iterator is going to use most
or all of the data before the other iterator, it is faster to use
list() instead of tee().<

So I have tried to create two iterables for the fun function scanning
seq once only, but I haven't succed so far (to do it I have tried to
use two coroutines with the enhanced generators, sending el to one or
to the other according to the value of x == el, this time I don't show
my failed versions), do you have suggestions?

I think the showstopper here is fun() which could either combine or
operate indepently on both iterables, e. g.:

def fun(a, b): return [a*b for a, b in zip(a, b)]
def fun(a, b): return sum(a), sum(b)

If you changed multirembrandtc o() to accept two functions that are
guaranteed to be independent and to consume all items from their iterable
argument, you could run them in two threads and provide the iterable items
via a Queue that would block fun1() until pending items for fun2() are
processed.

You'd rather cut your ear off? No, that would be multivangoghco( )...

Peter

Apr 17 '07 #6

bearophileHUGS

Paddy:

def multiremberandc o1(el, seq, fun):
l1, l2 = [], []
c = seq.count(e1)
l1 = [el] * c
l2 = [el] * (len(seq) - c)
return fun(l1, l2)

Thank you Paddy, but unfortunately there is a bug in my first
function, this is more correct:

def multiremberandc o1b(el, seq, fun):
l1, l2 = [], []
for x in seq:
if x == el:
l2.append(x)
else:
l1.append(x)
return fun(l1, l2)
So your version may become:

def multiremberandc o1c(el, seq, fun):
l2 = [el] * seq.count(el)
l1 = filter(lambda x: x != el, seq)
return fun(l1, l2)

-------------------

I am grateful to everyone that has answered me, but Steven has
invented what I was looking for. It's code quite twisted on itself :-)
I have modified, simplified and (hopefully) improved Steven's code
like this (but it may be a bit slower, because the class It is inside
the function?):
import collections

def xsplitter(iseq, pred):
class It(object):
def __init__(self, fun, parity):
self.divert = fun
self.parity = parity
self.queue = collections.deq ue()
def append(self, item):
self.queue.appe nd(item)
def __iter__(self):
while True:
if self.queue:
yield self.queue.popl eft()
else:
try:
item = iseq.next()
if pred(item) == self.parity:
yield item
else:
self.divert(ite m)
except StopIteration:
raise

it1 = It(lambda y: it2.append(y), True)
it2 = It(lambda y: it1.append(y), False)
return it1, it2

idata = iter([1, 'a', 3, 'a', 4, 5, 6, 'a'])
it1, it2 = xsplitter(idata , lambda x: x == 'a')

from itertools import izip
for el1, el2 in izip(it1, it2):
print el1, el2
(The izip stops when the first iterable ends, with while-True loop and
more code that they can be printed fully). Now this code is the lazy
splitter I was talking about. I don't know if it can be useful, but I
like it enough. Probably it's useless if one iterable is used whole
before the other one (because a deque becomes filled up more and
more), so I think in most situations a much simpler eager list
splitter is enough. So I don't know if an improved version of it is
fit for the itertools module.

So far I haven't succed using the coroutine Python 2.5 allows using
the generators, and I think still that xsplitter can be done with two
coroutines instead of two It objects. Despite Steven's code I am
unable still to write a working code with coroutines (probably because
I haven't understood how they work yet). This time I take a breath and
I show my wrong code:
import collections

def xsplitter(iseq, pred):
def it(parity):
queue = collections.deq ue()
while True:
received = (yield)
if received is not None:
queue.append(re ceived)
if queue:
yield queue.popleft()
else:
try:
el = iseq.next()
if pred(el) == parity:
yield el
else:
its[not parity].send(el)
except StopIteration:
raise

its = [it(False), it(True)]
return its[True], its[False]

idata = iter([1, 'a', 3, 'a', 4, 5, 6, 'a'])
it1, it2 = xsplitter(idata , lambda x: x == 'a')

from itertools import izip
for el1, el2 in izip(it1, it2):
print el1, el2
It prints:
None None
None None
a 3
None None
a 4
None None
None None
None None
Can it be fixed? Are you able to fix it for me? (If you want you can
think of it as an exercise to understand Python coroutines :-) ).

Bye and thank you,
bearophile

Apr 17 '07 #7

Anton Vredegoor

be************@ lycos.com wrote:

I have modified, simplified and (hopefully) improved Steven's code
like this (but it may be a bit slower, because the class It is inside
the function?):

Here is a yet more simple version, I wonder if it still does the same
thing, whatever it is you are looking for :-)

def xsplitter(iseq, pred):

class It(object):
def __init__(self, fun, parity):
self.divert = fun
self.parity = parity
self.queue = collections.deq ue()

def append(self, item):
self.queue.appe nd(item)

def __iter__(self):
while self.queue:
yield self.queue.popl eft()
for item in iseq:
if pred(item) == self.parity:
yield item
else:
self.divert(ite m)
for x in self:
yield x

it1 = It(lambda y: it2.append(y), True)
it2 = It(lambda y: it1.append(y), False)
return it1, it2

A.

Apr 18 '07 #8

attn.steven.kuo

On Apr 17, 3:52 pm, bearophileH...@ lycos.com wrote:
(snipped)

>
So far I haven't succed using the coroutine Python 2.5 allows using
the generators, and I think still that xsplitter can be done with two
coroutines instead of two It objects. Despite Steven's code I am
unable still to write a working code with coroutines (probably because
I haven't understood how they work yet). This time I take a breath and
I show my wrong code:

import collections

def xsplitter(iseq, pred):
def it(parity):
queue = collections.deq ue()
while True:
received = (yield)
if received is not None:
queue.append(re ceived)
if queue:
yield queue.popleft()
else:
try:
el = iseq.next()
if pred(el) == parity:
yield el
else:
its[not parity].send(el)
except StopIteration:
raise

its = [it(False), it(True)]
return its[True], its[False]

idata = iter([1, 'a', 3, 'a', 4, 5, 6, 'a'])
it1, it2 = xsplitter(idata , lambda x: x == 'a')

from itertools import izip
for el1, el2 in izip(it1, it2):
print el1, el2

It prints:
None None
None None
a 3
None None
a 4
None None
None None
None None

Can it be fixed? Are you able to fix it for me? (If you want you can
think of it as an exercise to understand Python coroutines :-) ).

I don't think coroutine are what's needed here. In particular,
using 'send' will *deplete* the iterator -- the very iterator
to which one is trying to add an element.

If you don't wish to use objects, you can replace them with
a closure:

import collections

def xsplitter(iseq, pred):
queue = [ collections.deq ue(), collections.deq ue() ]
def it(parity):
while True:
if queue[parity]:
yield queue[parity].popleft()
else:
try:
el = iseq.next()
if pred(el) == parity:
yield el
else:
queue[not parity].append(el)
except StopIteration:
raise
its = [ it(False), it(True) ]
return its[True], its[False]
idata = iter([1, 'a', 3, 'a', 4, 5, 6, 'a'])
it1, it2 = xsplitter(idata , lambda x: x == 'a')

from itertools import izip
for el1, el2 in izip(it1, it2):
print el1, el2
Oh, I and do like your rewrite; it's much less
repetitive and cleaner than my original version.

--
Regards,
Steven

Apr 18 '07 #9

Anton Vredegoor

at************* @gmail.com wrote:

>
If you don't wish to use objects, you can replace them with
a closure:

import collections

def xsplitter(iseq, pred):
queue = [ collections.deq ue(), collections.deq ue() ]
def it(parity):
while True:
if queue[parity]:
yield queue[parity].popleft()
else:
try:
el = iseq.next()
if pred(el) == parity:
yield el
else:
queue[not parity].append(el)
except StopIteration:
raise
its = [ it(False), it(True) ]
return its[True], its[False]
idata = iter([1, 'a', 3, 'a', 4, 5, 6, 'a'])
it1, it2 = xsplitter(idata , lambda x: x == 'a')

from itertools import izip
for el1, el2 in izip(it1, it2):
print el1, el2
Oh, I and do like your rewrite; it's much less
repetitive and cleaner than my original version.

But still, the 'while True:' loop and the 'try-except' clause and the
explicit StopIteration are not necessary ...

from collections import deque

def xsplitter(seq, pred):
Q = deque(),deque()
it = iter(seq)
def gen(p):
while Q[p]: yield Q[p].popleft()
for x in it:
if pred(x) == p: yield x
else:
Q[~p].append(x)
for x in gen(p): yield x
return gen(1),gen(0)

def test():
L = 1, 'a', 3, 'a', 4, 5, 6, 'a'
it1, it2 = xsplitter(L, lambda x: x == 'a')
print it1.next()
print it2.next()
print it1.next()

if __name__=='__ma in__':
test()

A.

Apr 18 '07 #10