By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
425,646 Members | 1,175 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 425,646 IT Pros & Developers. It's quick & easy.

need help on generator...

P: n/a
Joh
hello,

i'm trying to understand how i could build following consecutive sets
from a root one using generator :

l = [1,2,3,4]

would like to produce :

[1,2], [2,3], [3,4], [1,2,3], [2,3,4]

but unfortunately can not, i guess i can do it by using sub generator
and maybe enumerate, please if you help could you explain a bit the
trick ? looks like this sub generator thing mess me up.

(i think it should may be also have [1,], [2,], [3,], [1,2,3,4] , and
then be filtered)

bests
Jul 18 '05 #1
Share this Question
Share on Google+
45 Replies


P: n/a
On 21 Jan 2005 05:58:03 -0800
jo******@yahoo.fr (Joh) wrote:
i'm trying to understand how i could build following consecutive sets
from a root one using generator :

l = [1,2,3,4]

would like to produce :

[1,2], [2,3], [3,4], [1,2,3], [2,3,4]

def consecutive_sets(l): .... for i in xrange(2, len(l)):
.... for j in xrange(0, len(l)-i+1):
.... yield l[j:j+i]
.... list(consecutive_sets([1,2,3,4]))

[[1, 2], [2, 3], [3, 4], [1, 2, 3], [2, 3, 4]]

--
Denis S. Otkidach
http://www.python.ru/ [ru]
Jul 18 '05 #2

P: n/a
Joh wrote:
hello,

i'm trying to understand how i could build following consecutive sets
from a root one using generator :

l = [1,2,3,4]

would like to produce :

[1,2], [2,3], [3,4], [1,2,3], [2,3,4]

Do you mean:

[1,2], [2,3], [3,4], [1,2,3], [2,3,4], [1,3,4]
(E.g. all elements in the power set except the empty set, the sets with
one element and the sets with all elements.)
Otherwise, please describe your desired sets in verbal - I cannot see
the point.

Jul 18 '05 #3

P: n/a
On Fri, 2005-01-21 at 17:14 +0300, Denis S. Otkidach wrote:
On 21 Jan 2005 05:58:03 -0800
jo******@yahoo.fr (Joh) wrote:
i'm trying to understand how i could build following consecutive sets
from a root one using generator :

l = [1,2,3,4]

would like to produce :

[1,2], [2,3], [3,4], [1,2,3], [2,3,4]

def consecutive_sets(l):

... for i in xrange(2, len(l)):
... for j in xrange(0, len(l)-i+1):
... yield l[j:j+i]


Since you have a much faster brain than I (though I ended up with
exactly the same thing barring variable names) and beat me to posting
the answer, I'll post the inevitable awful generator expression version
instead:

consecutive_sets = ( x[offset:offset+subset_size]
for subset_size in xrange(2, len(x))
for offset in xrange(0, len(x) + 1 - subset_size) )

--
Craig Ringer

Jul 18 '05 #4

P: n/a
Hi,

I recently read David Mertz (IBM DeveloperWorks) about generators and got
excited about using lazy constructs in my Python programming.

But besides the fact that generators are either produced with the new "yield"
reserved word or by defining the __new__ method in a class definition, I
don't know much about them.

In particular, I don't know what Python constructs does generate a generator.
I know this is now the case for reading lines in a file or with the new
"iterator" package. But what else ? Does Craig Ringer answer mean that list
comprehensions are lazy ? Where can I find a comprehensive list of all the
lazy constructions built in Python ? (I think that to easily distinguish lazy
from strict constructs is an absolute programmer need -- otherwise you always
end up wondering when is it that code is actually executed like in Haskell).

Thank you

Francis Girard
FRANCE

Le vendredi 21 Janvier 2005 15:38, Craig Ringer a écritÂ*:
On Fri, 2005-01-21 at 17:14 +0300, Denis S. Otkidach wrote:
On 21 Jan 2005 05:58:03 -0800

jo******@yahoo.fr (Joh) wrote:
i'm trying to understand how i could build following consecutive sets
from a root one using generator :

l = [1,2,3,4]

would like to produce :

[1,2], [2,3], [3,4], [1,2,3], [2,3,4]

>> def consecutive_sets(l):


... for i in xrange(2, len(l)):
... for j in xrange(0, len(l)-i+1):
... yield l[j:j+i]


Since you have a much faster brain than I (though I ended up with
exactly the same thing barring variable names) and beat me to posting
the answer, I'll post the inevitable awful generator expression version
instead:

consecutive_sets = ( x[offset:offset+subset_size]
for subset_size in xrange(2, len(x))
for offset in xrange(0, len(x) + 1 - subset_size) )

--
Craig Ringer


Jul 18 '05 #5

P: n/a
On Fri, 2005-01-21 at 22:38 +0800, Craig Ringer wrote:
consecutive_sets = ( x[offset:offset+subset_size]
for subset_size in xrange(2, len(x))
for offset in xrange(0, len(x) + 1 - subset_size) )


Where 'x' is list to operate on, as I should've initially noted. Sorry
for the reply-to-self.

I did say "awful" for a reason ;-)

--
Craig Ringer

Jul 18 '05 #6

P: n/a
Joh wrote:
i'm trying to understand how i could build following consecutive sets
from a root one using generator :

l = [1,2,3,4]

would like to produce :

[1,2], [2,3], [3,4], [1,2,3], [2,3,4]

but unfortunately can not, i guess i can do it by using sub generator
and maybe enumerate, please if you help could you explain a bit the
trick ? looks like this sub generator thing mess me up.


Here is an (untested) variant that accepts any iterable while trying to
remain memory-efficient. This makes it necessary to shuffle the order of
the output a bit.

from itertools import tee, islice

def gen(iterable, start, end):
it = iter(iterable)
while True:
it, a = tee(it)
a = tuple(islice(a, end-1))
for sz in xrange(start, len(a)+1):
yield a[:sz]
it.next()
if __name__ == "__main__":
print list(gen(range(1, 5), 2, 4))

# prints:
# [(1, 2), (1, 2, 3), (2, 3), (2, 3, 4), (3, 4)]

Peter

Jul 18 '05 #7

P: n/a
On Fri, 2005-01-21 at 16:05 +0100, Francis Girard wrote:
I recently read David Mertz (IBM DeveloperWorks) about generators and
got excited about using lazy constructs in my Python programming.
Speaking of totally great articles, and indirectly to lazyness (though
not lazyily evaluated constructs), I was really impressed by this
fantastic article on metaclasses:

http://gnosis.cx/publish/programming/metaclass_1.html
http://gnosis.cx/publish/programming/metaclass_2.html

which shows that they're really just not that hard. That saved me an
IMMENSE amount of utterly tedious coding just recently.
But besides the fact that generators are either produced with the new
"yield" reserved word or by defining the __new__ method in a class
definition, I don't know much about them.
They can also be created with a generator expression under Python 2.4. A
generator expression works much like a list comprehension, but returns a
generator instead of a list, and is evaluated lazily. (It also doesn't
pollute the outside namespace with its working variables).
print [ x for x in range(1,10)] [1, 2, 3, 4, 5, 6, 7, 8, 9] print ( x for x in xrange(1,10) ) <generator object at 0x401e40ac> print list(( x for x in xrange(1,10) ))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Not the use of xrange above for efficiency in the generator expressions.
These examples are trivial and pointless, but hopefully get the point
across.
In particular, I don't know what Python constructs does generate a
generator.
As far as I know, functions that use yield, and generator expressions. I
was unaware of the ability to create them using a class with a __new__
method, and need to check that out - I can imagine situations in which
it might be rather handy.

I'm not sure how many Python built-in functions and library modules
return generators for things.
I know this is now the case for reading lines in a file or with the
new "iterator" package. But what else ? Does Craig Ringer answer mean
that list comprehensions are lazy ?
Nope, but generator expressions are, and they're pretty similar.
Where can I find a comprehensive list of all the lazy constructions
built in Python ? (I think that to easily distinguish lazy from strict
constructs is an absolute programmer need -- otherwise you always end
up wondering when is it that code is actually executed like in
Haskell).


I'm afraid I can't help you with that. I tend to take the view that side
effects in lazily executed code are a bad plan, and use lazy execution
for things where there is no reason to care when the code is executed.

--
Craig Ringer
Jul 18 '05 #8

P: n/a
Le vendredi 21 Janvier 2005 16:06, Craig Ringer a écritÂ*:
On Fri, 2005-01-21 at 22:38 +0800, Craig Ringer wrote:
consecutive_sets = ( x[offset:offset+subset_size]
for subset_size in xrange(2, len(x))
for offset in xrange(0, len(x) + 1 - subset_size) )


Where 'x' is list to operate on, as I should've initially noted. Sorry
for the reply-to-self.

I did say "awful" for a reason ;-)

--
Craig Ringer


First, I think that you mean :

consecutive_sets = [ x[offset:offset+subset_size]
for subset_size in xrange(2, len(x))
for offset in xrange(0, len(x) + 1 - subset_size)]

(with square brackets).

Second,

this is not lazy anymore (like Denis S. Otkidach previous answer was) because
the __whole__ list get constructed __before__ any other piece of code have a
chance to execute. The type of consecutive_sets is simply a list, not a
generator.

I'm just trying to understand and obviously I'm missing the point.

Thank you

Francis Girard
FRANCE

Jul 18 '05 #9

P: n/a
Really, thank you Craig Ringer for your great answer.

I'm afraid I can't help you with that. I tend to take the view that side
effects in lazily executed code are a bad plan, and use lazy execution
for things where there is no reason to care when the code is executed.

I completly agree with this. But this is much more true in theory than in
practice. In practice you might end up with big big memory usage with lazy
constructs as the "system" has to store intermediate results (the execution
context in the case of Python). These kinds of phenomena happen all the time
in a language like Haskell -- at least for a beginner like me -- if you don't
pay attention to it ; and this makes the language a lot more difficult to
master. Thus you have to keep an eye on performance even though, in FP, you
shoould just have to "declare" your intentions and let the system manage the
execution path.

http://gnosis.cx/publish/programming/metaclass_1.html
http://gnosis.cx/publish/programming/metaclass_2.html
Thank you, I'll read that.

Francis Girard
FRANCE

Le vendredi 21 Janvier 2005 16:42, Craig Ringer a écritÂ*: On Fri, 2005-01-21 at 16:05 +0100, Francis Girard wrote:
I recently read David Mertz (IBM DeveloperWorks) about generators and
got excited about using lazy constructs in my Python programming.


Speaking of totally great articles, and indirectly to lazyness (though
not lazyily evaluated constructs), I was really impressed by this
fantastic article on metaclasses:

http://gnosis.cx/publish/programming/metaclass_1.html
http://gnosis.cx/publish/programming/metaclass_2.html

which shows that they're really just not that hard. That saved me an
IMMENSE amount of utterly tedious coding just recently.
But besides the fact that generators are either produced with the new
"yield" reserved word or by defining the __new__ method in a class
definition, I don't know much about them.


They can also be created with a generator expression under Python 2.4. A
generator expression works much like a list comprehension, but returns a
generator instead of a list, and is evaluated lazily. (It also doesn't
pollute the outside namespace with its working variables).
print [ x for x in range(1,10)]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
print ( x for x in xrange(1,10) )
<generator object at 0x401e40ac>
print list(( x for x in xrange(1,10) ))


[1, 2, 3, 4, 5, 6, 7, 8, 9]

Not the use of xrange above for efficiency in the generator expressions.
These examples are trivial and pointless, but hopefully get the point
across.
In particular, I don't know what Python constructs does generate a
generator.


As far as I know, functions that use yield, and generator expressions. I
was unaware of the ability to create them using a class with a __new__
method, and need to check that out - I can imagine situations in which
it might be rather handy.

I'm not sure how many Python built-in functions and library modules
return generators for things.
I know this is now the case for reading lines in a file or with the
new "iterator" package. But what else ? Does Craig Ringer answer mean
that list comprehensions are lazy ?


Nope, but generator expressions are, and they're pretty similar.
Where can I find a comprehensive list of all the lazy constructions
built in Python ? (I think that to easily distinguish lazy from strict
constructs is an absolute programmer need -- otherwise you always end
up wondering when is it that code is actually executed like in
Haskell).


I'm afraid I can't help you with that. I tend to take the view that side
effects in lazily executed code are a bad plan, and use lazy execution
for things where there is no reason to care when the code is executed.

--
Craig Ringer


Jul 18 '05 #10

P: n/a
On Fri, 2005-01-21 at 16:54 +0100, Francis Girard wrote:
First, I think that you mean :

consecutive_sets = [ x[offset:offset+subset_size]
for subset_size in xrange(2, len(x))
for offset in xrange(0, len(x) + 1 - subset_size)]

(with square brackets). I'm just trying to understand and obviously I'm missing the point.


Yep. This:

( x for x in xrange(10) )

will return a generator that calculates things as it goes, while this:

[ x for x in xrange(10) ]

will return a list.

Check out:
http://www.python.org/peps/pep-0289.html
http://docs.python.org/whatsnew/node4.html
http://www.python.org/dev/doc/newstyle/ref/genexpr.html
for details.

--
Craig Ringer

Jul 18 '05 #11

P: n/a
Thank you,

I immediately download version 2.4, switching from version 2.3.

Francis Girard
FRANCE

Le vendredi 21 Janvier 2005 17:34, Craig Ringer a écritÂ*:
On Fri, 2005-01-21 at 16:54 +0100, Francis Girard wrote:
First, I think that you mean :

consecutive_sets = [ x[offset:offset+subset_size]
for subset_size in xrange(2, len(x))
for offset in xrange(0, len(x) + 1 - subset_size)]

(with square brackets).

I'm just trying to understand and obviously I'm missing the point.


Yep. This:

( x for x in xrange(10) )

will return a generator that calculates things as it goes, while this:

[ x for x in xrange(10) ]

will return a list.

Check out:
http://www.python.org/peps/pep-0289.html
http://docs.python.org/whatsnew/node4.html
http://www.python.org/dev/doc/newstyle/ref/genexpr.html
for details.

--
Craig Ringer


Jul 18 '05 #12

P: n/a
Francis Girard wrote:
In particular, I don't know what Python constructs does generate a generator.
I know this is now the case for reading lines in a file or with the new
"iterator" package. But what else ? Does Craig Ringer answer mean that list
comprehensions are lazy ? Where can I find a comprehensive list of all the
lazy constructions built in Python ? (I think that to easily distinguish lazy
from strict constructs is an absolute programmer need -- otherwise you always
end up wondering when is it that code is actually executed like in Haskell).


I don't think there is a *list* as such, but there are some rules of thumb for
when lazy evaluation will take place (hopefully others will note any cases that
I missed):

1. Iterators (classes with a next() method, and an __iter__ method that returns
'self') are lazily evaluated, as itr.next() is called to retrieve each value. I
think you will find it is this method, rather than __new__ which is relevant to
creating class-based generators. Note that "for x in itr" automatically calls
itr.next() in order to obtain each new value of the loop variable.
This iterator protocol is the basis of lazy evaluation in Python, and is
described here:
http://www.python.org/dev/doc/devel/lib/typeiter.html

2. Iterables (classes with an __iter__ method) will return a lazy iterator via
iter(obj). Actual iterators return themselves from __iter__, so iter(obj) is a
good way to make sure you have an iterator.

3. Generators (functions that use 'yield' instead of 'return') and generator
expressions (like list comprehensions, but without the square brackets) are
simply concise ways of creating iterators.

4. The utility functions in the itertools module generally return iterators
rather than lists (this shouldn't suprise anyone!)

5. Several builtin functions return iterators rather than lists, specifically
xrange(), enumerate() and reversed(). Other builtins that yield sequences
(range(), sorted(), zip()) return lists.

However, be aware that some things which accept any iterable don't take
advantage of the lazy evaluation, and still cause the whole thing to be created
in memory at once - "".join(itr) is currently one such operation.

The sequence vs iterator distinction is somewhat unfortunate (since it breaks
with TOOWTDI), but completing the transition to an iterator based approach isn't
going to be possible until Python 3.0, when things that currently return lists
can be converted to return iterators (i.e. it has been suggested that the
fundamental construct in Python 3.x should be an iterator just as a list is the
fundamental construct in Python 2.x)

Regards,
Nick.

--
Nick Coghlan | nc******@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net
Jul 18 '05 #13

P: n/a
Francis Girard <fr************@free.fr> wrote:
...
But besides the fact that generators are either produced with the new "yield"
reserved word or by defining the __new__ method in a class definition, I
don't know much about them.
Having __new__ in a class definition has nothing much to do with
generators; it has to do with how the class is instantiated when you
call it. Perhaps you mean 'next' (and __iter__)? That makes instances
of the class iterators, just like iterators are what you get when you
call a generator.
In particular, I don't know what Python constructs does generate a generator.
A 'def' of a function whose body uses 'yield', and in 2.4 the new genexp
construct.
I know this is now the case for reading lines in a file or with the new
"iterator" package.
Nope, besides the fact that the module you're thinking of is named
'itertools': itertools uses a lot of C-coded special types, which are
iterators but not generators. Similarly, a file object is an iterator
but not a generator.
But what else ?
Since you appear to conflate generators and iterators, I guess the iter
built-in function is the main one you missed. iter(x), for any x,
either raises an exception (if x's type is not iterable) or else returns
an iterator.
Does Craig Ringer answer mean that list
comprehensions are lazy ?
Nope, those were generator expressions.
Where can I find a comprehensive list of all the
lazy constructions built in Python ?
That's yet a different question -- at least one needs to add the
built-in xrange, which is neither an iterator nor a generator but IS
lazy (a historical artefact, admittedly).

But fortunately Python's built-ins are not all THAT many, so that's
about it.
(I think that to easily distinguish lazy
from strict constructs is an absolute programmer need -- otherwise you always
end up wondering when is it that code is actually executed like in Haskell).


Encapsulation doesn't let you "easily distinguish" issues of
implementation. For example, the fact that a file is an iterator (its
items being its lines) doesn't tell you if that's internally implemented
in a lazy or eager way -- it tells you that you can code afile.next() to
get the next line, or "for line in afile:" to loop over them, but does
not tell you whether the code for the file object is reading each line
just when you ask for it, or whether it reads all lines before and just
keeps some state about the next one, or somewhere in between.

The answer for the current implementation, BTW, is "in between" -- some
buffering, but bounded consumption of memory -- but whether that tidbit
of pragmatics is part of the file specs, heh, that's anything but clear
(just as for other important tidbits of Python pragmatics, such as the
facts that list.sort is wickedly fast, 'x in alist' isn't, 'x in adict'
IS...).
Alex
Jul 18 '05 #14

P: n/a
Nick Coghlan <nc******@iinet.net.au> wrote:
5. Several builtin functions return iterators rather than lists, specifically
xrange(), enumerate() and reversed(). Other builtins that yield sequences
(range(), sorted(), zip()) return lists.


Yes for enumerate and reversed, no for xrange:
xx=xrange(7)
xx.next() Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'xrange' object has no attribute 'next'
it SHOULD return an iterator, no doubt, but it doesn't (can't, for
backwards compatibility reasons). Neither does it return a list: it
returns "an `xrange' object", a specialized type that's not an iterator,
though it's iterable. It's a type, btw:
xrange <type 'xrange'>


so it's not surprising that calling it returns instances of it
(enumerate and reversed are also types, but *WITH* 'next'...).
Alex
Jul 18 '05 #15

P: n/a
On Sat, 2005-01-22 at 10:10 +0100, Alex Martelli wrote:
The answer for the current implementation, BTW, is "in between" -- some
buffering, but bounded consumption of memory -- but whether that tidbit
of pragmatics is part of the file specs, heh, that's anything but clear
(just as for other important tidbits of Python pragmatics, such as the
facts that list.sort is wickedly fast, 'x in alist' isn't, 'x in adict'
IS...).


A particularly great example when it comes to unexpected buffering
effects is the file iterator. Take code that reads a header from a file
using an (implicit) iterator, then tries to read() the rest of the file.
Taking the example of reading an RFC822-like message into a list of
headers and a body blob:

..>>> inpath = '/tmp/msg.eml'
..>>> infile = open(inpath)
..>>> for line in infile:
..... if not line.strip():
..... break
..... headers.append(tuple(line.split(':',1)))
..>>> body = infile.read()
(By the way, if you ever implement this yourself for real, you should
probably be hurt - use the 'email' or 'rfc822' modules instead. For one
thing, reinventing the wheel is rarely a good idea. For another, the
above code is horrid - in particular it doesn't handle malformed headers
at all, isn't big on readability/comments, etc.)

If you run the above code on a saved email message, you'd expect 'body'
to contain the body of the message, right? Nope. The iterator created
from the file when you use it in that for loop does internal read-ahead
for efficiency, and has already read in the entire file or at least a
chunk more of it than you've read out of the iterator. It doesn't
attempt to hide this from the programmer, so the file position marker is
further into the file (possibly at the end on a smaller file) than you'd
expect given the data you've actually read in your program.

I'd be interested to know if there's a better solution to this than:

..>>> inpath = '/tmp/msg.eml'
..>>> infile = open(inpath)
..>>> initer = iter(infile)
..>>> headers = []
..>>> for line in initer:
..... if not line.strip():
..... break
..... headers.append(tuple(line.split(':',1)))
..>>> data = ''.join(x for x in initer)

because that seems like a pretty ugly hack (and please ignore the
variable names). Perhaps a way to get the file to seek back to the point
last read from the iterator when the iterator is destroyed?

--
Craig Ringer

Jul 18 '05 #16

P: n/a
Le samedi 22 Janvier 2005 10:10, Alex Martelli a écritÂ*:
Francis Girard <fr************@free.fr> wrote:
...
But besides the fact that generators are either produced with the new
"yield" reserved word or by defining the __new__ method in a class
definition, I don't know much about them.
Having __new__ in a class definition has nothing much to do with
generators; it has to do with how the class is instantiated when you
call it. Perhaps you mean 'next' (and __iter__)? That makes instances
of the class iterators, just like iterators are what you get when you
call a generator.


Yes, I meant "next".

In particular, I don't know what Python constructs does generate a
generator.


A 'def' of a function whose body uses 'yield', and in 2.4 the new genexp
construct.


Ok. I guess I'll have to update to version 2.4 (from 2.3) to follow the
discussion.
I know this is now the case for reading lines in a file or with the new
"iterator" package.


Nope, besides the fact that the module you're thinking of is named
'itertools': itertools uses a lot of C-coded special types, which are
iterators but not generators. Similarly, a file object is an iterator
but not a generator.
But what else ?


Since you appear to conflate generators and iterators, I guess the iter
built-in function is the main one you missed. iter(x), for any x,
either raises an exception (if x's type is not iterable) or else returns
an iterator.


You're absolutly right, I take the one for the other and vice-versa. If I
understand correctly, a "generator" produce something over which you can
iterate with the help of an "iterator". Can you iterate (in the strict sense
of an "iterator") over something not generated by a "generator" ?

Does Craig Ringer answer mean that list
comprehensions are lazy ?


Nope, those were generator expressions.
Where can I find a comprehensive list of all the
lazy constructions built in Python ?


That's yet a different question -- at least one needs to add the
built-in xrange, which is neither an iterator nor a generator but IS
lazy (a historical artefact, admittedly).

But fortunately Python's built-ins are not all THAT many, so that's
about it.
(I think that to easily distinguish lazy
from strict constructs is an absolute programmer need -- otherwise you
always end up wondering when is it that code is actually executed like in
Haskell).


Encapsulation doesn't let you "easily distinguish" issues of
implementation. For example, the fact that a file is an iterator (its
items being its lines) doesn't tell you if that's internally implemented
in a lazy or eager way -- it tells you that you can code afile.next() to
get the next line, or "for line in afile:" to loop over them, but does
not tell you whether the code for the file object is reading each line
just when you ask for it, or whether it reads all lines before and just
keeps some state about the next one, or somewhere in between.


You're right. I was much more talking (mistakenly) about lazy evaluation of
the arguments to a function (i.e. the function begins execution before its
arguments get evaluated) -- in such a case I think it should be specified
which arguments are "strict" and which are "lazy" -- but I don't think
there's such a thing in Python (... well not yet as Python get more and more
akin to FP).
The answer for the current implementation, BTW, is "in between" -- some
buffering, but bounded consumption of memory -- but whether that tidbit
of pragmatics is part of the file specs, heh, that's anything but clear
(just as for other important tidbits of Python pragmatics, such as the
facts that list.sort is wickedly fast, 'x in alist' isn't, 'x in adict'
IS...).
Alex


Thank you

Francis Girard
FRANCE

Jul 18 '05 #17

P: n/a
On Sat, 2005-01-22 at 17:46 +0800, I wrote:
I'd be interested to know if there's a better solution to this than:

.>>> inpath = '/tmp/msg.eml'
.>>> infile = open(inpath)
.>>> initer = iter(infile)
.>>> headers = []
.>>> for line in initer:
.... if not line.strip():
.... break
.... headers.append(tuple(line.split(':',1)))
.>>> data = ''.join(x for x in initer)

because that seems like a pretty ugly hack (and please ignore the
variable names). Perhaps a way to get the file to seek back to the point
last read from the iterator when the iterator is destroyed?


And now, answering my own question (sorry):

Answer: http://tinyurl.com/6avdt

so we can slightly simplify the above to:

..>>> inpath = '/tmp/msg.eml'
..>>> infile = open(inpath)
..>>> headers = []
..>>> for line in infile:
..... if not line.strip():
..... break
..... headers.append(tuple(line.split(':',1)))
..>>> data = ''.join(x for x in infile)

at the cost of making it less clear what's going on and having someone
later go "duh, why isn't he using read() here instead" but can't seem to
do much more than that.

Might it be worth providing a way to have file objects seek back to the
current position of the iterator when read() etc are called? If not, I
favour the suggestion in the referenced post - file should probably fail
noisily, or at least emit a warning.

What are others thoughts on this?

--
Craig Ringer

Jul 18 '05 #18

P: n/a
Francis Girard <fr************@free.fr> wrote:
...
A 'def' of a function whose body uses 'yield', and in 2.4 the new genexp
construct.
Ok. I guess I'll have to update to version 2.4 (from 2.3) to follow the
discussion.


It's worth upgrading even just for the extra speed;-).

Since you appear to conflate generators and iterators, I guess the iter
built-in function is the main one you missed. iter(x), for any x,
either raises an exception (if x's type is not iterable) or else returns
an iterator.


You're absolutly right, I take the one for the other and vice-versa. If I
understand correctly, a "generator" produce something over which you can
iterate with the help of an "iterator". Can you iterate (in the strict sense
of an "iterator") over something not generated by a "generator" ?


A generator function (commonly known as a generator), each time you call
it, produces a generator object AKA a generator-iterator. To wit:
def f(): yield 23 .... f <function f at 0x75fe70> x = f()
x <generator object at 0x75aa58> type(x) <type 'generator'>

A generator expression (genexp) also has a result which is a generator
object:
x = (23 for __ in [0])
type(x) <type 'generator'>
Iterators need not be generator-iterators, by any means. Generally, the
way to make sure you have an iterator is to call iter(...) on something;
if the something was already an iterator, NP, then iter's idempotent:
iter(x) is x

True
That's what "an iterator" means: some object x such that x.next is
callable without arguments and iter(x) is x.

Since iter(x) tries calling type(x).__iter__(x) [[slight simplification
here by ignoring custom metaclasses, see recent discussion on python-dev
as to why this is only 99% accurate, not 100% accurate]], one way to
code an iterator is as a class. For example:

class Repeater(object):
def __iter__(self): return self
def next(self): return 23

Any instance of Repeater is an iterator which, as it happens, has just
the same behavior as itertools.repeat(23), which is also the same
behavior you get from iterators obtained by calling:

def generepeat():
while True: yield 23

In other words, after:

a = Repeater()
b = itertools.repeat(23)
c = generepeat()

the behavior of a, b and c is indistinguishable, though you can easily
tell them apart by introspection -- type(a) != type(b) != type(c).
Python's penchant for duck typing -- behavior matters more, WAY more
than implementation details such as type() -- means we tend to consider
a, b and c fully equivalent. Focusing on ``generator'' is at this level
an implementation detail.
Most often, iterators (including generator-iterators) are used in a for
statement (or equivalently a for clause of a listcomp or genexp), which
is why one normally doesn't think about built-in ``iter'': it's called
automatically by these ``for'' syntax-forms. In other words,

for x in <<<whatever>>>:
...body...

is just like:

__tempiter = iter(<<<whatever>>>)
while True:
try: x = __tempiter.next()
except StopIteration: break
...body...

((Simplification alert: the ``for'' statement has an optional ``else''
which this allegedly "just like" form doesn't mimic exactly...))

You're right. I was much more talking (mistakenly) about lazy evaluation of
the arguments to a function (i.e. the function begins execution before its
arguments get evaluated) -- in such a case I think it should be specified
which arguments are "strict" and which are "lazy" -- but I don't think
there's such a thing in Python (... well not yet as Python get more and more
akin to FP).


Python's strict that way. To explicitly make some one argument "lazy",
sorta, you can put a "lambda:" in front of it at call time, but then you
have to "call the argument" to get it evaluated; a bit of a kludge.
There's a PEP out to allow a ``prefix :'' to mean just the same as this
"lambda:", but even though I co-authored it I don't think it lowers the
kludge quotient by all that much.

Guido, our beloved BDFL, is currently musing about optional typing of
arguments, which might perhaps open a tiny little crack towards letting
some arguments be lazy. I don't think Guido wants to go there, though.

My prediction is that even Python 3000 will be strict. At least this
makes some things obvious at each call-site without having to study the
way a function is defined, e.g., upon seeing
f(a+b, c*d)
you don't have to wonder, or study the ``def f'', to find out when the
addition and the multiplication happen -- they happen before f's body
gets a chance to run, and thus, in particular, if either operation
raises an exception, there's nothing f can do about it.

And that's a misunderstanding I _have_ seen repeatedly even in people
with a pretty good overall grasp of Python, evidenced in code such as:
self.assertRaises(SomeError, f(23))
with astonishment that -- if f(23) does indeed raise SomeError -- this
exception propagates, NOT caught by assertRaises; and if mistakenly
f(23) does NOT raise, you typically get a TypeError about None not being
callable. The way to do the above call, _since Python is strict_, is:
self.assertRaises(SomeError, f, 23)
i.e. give assertRaises the function (or other callable) and arguments to
pass to it -- THIS way, assertRaises performs the call under its own
control (inside the try clause of a try/except statement) and can and
does catch and report things appropriately.

The frequency of this misunderstanding is high enough to prove to me
that strictness is not FULLY understood by some "intermediate level"
Pythonistas. However, I doubt the right solution is to complicate
Python with the ability to have some arguments be strict, and other
lazy, much as sometimes one might yearn for it. _Maybe_ one could have
some form that makes ALL arguments lazy, presenting them as an iterator
to the function itself. But even then the form _should_, I suspect, be
obvious at the call site, rather than visible only in the "def"...

Alex
Jul 18 '05 #19

P: n/a
Craig Ringer <cr***@postnewspapers.com.au> wrote:
.>>> data = ''.join(x for x in infile)
Maybe ''.join(infile) is a better way to express this functionality?
Avoids 2.4 dependency and should be faster as well as more concise.

Might it be worth providing a way to have file objects seek back to the
current position of the iterator when read() etc are called? If not, I
It's certainly worth doing a patch and see what the python-dev crowd
thinks of it, I think; it might make it into 2.5.
favour the suggestion in the referenced post - file should probably fail
noisily, or at least emit a warning.


Under what conditions, exactly, would you want such an exception?
Alex
Jul 18 '05 #20

P: n/a
On Sat, 2005-01-22 at 12:20 +0100, Alex Martelli wrote:
Craig Ringer <cr***@postnewspapers.com.au> wrote:
.>>> data = ''.join(x for x in infile)


Maybe ''.join(infile) is a better way to express this functionality?
Avoids 2.4 dependency and should be faster as well as more concise.


Thanks - for some reason I hadn't clicked to that. Obvious in hindsight,
but I just completely missed it.
Might it be worth providing a way to have file objects seek back to the
current position of the iterator when read() etc are called? If not, I


It's certainly worth doing a patch and see what the python-dev crowd
thinks of it, I think; it might make it into 2.5.


I'll certainly look into doing so. I'm not dumb enough to say "Sure,
I'll do that" before looking into the code involved and thinking more
about what issues could pop up. Still, it's been added to my
increasingly frightening TODO list.
favour the suggestion in the referenced post - file should probably fail
noisily, or at least emit a warning.


Under what conditions, exactly, would you want such an exception?


When read() or other methods suffering from the same issue are called
after next() without an intervening seek(). It'd mean another state flag
for the file to keep track of - but I doubt it'd make any detectable
difference in performance given that there's disk I/O involved.

I'd be happier to change the behaviour so that a warning isn't
necessary, though, and I suspect it can be done without introducing
backward compatibility issues. Well, so long as nobody is relying on the
undefined file position after using an iterator - and I'm not dumb
enough to say nobody would ever do that.

I've really got myself into hot water now though - I'm going to have to
read over the `file' source code before impulsively saying anything
REALLY stupid.

--
Craig Ringer

Jul 18 '05 #21

P: n/a
Hi,

First,

My deepest thanks to Craig Ringer, Alex Martelli, Nick Coghlan and Terry Reedy
for having generously answered on the "Need help on need help on generator"
thread. I'm compiling the answers to sketch myself a global pictures about
iterators, generators, iterator-generators and laziness in python.

In the meantime, I couldn't resist to test the new Python features about
laziness on a classical FP problem, i.e. the "Hamming" problem.

The name of the game is to produce the sequence of integers satisfying the
following rules :

(i) The list is in ascending order, without duplicates
(ii) The list begins with the number 1
(iii) If the list contains the number x, then it also contains the numbers
2*x, 3*x, and 5*x
(iv) The list contains no other numbers.

The algorithm in FP is quite elegant. Simply suppose that the infinite
sequence is produced, then simply merge the three sequences (2*x,3*x,5*x) for
each x in the infinite sequence we supposed as already produced ; this is
O(n) complexity for n numbers.

I simply love those algorithms that run after their tails.

In haskell, the algorithm is translated as such :

-- BEGIN SNAP
-- hamming.hs

-- Merges two infinite lists
merge :: (Ord a) => [a] -> [a] -> [a]
merge (x:xs)(y:ys)
| x == y = x : merge xs ys
| x < y = x : merge xs (y:ys)
| otherwise = y : merge (x:xs) ys

-- Lazily produce the hamming sequence
hamming :: [Integer]
hamming
= 1 : merge (map (2*) hamming) (merge (map (3*) hamming) (map (5*) hamming))
-- END SNAP
In Python, I figured out this implementation :

-- BEGIN SNAP
import sys
from itertools import imap

## Merges two infinite lists
def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else: # if y < x:
yield y
y = ys.next()

## Lazily produce the hamming sequence
def hamming():
yield 1 ## Initialize the machine
for n in imerge(imap(lambda h: 2*h, hamming()),
imerge(imap(lambda h: 3*h, hamming()),
imap(lambda h: 5*h, hamming()))):
yield n
print "Falling out -- We should never get here !!"

for n in hamming():
sys.stderr.write("%s " % str(n)) ## stderr for unbuffered output
-- END SNAP
My goal is not to compare Haskell with Python on a classical FP problem, which
would be genuine stupidity.

Nevertheless, while the Haskell version prints Hamming sequence for as longas
I can stand it, and with very little memory consumation, the Python version
only prints :
>> hamming.py

1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50 54 60 64 7275
80 81 90 96 100 108 120 125 128 135 144 150 160 162 180 192 200 216 225 240
243 250 256 270 288 300 320 324 360 375 384 400 405 432 450 480 486 500 512
540 576 600 625 640 648 675 720 729 750 768 800 810 864 900 960 972 1000 1024
1080 1125 1152 1200 1215 1250 1280 1296 1350 1440 1458 1500 1536 1600 1620
1728 1800 1875 1920 1944 2000 2025 2048 2160 2187 2250 2304 2400 2430 2500
2560 2592 2700 2880 2916 3000 3072 3125 3200 3240 3375 3456 3600 3645 3750
3840 3888 4000 4050 4096 4320 4374 4500 4608 4800 4860 5000 5120 5184 5400
5625 5760 5832 6000 6075 6144 6250 6400 6480 6561 6750 6912 7200 7290 7500
7680 7776 8000 8100 8192 8640 8748 9000 9216 9375 9600 9720 10000 10125 10240
10368 10800 10935 11250 11520 11664 12000 12150 12288 12500 12800 12960 13122
13500 13824 14400 14580 15000 15360 15552 15625 16000 16200 16384 16875 17280
17496 18000 18225 18432 18750 19200 19440 19683 20000 20250 20480 20736 21600
21870 22500 23040 23328 24000 24300 24576 25000 25600 25920 26244 27000 27648
28125 28800 29160 30000 30375 30720 31104 31250 32000 32400 32768 32805 33750
34560 34992 36000 36450 36864 37500 38400 38880 39366 40000 40500 40960 41472
43200 43740 45000 46080 46656 46875 48000 48600 49152 50000 50625 51200 51840
52488 54000 54675 55296 56250 57600
Processus arrêté

After 57600, my machine begins swapping like crazy and I do have to kill the
python processus.

I think I should not have this kind of behaviour, even using recursion, since
I'm only using lazy constructs all the time. At least, I would expect the
program to produce much more results before surrending.

What's going on ?

Thank you

Francis Girard
FRANCE

Jul 18 '05 #22

P: n/a
Your formulation in Python is recursive (hamming calls hamming()) and I
think that's why your program gives up fairly early.

Instead, use itertools.tee() [new in Python 2.4, or search the internet
for an implementation that works in 2.3] to give a copy of the output
sequence to each "multiply by N" function as well as one to be the
return value.

Here's my implementation, which matched your list early on but
effortlessly reached larger values. One large value it printed was
6412351813189632 (a Python long) which indeed has only the distinct
prime factors 2 and 3. (2**43 * 3**6)

Jeff

from itertools import tee
import sys

def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else:
yield y
y = ys.next()

def hamming():
def _hamming(j, k):
yield 1
hamming = generators[j]
for i in hamming:
yield i * k
generators = []
generator = imerge(imerge(_hamming(0, 2), _hamming(1, 3)), _hamming(2, 5))
generators[:] = tee(generator, 4)
return generators[3]

for i in hamming():
print i,
sys.stdout.flush()

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFB9CTeJd01MZaTXX0RAlHPAJ4gJwuPxCzBdnWiM1/Rs3eUPk1HMwCeJUXh
selUGrwJc9T8wE6aCQOjq+Q=
=AW5F
-----END PGP SIGNATURE-----

Jul 18 '05 #23

P: n/a
[Francis Girard]
...
In the meantime, I couldn't resist to test the new Python features about
laziness on a classical FP problem, i.e. the "Hamming" problem. .... Nevertheless, while the Haskell version prints Hamming sequence for as long as
I can stand it, and with very little memory consumation, the Python version
only prints : .... I think I should not have this kind of behaviour, even using recursion, since
I'm only using lazy constructs all the time. At least, I would expect the
program to produce much more results before surrending.

What's going on ?


You can find an explanation in Lib/test/test_generators.py, which uses
this problem as an example; you can also find one efficient way to
write it there.
Jul 18 '05 #24

P: n/a
Ok, I think that the bottom line is this :

For all the algorithms that run after their tail in an FP way, like the
Hamming problem, or the Fibonacci sequence, (but unlike Sieve of Eratosthene
-- there's a subtle difference), i.e. all those algorithms that typically
rely upon recursion to get the beginning of the generated elements in order
to generate the next one ...

... so for this family of algorithms, we have, somehow, to abandon recursion,
and to explicitely maintain one or many lists of what had been generated.

One solution for this is suggested in "test_generators.py". But I think that
this solution is evil as it looks like recursion is used but it is not and it
depends heavily on how the m235 function (for example) is invoked.
Furthermore, it is _NOT_ memory efficient as pretended : it leaks ! It
internally maintains a lists of generated results that grows forever while it
is useless to maintain results that had been "consumed". Just for a gross
estimate, on my system, percentage of memory usage for this program grows
rapidly, reaching 21.6 % after 5 minutes. CPU usage varies between 31%-36%.
Ugly and inefficient.

The solution of Jeff Epler is far more elegant. The "itertools.tee" function
does just what we want. It internally maintain a list of what had been
generated and deletes the "consumed" elements as the algo unrolls. To follow
with my gross estimate, memory usage grows from 1.2% to 1.9% after 5 minutes
(probably only affected by the growing size of Huge Integer). CPU usage
varies between 27%-29%.
Beautiful and effecient.

You might think that we shouldn't be that fussy about memory usage on
generating 100 digits numbers but we're talking about a whole family of
useful FP algorithms ; and the best way to implement them should be
established.

For this family of algorithms, itertools.tee is the way to go.

I think that the semi-tutorial in "test_generators.py" should be updated to
use "tee". Or, at least, a severe warning comment should be written.

Thank you,

Francis Girard
FRANCE

Le dimanche 23 Janvier 2005 23:27, Jeff Epler a écritÂ*:
Your formulation in Python is recursive (hamming calls hamming()) and I
think that's why your program gives up fairly early.

Instead, use itertools.tee() [new in Python 2.4, or search the internet
for an implementation that works in 2.3] to give a copy of the output
sequence to each "multiply by N" function as well as one to be the
return value.

Here's my implementation, which matched your list early on but
effortlessly reached larger values. One large value it printed was
6412351813189632 (a Python long) which indeed has only the distinct
prime factors 2 and 3. (2**43 * 3**6)

Jeff

from itertools import tee
import sys

def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else:
yield y
y = ys.next()

def hamming():
def _hamming(j, k):
yield 1
hamming = generators[j]
for i in hamming:
yield i * k
generators = []
generator = imerge(imerge(_hamming(0, 2), _hamming(1, 3)), _hamming(2,
5)) generators[:] = tee(generator, 4)
return generators[3]

for i in hamming():
print i,
sys.stdout.flush()


Jul 18 '05 #25

P: n/a
The following implementation is even more speaking as it makes self-evident
and almost mechanical how to translate algorithms that run after their tail
from recursion to "tee" usage :

*** BEGIN SNAP
from itertools import tee, imap
import sys

def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else:
yield y
y = ys.next()


def hamming():
def _hamming():
yield 1
hamming2 = hammingGenerators[0]
hamming3 = hammingGenerators[1]
hamming5 = hammingGenerators[2]
for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
imerge(imap(lambda h: 3*h, iter(hamming3)),
imap(lambda h: 5*h, iter(hamming5)))):
yield n
hammingGenerators = tee(_hamming(), 4)
return hammingGenerators[3]

for i in hamming():
print i,
sys.stdout.flush()
*** END SNAP

Here's an implementation of the fibonacci sequence that uses "tee" :

*** BEGIN SNAP
from itertools import tee
import sys

def fib():
def _fib():
yield 1
yield 1
curGen = fibGenerators[0]
curAheadGen = fibGenerators[1]
curAheadGen.next()
while True:
yield curGen.next() + curAheadGen.next()
fibGenerators = tee(_fib(), 3)
return fibGenerators[2]

for n in fib():
print n,
sys.stdout.flush()
*** END SNAP

Francis Girard
FRANCE
Le lundi 24 Janvier 2005 14:09, Francis Girard a écritÂ*:
Ok, I think that the bottom line is this :

For all the algorithms that run after their tail in an FP way, like the
Hamming problem, or the Fibonacci sequence, (but unlike Sieve of
Eratosthene -- there's a subtle difference), i.e. all those algorithms that
typically rely upon recursion to get the beginning of the generated
elements in order to generate the next one ...

... so for this family of algorithms, we have, somehow, to abandon
recursion, and to explicitely maintain one or many lists of what had been
generated.

One solution for this is suggested in "test_generators.py". But I think
that this solution is evil as it looks like recursion is used but it is not
and it depends heavily on how the m235 function (for example) is invoked.
Furthermore, it is _NOT_ memory efficient as pretended : it leaks ! It
internally maintains a lists of generated results that grows forever while
it is useless to maintain results that had been "consumed". Just for a
gross estimate, on my system, percentage of memory usage for this program
grows rapidly, reaching 21.6 % after 5 minutes. CPU usage varies between
31%-36%. Ugly and inefficient.

The solution of Jeff Epler is far more elegant. The "itertools.tee"
function does just what we want. It internally maintain a list of what had
been generated and deletes the "consumed" elements as the algo unrolls. To
follow with my gross estimate, memory usage grows from 1.2% to 1.9% after5
minutes (probably only affected by the growing size of Huge Integer). CPU
usage varies between 27%-29%.
Beautiful and effecient.

You might think that we shouldn't be that fussy about memory usage on
generating 100 digits numbers but we're talking about a whole family of
useful FP algorithms ; and the best way to implement them should be
established.

For this family of algorithms, itertools.tee is the way to go.

I think that the semi-tutorial in "test_generators.py" should be updated to
use "tee". Or, at least, a severe warning comment should be written.

Thank you,

Francis Girard
FRANCE

Le dimanche 23 Janvier 2005 23:27, Jeff Epler a écritÂ*:
Your formulation in Python is recursive (hamming calls hamming()) and I
think that's why your program gives up fairly early.

Instead, use itertools.tee() [new in Python 2.4, or search the internet
for an implementation that works in 2.3] to give a copy of the output
sequence to each "multiply by N" function as well as one to be the
return value.

Here's my implementation, which matched your list early on but
effortlessly reached larger values. One large value it printed was
6412351813189632 (a Python long) which indeed has only the distinct
prime factors 2 and 3. (2**43 * 3**6)

Jeff

from itertools import tee
import sys

def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else:
yield y
y = ys.next()

def hamming():
def _hamming(j, k):
yield 1
hamming = generators[j]
for i in hamming:
yield i * k
generators = []
generator = imerge(imerge(_hamming(0, 2), _hamming(1, 3)),
_hamming(2, 5)) generators[:] = tee(generator, 4)
return generators[3]

for i in hamming():
print i,
sys.stdout.flush()


Jul 18 '05 #26

P: n/a
[Francis Girard]
For all the algorithms that run after their tail in an FP way, like the
Hamming problem, or the Fibonacci sequence, (but unlike Sieve of Eratosthene
-- there's a subtle difference), i.e. all those algorithms that typically
rely upon recursion to get the beginning of the generated elements in order
to generate the next one ...

... so for this family of algorithms, we have, somehow, to abandon recursion,
and to explicitely maintain one or many lists of what had been generated.

One solution for this is suggested in "test_generators.py". But I think that
this solution is evil as it looks like recursion is used but it is not and it
depends heavily on how the m235 function (for example) is invoked.
Well, yes -- "Heh. Here's one way to get a shared list, complete with
an excruciating namespace renaming trick" was intended to warn you in
advance that it wasn't pretty <wink>.
Furthermore, it is _NOT_ memory efficient as pretended : it leaks !
Yes. But there are two solutions to the problem in that file, and the
second one is in fact extremely memory-efficient compared to the first
one. "Efficiency" here was intended in a relative sense.
It internally maintains a lists of generated results that grows forever while it
is useless to maintain results that had been "consumed". Just for a gross
estimate, on my system, percentage of memory usage for this program grows
rapidly, reaching 21.6 % after 5 minutes. CPU usage varies between 31%-36%.
Ugly and inefficient.
Try the first solution in the file for a better understanding of what
"inefficient" means <wink>.
The solution of Jeff Epler is far more elegant. The "itertools.tee" function
does just what we want. It internally maintain a list of what had been
generated and deletes the "consumed" elements as the algo unrolls. To follow
with my gross estimate, memory usage grows from 1.2% to 1.9% after 5 minutes
(probably only affected by the growing size of Huge Integer). CPU usage
varies between 27%-29%.
Beautiful and effecient.
Yes, it is better. tee() didn't exist when generators (and
test_generators.py) were written, so of course nothing in the test
file uses them.
You might think that we shouldn't be that fussy about memory usage on
generating 100 digits numbers but we're talking about a whole family of
useful FP algorithms ; and the best way to implement them should be
established.
Possibly -- there really aren't many Pythonistas who care about this.
For this family of algorithms, itertools.tee is the way to go.

I think that the semi-tutorial in "test_generators.py" should be updated to
use "tee". Or, at least, a severe warning comment should be written.


Please submit a patch. The purpose of that file is to test
generators, so you should add a third way of doing it, not replace the
two ways already there. It should also contain prose explaining why
the third way is better (just as there's prose now explaining why the
second way is better than the first).
Jul 18 '05 #27

P: n/a
Ok I'll submit the patch with the prose pretty soon.
Thank you
Francis Girard
FRANCE

Le mardi 25 Janvier 2005 04:29, Tim Peters a écrit*:
[Francis Girard]
For all the algorithms that run after their tail in an FP way, like the
Hamming problem, or the Fibonacci sequence, (but unlike Sieve of
Eratosthene -- there's a subtle difference), i.e. all those algorithms
that typically rely upon recursion to get the beginning of the generated
elements in order to generate the next one ...

... so for this family of algorithms, we have, somehow, to abandon
recursion, and to explicitely maintain one or many lists of what had been
generated.

One solution for this is suggested in "test_generators.py". But I think
that this solution is evil as it looks like recursion is used but it is
not and it depends heavily on how the m235 function (for example) is
invoked.


Well, yes -- "Heh. Here's one way to get a shared list, complete with
an excruciating namespace renaming trick" was intended to warn you in
advance that it wasn't pretty <wink>.
Furthermore, it is _NOT_ memory efficient as pretended : it leaks !


Yes. But there are two solutions to the problem in that file, and the
second one is in fact extremely memory-efficient compared to the first
one. "Efficiency" here was intended in a relative sense.
It internally maintains a lists of generated results that grows forever
while it is useless to maintain results that had been "consumed". Just
for a gross estimate, on my system, percentage of memory usage for this
program grows rapidly, reaching 21.6 % after 5 minutes. CPU usage varies
between 31%-36%. Ugly and inefficient.


Try the first solution in the file for a better understanding of what
"inefficient" means <wink>.
The solution of Jeff Epler is far more elegant. The "itertools.tee"
function does just what we want. It internally maintain a list of what
had been generated and deletes the "consumed" elements as the algo
unrolls. To follow with my gross estimate, memory usage grows from 1.2%
to 1.9% after 5 minutes (probably only affected by the growing size of
Huge Integer). CPU usage varies between 27%-29%.
Beautiful and effecient.


Yes, it is better. tee() didn't exist when generators (and
test_generators.py) were written, so of course nothing in the test
file uses them.
You might think that we shouldn't be that fussy about memory usage on
generating 100 digits numbers but we're talking about a whole family of
useful FP algorithms ; and the best way to implement them should be
established.


Possibly -- there really aren't many Pythonistas who care about this.
For this family of algorithms, itertools.tee is the way to go.

I think that the semi-tutorial in "test_generators.py" should be updated
to use "tee". Or, at least, a severe warning comment should be written.


Please submit a patch. The purpose of that file is to test
generators, so you should add a third way of doing it, not replace the
two ways already there. It should also contain prose explaining why
the third way is better (just as there's prose now explaining why the
second way is better than the first).


Jul 18 '05 #28

P: n/a
Francis Girard wrote:
The following implementation is even more speaking as it makes self-evident
and almost mechanical how to translate algorithms that run after their tail
from recursion to "tee" usage :


Thanks, Francis and Jeff for raising a fascinating topic. I've enjoyed trying
to get my head around both the algorithm and your non-recursive implementation.

Here's a version of your implementation that uses a helper class to make the
algorithm itself prettier.

from itertools import tee, imap

def hamming():
def _hamming():
yield 1
for n in imerge(2 * hamming, imerge(3 * hamming, 5 * hamming)):
yield n

hamming = Tee(_hamming())
return iter(hamming)
class Tee(object):
"""Provides an indepent iterator (using tee) on every iteration request
Also implements lazy iterator arithmetic"""
def __init__(self, iterator):
self.iter = tee(iterator,1)[0]
def __iter__(self):
return self.iter.__copy__()
def __mul__(self, number):
return imap(lambda x: x * number,self.__iter__())

def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else: # if y < x:
yield y
y = ys.next()
hg = hamming()
for i in range(10000):

.... n = hg.next()
.... if i % 1000 == 0: print i, n
....
0 1
1000 51840000
2000 8100000000
3000 279936000000
4000 4707158941350
5000 50960793600000
6000 409600000000000
7000 2638827906662400
8000 14332723200000000
9000 68024448000000000
Regards

Michael

Jul 18 '05 #29

P: n/a
Francis Girard <fr************@free.fr> wrote:
def hamming():
def _hamming():
yield 1
hamming2 = hammingGenerators[0]
hamming3 = hammingGenerators[1]
hamming5 = hammingGenerators[2]
for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
imerge(imap(lambda h: 3*h, iter(hamming3)),
imap(lambda h: 5*h, iter(hamming5)))):
yield n
hammingGenerators = tee(_hamming(), 4)
return hammingGenerators[3]


If you are after readability, you might prefer this...

def hamming():
def _hamming():
yield 1
for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
imerge(imap(lambda h: 3*h, iter(hamming3)),
imap(lambda h: 5*h, iter(hamming5)))):
yield n
hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
return result

PS interesting thread - never heard of Hamming sequences before!
--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #30

P: n/a
On 25 Jan 2005 08:30:03 GMT, Nick Craig-Wood <ni**@craig-wood.com> wrote:
Francis Girard <fr************@free.fr> wrote:
def hamming():
def _hamming():
yield 1
hamming2 = hammingGenerators[0]
hamming3 = hammingGenerators[1]
hamming5 = hammingGenerators[2]
for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
imerge(imap(lambda h: 3*h, iter(hamming3)),
imap(lambda h: 5*h, iter(hamming5)))):
yield n
hammingGenerators = tee(_hamming(), 4)
return hammingGenerators[3]


If you are after readability, you might prefer this...

def hamming():
def _hamming():
yield 1
for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
imerge(imap(lambda h: 3*h, iter(hamming3)),
imap(lambda h: 5*h, iter(hamming5)))):
yield n
hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
return result

PS interesting thread - never heard of Hamming sequences before!


Are the long words really that helpful?

def hamming():
def _hamming():
yield 1
for n in imerge(imap(lambda h: 2*h, iter(hg2)),
imerge(imap(lambda h: 3*h, iter(hg3)),
imap(lambda h: 5*h, iter(hg5)))):
yield n
hg2, hg3, hg5, result = tee(_hamming(), 4) # four hamming generators
return result

Regards,
Bengt Richter
Jul 18 '05 #31

P: n/a
Francis Girard <fr************@free.fr> wrote:
The following implementation is even more speaking as it makes self-evident
and almost mechanical how to translate algorithms that run after their tail
from recursion to "tee" usage :

*** BEGIN SNAP
from itertools import tee, imap
import sys

def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else:
yield y
y = ys.next()
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
values = [ g.next() for g in generators ]
while True:
x = min(values)
yield x
for i in range(len(values)):
if values[i] == x:
values[i] = generators[i].next()
def hamming():
def _hamming():
yield 1
hamming2 = hammingGenerators[0]
hamming3 = hammingGenerators[1]
hamming5 = hammingGenerators[2]
for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
imerge(imap(lambda h: 3*h, iter(hamming3)),
imap(lambda h: 5*h, iter(hamming5)))):
yield n
hammingGenerators = tee(_hamming(), 4)
return hammingGenerators[3]


This means that this can be further simplified thus,

def hamming():
def _hamming():
yield 1
for n in imerge( imap(lambda h: 2*h, hamming2),
imap(lambda h: 3*h, hamming3),
imap(lambda h: 5*h, hamming5) ):
yield n
hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
return result

(Note the iter(...) seemed not to be doing anything useful so I
removed them)

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #32

P: n/a
Nick Craig-Wood wrote:
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
values = [ g.next() for g in generators ]
while True:
x = min(values)
yield x
for i in range(len(values)):
if values[i] == x:
values[i] = generators[i].next()

This kinda looks like it dies after the first generator is exhausted,
but I'm not certain. An alternate version that doesn't search for 'i':

py> def imerge(*iterables):
.... iters = [iter(i) for i in iterables]
.... values = [i.next() for i in iters]
.... while iters:
.... x, i = min((val, i) for i, val in enumerate(values))
.... yield x
.... try:
.... values[i] = iters[i].next()
.... except StopIteration:
.... del iters[i]
.... del values[i]
....
py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([3, 6, 9], [1, 4, 7], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([1, 4, 7], [3, 6, 9], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
This means that this can be further simplified thus,

def hamming():
def _hamming():
yield 1
for n in imerge( imap(lambda h: 2*h, hamming2),
imap(lambda h: 3*h, hamming3),
imap(lambda h: 5*h, hamming5) ):
yield n
hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
return result


Nice.

Steve
Jul 18 '05 #33

P: n/a
Steven Bethard <st************@gmail.com> wrote:
Nick Craig-Wood wrote:
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
values = [ g.next() for g in generators ]
while True:
x = min(values)
yield x
for i in range(len(values)):
if values[i] == x:
values[i] = generators[i].next()

This kinda looks like it dies after the first generator is exhausted,
but I'm not certain.


Yes it will stop iterating then (rather like zip() on lists of unequal
size). Not sure what the specification should be! It works for the
hamming problem though.
list(imerge(iter([1, 2]), iter([1, 2, 3]), iter([1, 2, 3, 4]))) [1, 2]
An alternate version that doesn't search for 'i':

py> def imerge(*iterables):
... iters = [iter(i) for i in iterables]
... values = [i.next() for i in iters]
... while iters:
... x, i = min((val, i) for i, val in enumerate(values))
... yield x
... try:
... values[i] = iters[i].next()
... except StopIteration:
... del iters[i]
... del values[i]
...
py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([3, 6, 9], [1, 4, 7], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([1, 4, 7], [3, 6, 9], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]


This isn't quite right...
list(imerge([1, 2, 3], [1, 2, 3], [1, 2, 3]))

[1, 1, 1, 2, 2, 2, 3, 3, 3]

This should produce
[1, 2, 3]

So I'm afraid the searching *is* necessary - you've got to find all
the generators with the min value and move them on.

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #34

P: n/a
Nick Craig-Wood wrote:
Steven Bethard <st************@gmail.com> wrote:
Nick Craig-Wood wrote:
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
values = [ g.next() for g in generators ]
while True:
x = min(values)
yield x
for i in range(len(values)):
if values[i] == x:
values[i] = generators[i].next()


This kinda looks like it dies after the first generator is exhausted,
but I'm not certain.

Yes it will stop iterating then (rather like zip() on lists of unequal
size). Not sure what the specification should be! It works for the
hamming problem though.


Actually, it stops iterating on lists of equal size too:

py> def imerge(*iterators):
.... iterators = [iter(i) for i in iterators]
.... values = [i.next() for i in iterators]
.... while True:
.... x = min(values)
.... yield x
.... for i, val in enumerate(values):
.... if val == x:
.... values[i] = iterators[i].next()
....
py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
[1, 2, 3, 4, 5, 6, 7]

Note that 8 and 9 are not in the list.

Steve
Jul 18 '05 #35

P: n/a
Nick Craig-Wood wrote:
Steven Bethard <st************@gmail.com> wrote:
Nick Craig-Wood wrote:
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
values = [ g.next() for g in generators ]
while True:
x = min(values)
yield x
for i in range(len(values)):
if values[i] == x:
values[i] = generators[i].next()


This kinda looks like it dies after the first generator is exhausted,
but I'm not certain.

Yes it will stop iterating then (rather like zip() on lists of unequal
size). Not sure what the specification should be! It works for the
hamming problem though.

list(imerge(iter([1, 2]), iter([1, 2, 3]), iter([1, 2, 3, 4])))
[1, 2]

An alternate version that doesn't search for 'i':

py> def imerge(*iterables):
... iters = [iter(i) for i in iterables]
... values = [i.next() for i in iters]
... while iters:
... x, i = min((val, i) for i, val in enumerate(values))
... yield x
... try:
... values[i] = iters[i].next()
... except StopIteration:
... del iters[i]
... del values[i]
...
py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([3, 6, 9], [1, 4, 7], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([1, 4, 7], [3, 6, 9], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

This isn't quite right...

list(imerge([1, 2, 3], [1, 2, 3], [1, 2, 3]))
[1, 1, 1, 2, 2, 2, 3, 3, 3]

This should produce
[1, 2, 3]

So I'm afraid the searching *is* necessary - you've got to find all
the generators with the min value and move them on.

Here's a dict-based implementation: cute, but slow, at least for a small number
of iterators
def imerge(*iterables): ... cache = {}
... iterators = map(iter,iterables)
... number = len(iterables)
... exhausted = 0
... while 1:
... for it in iterators:
... try:
... cache.setdefault(it.next(),[]).append(it)
... except StopIteration:
... exhausted += 1
... if exhausted == number:
... raise StopIteration
... val = min(cache)
... iterators = cache.pop(val)
... yield val
list(imerge([1, 2, 3, 6], [1, 2, 3, 7], [1, 2, 3, 4, 5])) [1, 2, 3, 4, 5, 6, 7]


Michael

Jul 18 '05 #36

P: n/a
Bengt Richter wrote:
On 25 Jan 2005 08:30:03 GMT, Nick Craig-Wood <ni**@craig-wood.com> wrote:
If you are after readability, you might prefer this...

def hamming():
def _hamming():
yield 1
for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
imerge(imap(lambda h: 3*h, iter(hamming3)),
imap(lambda h: 5*h, iter(hamming5)))):
yield n
hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
return result


Are the long words really that helpful?

def hamming():
def _hamming():
yield 1
for n in imerge(imap(lambda h: 2*h, iter(hg2)),
imerge(imap(lambda h: 3*h, iter(hg3)),
imap(lambda h: 5*h, iter(hg5)))):
yield n
hg2, hg3, hg5, result = tee(_hamming(), 4) # four hamming generators
return result


Well, judging by the fact that shortening the identifiers made it so
that you felt the need to add a comment indicating what they were
identifying, I'd say that yes, the long words *are* helpful. ;)
Comments are good, but self-documenting code is even better.

Jeff Shannon
Technician/Programmer
Credit International

Jul 18 '05 #37

P: n/a
On Tue, 25 Jan 2005 11:46:04 -0800, Jeff Shannon <je**@ccvcorp.com> wrote:
Bengt Richter wrote:
On 25 Jan 2005 08:30:03 GMT, Nick Craig-Wood <ni**@craig-wood.com> wrote:
If you are after readability, you might prefer this...

def hamming():
def _hamming():
yield 1
for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
imerge(imap(lambda h: 3*h, iter(hamming3)),
imap(lambda h: 5*h, iter(hamming5)))):
yield n
hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
return result


Are the long words really that helpful?

def hamming():
def _hamming():
yield 1
for n in imerge(imap(lambda h: 2*h, iter(hg2)),
imerge(imap(lambda h: 3*h, iter(hg3)),
imap(lambda h: 5*h, iter(hg5)))):
yield n
hg2, hg3, hg5, result = tee(_hamming(), 4) # four hamming generators
return result


Well, judging by the fact that shortening the identifiers made it so
that you felt the need to add a comment indicating what they were
identifying, I'd say that yes, the long words *are* helpful. ;)
Comments are good, but self-documenting code is even better.


The comment was a conscious factoring decision, in terms of documentation ;-)

Regards,
Bengt Richter
Jul 18 '05 #38

P: n/a
Le mardi 25 Janvier 2005 09:01, Michael Spencer a écritÂ*:
Francis Girard wrote:
The following implementation is even more speaking as it makes
self-evident and almost mechanical how to translate algorithms that run
after their tail from recursion to "tee" usage :
Thanks, Francis and Jeff for raising a fascinating topic. I've enjoyed
trying to get my head around both the algorithm and your non-recursive
implementation.


Yes, it's been fun.
Here's a version of your implementation that uses a helper class to make
the algorithm itself prettier.

from itertools import tee, imap

def hamming():
def _hamming():
yield 1
for n in imerge(2 * hamming, imerge(3 * hamming, 5 * hamming)):
yield n

hamming = Tee(_hamming())
return iter(hamming)
class Tee(object):
"""Provides an indepent iterator (using tee) on every iteration
request Also implements lazy iterator arithmetic"""
def __init__(self, iterator):
self.iter = tee(iterator,1)[0]
def __iter__(self):
return self.iter.__copy__()
def __mul__(self, number):
return imap(lambda x: x * number,self.__iter__())

def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else: # if y < x:
yield y
y = ys.next()
>>> hg = hamming()
>>> for i in range(10000):

... n = hg.next()
... if i % 1000 == 0: print i, n
...
0 1
1000 51840000
2000 8100000000
3000 279936000000
4000 4707158941350
5000 50960793600000
6000 409600000000000
7000 2638827906662400
8000 14332723200000000
9000 68024448000000000


Interesting idea.
Regards

Michael


Jul 18 '05 #39

P: n/a
Le mardi 25 Janvier 2005 19:52, Steven Bethard a écrit*:

Thank you Nick and Steven for the idea of a more generic imerge.

To work with the Hamming problem, the imerge function _must_not_ allow
duplicates and we can assume all of the specified iteratables are of the same
size, i.e. infinite !

Therefore, Nick's solution fulfills the need. But it is admittedly confusing
to call the function "imerge" when it doesn't merge everything (including
duplicates). Anyway both solution sheds new light and brings us a bit
farther.

That's the beauty of many brains from all over the world collabarating.
Really, it makes me emotive thinking about it.

For the imerge function, what we really need to make the formulation clear is
a way to look at the next element of an iteratable without consuming it. Or
else, a way to put back "consumed" elements in the front an iteration flow,
much like the list constructors in FP languages like Haskell.

It is simple to encapsulate an iterator inside another iterator class that
would do just that. Here's one. The additional "fst" method returns the next
element to consume without consuming it and the "isBottom" checks if there is
something left to consume from the iteration flow, without actually consuming
anything. I named the class "IteratorDeiterator" for lack of imagination :

-- BEGIN SNAP
class IteratorDeiterator:
def __init__(self, iterator):
self._iterator = iterator.__iter__()
self._firstVal = None ## Avoid consuming if not requested from outside
## Works only if iterator itself can't return None

def __iter__(self): return self

def next(self):
valReturn = self._firstVal
if valReturn is None:
valReturn = self._iterator.next()
self._firstVal = None
return valReturn

def fst(self):
if self._firstVal is None:
self._firstVal = self._iterator.next()
return self._firstVal

def isBottom(self):
if self._firstVal is None:
try: self._firstVal = self._iterator.next()
except StopIteration:
return True
return False
-- END SNAP

Now the imerge_infinite which merges infinite lists, while allowing
duplicates, is quite straightforward :

-- BEGIN SNAP
def imerge_infinite(*iterators):
vItTopable = [IteratorDeiterator(it) for it in iterators]
while vItTopable:
yield reduce(lambda itSmallest, it:
itSmallest.fst() > it.fst() and it or itSmallest,
vItTopable).next()
-- END SNAP

To merge finite lists of possibly different lengths is a bit more work as you
have to eliminate iterators that reached the bottom before the others. This
is quite easily done by simply filtering them out.
The imerge_finite function below merges "lists" of possibly different sizes.

-- BEGIN SNAP
def imerge_finite(*iterators):
vItDeit = [IteratorDeiterator(it) for it in iterators]
vItDeit = filter(lambda it: not it.isBottom(), vItDeit)
while vItDeit:
yield reduce(lambda itSmallest, it:
itSmallest.fst() > it.fst() and it or itSmallest,
vItDeit).next()
vItDeit = filter(lambda it: not it.isBottom(), vItDeit)
-- END SNAP
Now, we want versions of these two merge functions that do not allow
duplicates. Building upon what we've already done in a semi FP way, we just
filter out the duplicates from the iteration streams returned by the above
functions. The job is the same for the infinite and finite versions, hence
the imerge_nodup generic function.

-- BEGIN SNAP
def imerge_nodup(fImerge, *iterators):
it = fImerge(*iterators)
el = it.next()
yield el
while True:
el = dropwhile(lambda _next: _next == el, it).next()
yield el

imerge_inf_nodup = \
lambda *iterators: imerge_nodup(imerge_infinite, *iterators)
imerge_finite_nodup = \
lambda *iterators: imerge_nodup(imerge_finite, *iterators)
-- END SNAP

I used the lambda notation for imerge_inf_nodup and imerge_finite_nodup to
avoid another useless for-yield loop. Here the notation really just express
function equivalence with a bounded argument (it would be great to have a
notation for this in Python, i.e. binding a function argument to yield a new
function).

The merge function to use with hamming() is imerge_inf_nodup.

Regards,

Francis Girard
FRANCE
Nick Craig-Wood wrote:
Steven Bethard <st************@gmail.com> wrote:
Nick Craig-Wood wrote:
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
values = [ g.next() for g in generators ]
while True:
x = min(values)
yield x
for i in range(len(values)):
if values[i] == x:
values[i] = generators[i].next()

This kinda looks like it dies after the first generator is exhausted,
but I'm not certain.


Yes it will stop iterating then (rather like zip() on lists of unequal
size). Not sure what the specification should be! It works for the
hamming problem though.


Actually, it stops iterating on lists of equal size too:

py> def imerge(*iterators):
... iterators = [iter(i) for i in iterators]
... values = [i.next() for i in iterators]
... while True:
... x = min(values)
... yield x
... for i, val in enumerate(values):
... if val == x:
... values[i] = iterators[i].next()
...
py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
[1, 2, 3, 4, 5, 6, 7]

Note that 8 and 9 are not in the list.

Steve


Jul 18 '05 #40

P: n/a
Francis Girard wrote:
For the imerge function, what we really need to make the formulation clear is
a way to look at the next element of an iteratable without consuming it. Or
else, a way to put back "consumed" elements in the front an iteration flow,
much like the list constructors in FP languages like Haskell.

It is simple to encapsulate an iterator inside another iterator class that
would do just that. Here's one. The additional "fst" method returns the next
element to consume without consuming it and the "isBottom" checks if there is
something left to consume from the iteration flow, without actually consuming
anything. I named the class "IteratorDeiterator" for lack of imagination :

[snip]

Another implementation is my peekable class:

http://aspn.activestate.com/ASPN/Coo.../Recipe/304373

It allows you to peek several elements ahead if that's necessary.

Steve
Jul 18 '05 #41

P: n/a
Francis Girard <fr************@free.fr> wrote:
Thank you Nick and Steven for the idea of a more generic imerge.
You are welcome :-) [It came to me while walking the children to school!]

[snip] class IteratorDeiterator:
def __init__(self, iterator):
self._iterator = iterator.__iter__()
self._firstVal = None ## Avoid consuming if not requested from outside
## Works only if iterator itself can't return None
You can use a sentinel here if you want to avoid the "can't return
None" limitation. For a sentinel you need an object your iterator
couldn't possibly return. You can make one up, eg

self._sentinel = object()
self._firstVal = self._sentinel

Or you could use self (but I'm not 100% sure that your recursive
functions wouldn't return it!)
def __iter__(self): return self

def next(self):
valReturn = self._firstVal
if valReturn is None:
and
if valReturn is self._sentinel:
valReturn = self._iterator.next()
self._firstVal = None


self._firstVal = self._sentinel

etc..

[snip more code]

Thanks for some more examples of fp-style code. I find it hard to get
my head round so its been good exercise!
--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #42

P: n/a
Le jeudi 27 Janvier 2005 10:30, Nick Craig-Wood a écritÂ*:
Francis Girard <fr************@free.fr> wrote:
Thank you Nick and Steven for the idea of a more generic imerge.
You are welcome :-) [It came to me while walking the children to school!]


Sometimes fresh air and children purity is all what it takes. Much better than
coffee, cigarrette and flat screen.
[snip]
class IteratorDeiterator:
def __init__(self, iterator):
self._iterator = iterator.__iter__()
self._firstVal = None ## Avoid consuming if not requested from
outside ## Works only if iterator itself can't return None
You can use a sentinel here if you want to avoid the "can't return
None" limitation. For a sentinel you need an object your iterator
couldn't possibly return. You can make one up, eg


Great idea. I'll use it.
self._sentinel = object()
self._firstVal = self._sentinel

Or you could use self (but I'm not 100% sure that your recursive
functions wouldn't return it!)
def __iter__(self): return self

def next(self):
valReturn = self._firstVal
if valReturn is None:
and

if valReturn is self._sentinel:
valReturn = self._iterator.next()
self._firstVal = None


self._firstVal = self._sentinel

etc..

[snip more code]

Thanks for some more examples of fp-style code. I find it hard to get
my head round so its been good exercise!


Introduction to functional programming
Richard Bird and Philip Wadler
Prentice Hall
1988

This is the very best intro I ever read. The book is without hype, doesn't
show its age and is language neutral.
Authors are world leaders in the field today. Only really strong guys have the
kindness to do understandable introductions without trying to hide the
difficulties (because they are strong enough to face them with simplicity).

It's been a real pleasure.

Regards,

Francis Girard
FRANCE

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick


Jul 18 '05 #43

P: n/a
Francis Girard <fr************@free.fr> writes:
Thank you Nick and Steven for the idea of a more generic imerge.


If you want to get fancy, the merge should use a priority queue (like
in the heapsort algorithm) instead of a linear scan through the
incoming iters, to find the next item to output. That lowers the
running time to O(n log k) instead of O(n*k), where k is the number of
iterators and n is the length.
Jul 18 '05 #44

P: n/a
Paul Rubin wrote:
Francis Girard <fr************@free.fr> writes:
Thank you Nick and Steven for the idea of a more generic imerge.

If you want to get fancy, the merge should use a priority queue (like
in the heapsort algorithm) instead of a linear scan through the
incoming iters, to find the next item to output. That lowers the
running time to O(n log k) instead of O(n*k), where k is the number of
iterators and n is the length.


I looked at a heapq solution but didn't see any clean way of dealing with
multiple iterators having equal values. The dict solution below deals cleanly
with that, since one key can be shared by any number of iterators. Extracting
the minimum, and the associated iterables is fast, but the overall solution is
still slower than the brute force approach for the 3 hamming iterators.
def imerge(*iterables): ... cache = {}
... iterators = map(iter,iterables)
... number = len(iterables)
... exhausted = 0
... while 1:
# First time through, looked at all of them
# Subsequently, update only the minimum iterators
... for it in iterators:
... try:
# Key each iterator by its next() value
# Multiple iterators may share the same key
... cache.setdefault(it.next(),[]).append(it)
... except StopIteration:
... exhausted += 1
... if exhausted == number:
... raise StopIteration
# Get the lowest value
... val = min(cache)
# and all the iterators that have that value
... iterators = cache.pop(val)
... yield val
list(imerge([1, 2, 3, 6], [1, 2, 3, 7], [1, 2, 3, 4, 5])) [1, 2, 3, 4, 5, 6, 7]


Michael

Jul 18 '05 #45

P: n/a
Le vendredi 28 Janvier 2005 08:27, Paul Rubin a écrit*:
Francis Girard <fr************@free.fr> writes:
Thank you Nick and Steven for the idea of a more generic imerge.


If you want to get fancy, the merge should use a priority queue (like
in the heapsort algorithm) instead of a linear scan through the
incoming iters, to find the next item to output. That lowers the
running time to O(n log k) instead of O(n*k), where k is the number of
iterators and n is the length.


The goal was only to show some FP construct on small problems. Here the number
of iterators are intended to be fairly small.

Otherwise, yes, you could exploit the fact that, at one loop execution, you
already seen most of the elements at previous loop execution, storing them in
some heap structure and therefore not having to recan the whole list of
"already seen" iterator values at each loop execution.

Thank you

Francis Girard

Jul 18 '05 #46

This discussion thread is closed

Replies have been disabled for this discussion.