By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
 449,220 Members | 1,718 Online
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 449,220 IT Pros & Developers. It's quick & easy.

# Itertools

 P: n/a The new itertools stuff is pretty cool. One thing that bothers me though is that there seems to be no way to copy an iterator. How does one work around this? Is there a trick that I am missing? Without being able to 'split' an iterator, I can't think of how to do some of the cool things you can do with lazy lists in Haskell. It seems quite often you want to be able to do this 'splitting', and if I had more coffee I would post an example :) The ones I can think of are also defined recursively, which is another kettle of fish. But I hope that someone knows what I am talking about and has thought of a way... Cheers! Jul 18 '05 #1
Share this Question
8 Replies

 P: n/a On Tue, 29 Jul 2003 08:32:14 -0400, "Ronald Legere" wrote: The new itertools stuff is pretty cool. One thing that bothers me though isthatthere seems to be no way to copy an iterator. How does one work around this?Isthere a trick that I am missing?Without being able to 'split' an iterator, I can'tthink of how to do some of the cool things you can do with lazy lists inHaskell. It seemsquite often you want to be able to do this 'splitting', and if Ihad more coffee I would post an example :) The ones Ican think of are also defined recursively, which is another kettle of fish. Iterables can be more far general than lazy lists: think of a socket as an iterable churning out lines of text, for example. How would one go about copying these general iterables? You can take slices of iterables, though - can't remember the name, islice? But since you don't know how to take copies of iterables in general this may not be that useful. But I hope that someone knowswhat I am talking about and has thought of a way...Cheers! With my best regards, G. Rodrigues Jul 18 '05 #2

 P: n/a Just tested the code, amend as stated (remove the - lines, and add the + lines): --- START OF CODE --- import copy class IDup(object): def __init__(self,iterin): self.__iter = iterin self.__iterno = 0 self.__iteritems = [] self.__hasstopped = None - def registerIter(): + def registerIter(self): iterno = self.__iterno self.__iterno += 1 self.__iteritems.append([]) + return iterno def getNext(self,iterno): if self.__iteritems[iterno]: iteritem = self.__iteritems[iterno].pop(0) elif self.__hasstopped is not None: raise self.__hasstopped else: try: iteritem = self.__iter.next() except StopIteration, e: self.__hasstopped = e raise for id, i in enumerate(self.__iteritems): if id <> iterno: i.append(copy.deepcopy(iteritem)) return iteritem class IDupped(object): def __init__(self,idup): self.__idup = idup self.__iterno = idup.registerIter() def next(self): return self.__idup.getNext(self.__iterno) + def __iter__(self): + return self def isplit(iterin,splitno=2): idup = IDup(iterin) iduppeds = [] for i in range(splitno): iduppeds.append(IDupped(idup)) return tuple(iduppeds) test = ["hello","how","are","you?"] x, y = isplit(iter(test)) for i in x: print i for i in y: print i --- END OF CODE --- This works as stated. Heiko. Jul 18 '05 #3

 P: n/a Untested code that does basically this, but beware of any traps (read and understand the code before trying to use it!!!), uses Python 2.3 functionality, but could be amended: --- START OF CODE --- import copy class IDup(object): def __init__(self,iterin): self.__iter = iterin self.__iterno = 0 self.__iteritems = [] self.__hasstopped = None def registerIter(): iterno = self.__iterno self.__iterno += 1 self.__iteritems.append([]) def getNext(self,iterno): if self.__iteritems[iterno]: iteritem = self.__iteritems[iterno].pop(0) elif self.__hasstopped is not None: raise self.__hasstopped else: try: iteritem = self.__iter.next() except StopIteration, e: self.__hasstopped = e raise for id, i in enumerate(self.__iteritems): if id <> iterno: i.append(copy.deepcopy(iteritem)) return iteritem class IDupped(object): def __init__(self,idup): self.__idup = idup self.__iterno = idup.registerIter() def next(self): return self.__idup.getNext(self.__iterno) def isplit(iterin,splitno=2): idup = IDup(iterin) iduppeds = [] for i in range(splitno): iduppeds.append(IDupped(idup)) return tuple(iduppeds) test = ["hello","how","are","you?"] x, y = isplit(iter(test)) for i in x: print i for i in y: print i --- END OF CODE --- Pitfalls: 1) You may not use the original iterator after it has been split. 2) Your iterator must return values which are copyable. If you wish for the values to be truly equal (even in ID) or they are not copyable, then just remove the copy.deepcopy from the code. HTH! Heiko. Jul 18 '05 #4

 P: n/a Ronald Legere wrote on 2003-07-29: The new itertools stuff is pretty cool. One thing that bothers me though is that there seems to be no way to copy an iterator. How does one work around this? Is there a trick that I am missing? Without being able to 'split' an iterator, I can't think of how to do some of the cool things you can do with lazy lists in Haskell. It seems quite often you want to be able to do this 'splitting', and if I had more coffee I would post an example :) The ones I can think of are also defined recursively, which is another kettle of fish. Strange, the need seems to come up more freqeuntly since I implemented it ;-). Or is it just me paying more attention to it? After a couple of design iterations, I've got: http://www.technion.ac.il/~cben/python/streams.py [NEWS: The last changes were addition of __nonzero__ methods, clued by discussion on c.l.py , and a `.force(count=None)` method to force computing values.] I concluded that it's best to separate the two abstractions, providing ability to convert both ways: - An iterable immutable (but lazily computed) linked list, with O(1) tail access and ability to cons in front of it. - An iterator object whose `.next()` method is necessarily destructive, on another hand. Turns out the most flexible implementation of the iterator is simply a pointer to a linked list; this way you can split the iterator and even "unyield" values back onto it. See the module for more details. In general it's most clean to separate iterables from iterators. Iterables should be changed by iterating. Any feedback welcome. I'd like to make it as Pythonic as possible. An perhaps make it standard. Particular open questions: - Terminology. There are too many names for the same things ;-). - The linked lists are called "streams" by me because they are lazy and that's what such beasts are called in functional languages. But I still have many refernces to them as linked lists, especially w.r.t. the `Cons` class which happens to be the base class of `Stream`. So `Stream` is defined as a lazy linked list node. Should I try to squash refences to "linked lists", treating them as "stream nodes which are already computed" to reduce confusion with Python lists? - car/cdr vs. first/rest vs. head/tail. I chose head/tail. car/cdr would be opaque to people without lisp background. first/rest are of different length :-). - `Cons` still requires lisp background but I don't know any name that would be recognizable to non-lispers anyway. And it's not a bad name. - Is `StreamIter` the best name? - I called the operation for creating new iterator from same place "forking". Alternative names: `split`, `copy`, `clone`, `dup`, "lookahead", etc. What's best? - Consing in front of a StreamIter's stream is called `.unyeild`. The point is that it's the opposite of the ``yield`` statement but the name is a brave new invention. Alternatives: `prev` (antonym of `.next()`, probably misguided since it doesn't return the previous value but rather takes it from you), `pushback`, `unread`, `stuff`, etc. - Functionality: - Should the operations (`.head`/`.tail`, indexing) on the underlying stream be proxied by `StreamIter` or should the programmer have to spell out e.g. ``iterator.stream.head`` as now? - What to do about __repr__ and __str__? The streams could well be infinite and forcing lookahead for even a few values is unacceptable as this could be expensive. - What to do about __hash__ and __eq__? - Should I weakly memoize the `Stream` for each given iterable? That would reduce the risk of wrapping a destructive iterator twice with distinct streams, so that each produced item is seen by only one stream, with access-order-dependant interleaving, violating the functional intent of the laziness. OTOH, this would make the programmer less aware of the risk. Besides, I'm not sure what to return on the second attempt to wrap same iterator after some of it has already been consumed since the first time. - Would some convenience operations for lookahead, e.g. `.startswith()` be useful? - Anything else anybody needs? -- Beni Cherniavsky Put a backslash at the evening to continue hacking onto the next day. Jul 18 '05 #5

 P: n/a Beni Cherniavsky wrote: Any feedback welcome. I'd like to make it as Pythonic as possible. An perhaps make it standard. Particular open questions: - The linked lists are called "streams" by me because they are lazy and that's what such beasts are called in functional languages. Stream is used differently in Tcl and Unix. Minded separation Python from functional approach (deprecation of filter, map; doubtfulness of implementing tail recursion optimization) probably it is not so inspiring for regular python programmer. Lazy "streams" are a good thing provided they can be easily used for organizing data pathes and connections. - `Cons` still requires lisp background but I don't know any name that would be recognizable to non-lispers anyway. And it's not a bad name. Construct, Glue, Merge? Taken readability any whole word is better. - I called the operation for creating new iterator from same place "forking". Alternative names: `split`, `copy`, `clone`, `dup`, "lookahead", etc. What's best? copy is already using construct in python. Why not use it? - Anything else anybody needs? If I got the intend of Stream correctly, I wish use them as Tcl streams, so I'd need combining and filtering and i/o. Just 0.02 Mike Jul 18 '05 #6

 P: n/a Mike Rovner wrote on 2003-07-30: Beni Cherniavsky wrote: Any feedback welcome. I'd like to make it as Pythonic as possible. An perhaps make it standard. Particular open questions: - The linked lists are called "streams" by me because they are lazy and that's what such beasts are called in functional languages. Stream is used differently in Tcl and Unix. Minded separation Python from functional approach (deprecation of filter, map; doubtfulness of implementing tail recursion optimization) probably it is not so inspiring for regular python programmer. Point taken. But what do you call it then? Lazy linked lists has only one single-word name that I'm aware of and that is "streams"; there is no name for it in Tcl and Unix becauit is almost never used in them, as far as I know. Lazy "streams" are a good thing provided they can be easily used for organizing data pathes and connections. Python already provides the minimal construct for connecting producers and consumers of data: the iterator protocol. It is minimalistic and becomes inconvenient when you want to use the same value more than one time, for lookahead, backtracking or simply connecting the same source to many consumers. All this is quite easy with the head/tail access to streams; it's a very powerfull abstraction. Since streams are implemented as linked lists, they don't leak memory for infinite iterators, unless you keep a reference to a fixed position in the stream. Over streams I've built an iterator which also provides the same powers but more in line with Python's iterator protocol. - `Cons` still requires lisp background but I don't know any name that would be recognizable to non-lispers anyway. And it's not a bad name. Construct, Glue, Merge? Taken readability any whole word is better. `Construct` is too long. Compare with `str`, `int`, `dict` rather than `string`, `integer` and `dictionary`. How would `Glue` and `Merge` be meaningful here? The only associasions of "glue" are what TeX puts between boxes and "glue layers" between different languages or libraries. "Merge" sounds like the process of taking several sequences of values and interlevaing them in some way (like in merge sort; BTW, it needs a lookahead of 1 for each stream, so it should be convenient to implement with streams). But this has nothing to do with what `Cons` does: creating a new stream by prepending given head to given tail. `Cons` is the only term I know of that unabigously refers to this operation, again coming from functional languages . There is another possibility: rename `Cons` to `Stream` and `Stream` to `LazyStream`; `Stream` would then be a factory function creating either depending on whether it got 1 or two arguments. Would this be better? - I called the operation for creating new iterator from same place "forking". Alternative names: `split`, `copy`, `clone`, `dup`, "lookahead", etc. What's best? copy is already using construct in python. Why not use it? Copy was adviced against by Erik Max Francis, see: http://groups.google.com/groups?selm...%40alcyone.com However, it really copies the iterator, it's just very fast. So I think I will indeed name this `.copy()` (aliased as `.__copy__`). - Anything else anybody needs? If I got the intend of Stream correctly, I wish use them as Tcl streams, so I'd need combining and filtering and i/o. You can use a `Stream` or `StreamIter` over a file object to get lookahead and mulitple passes over it, without requiring the file to be seekable! That's one of the main uses. It also gives you ability to "unread" any amount of data. Note that if you simply apply it to a file the stream operates on a line at a time; you will need to wrap the file with a custom generator if you wish to operate at character level; working at token level might be the best compromise. I'm not familiar with Tcl streams, so I don't understand the "combining and filtering" reference - can you elaborate? -- Beni Cherniavsky Put a backslash at the evening to continue hacking onto the next day. Jul 18 '05 #7

 P: n/a Beni Cherniavsky wrote: Mike Rovner wrote on 2003-07-30: I'm sorry for taking so long to answer (I had a project deadline to meet ;) - done). Beni Cherniavsky wrote: Point taken. But what do you call it then? Lazy linked lists has only one single-word name that I'm aware of and that is "streams"; there is no name for it in Tcl and Unix becauit is almost never used in them, as far as I know. I made an intensive search on internet and convinced now that term 'streams' is globaly associated with i/o. I still think using this term for universal data structure will be misleading. However I can't come with a good word (my best is LazyList). - `Cons` still requires lisp background but I don't know any name that would be recognizable to non-lispers anyway. And it's not a bad name. Construct, Glue, Merge? Taken readability any whole word is better. `Construct` is too long. Compare with `str`, `int`, `dict` rather than `string`, `integer` and `dictionary`. But they all are (long-lived) built-ins. AFAIK now there is a tendency to make language more readable (isinstance, enumerate to name a few hmm recent additions) How would `Glue` and `Merge` be meaningful here? The only associasions of "glue" are what TeX puts between boxes and "glue layers" between different languages or libraries. "Merge" sounds like the process of taking several sequences of values and interlevaing them in some way (like in merge sort; BTW, it needs a lookahead of 1 for each stream, so it should be convenient to implement with streams). But this has nothing to do with what `Cons` does: creating a new stream by prepending given head to given tail. `Cons` is the only term I know of that unabigously refers to this operation, again coming from functional languages . There is another possibility: rename `Cons` to `Stream` and `Stream` to `LazyStream`; `Stream` would then be a factory function creating either depending on whether it got 1 or two arguments. Would this be better? No. In discussion about list.insert() and/or range() difference in behaivor dependent on args was prononced A Bad Thing (IIRC). - Anything else anybody needs? If I got the intend of Stream correctly, I wish use them as Tcl streams, so I'd need combining and filtering and i/o. You can use a `Stream` or `StreamIter` over a file object to get lookahead and mulitple passes over it, without requiring the file to be seekable! That's one of the main uses. It also gives you ability to "unread" any amount of data. Note that if you simply apply it to a file the stream operates on a line at a time; you will need to wrap the file with a custom generator if you wish to operate at character level; working at token level might be the best compromise. I'm not familiar with Tcl streams, so I don't understand the "combining and filtering" reference - can you elaborate? I didn't touch tcl for several years ;) But the idea is pretty simple: to pass file object throw 'filter' to get 'another' file(-like) object. import compress as Z for line in tarfile.open('-','r',Z.open(filename)).extractiter() instead of stream = tarfile.open('-','r',Z.open(filename)) for elem in tarfile.open('-','r',Z.open(filename)).getmembers(): for line in stream.extract(elem): ... or something like that. I admit that later example based on new 2.3 syntax fulfill my (long being) expectations quite well. Thanks, Mike Jul 18 '05 #8

 P: n/a According to Mike Rovner : I made an intensive search on internet and convinced now that term 'streams' is globaly associated with i/o. I still think using this term for universal data structure will be misleading. However I can't come with a good word (my best is LazyList). Once upon a time, in the days of 386BSD, certain people claimed to be reinnovating Sys V streams for 386BSD, which were to be called "currents". Somehow this episode stuck in my mind. (I think it was the sound of a balloon bursting from too much hot air. ;-) -- Ng Pheng Siong http://firewall.rulemaker.net -+- Manage Your Firewall Rulebase Changes http://www.post1.com/home/ngps -+- Open Source Python Crypto & SSL Jul 18 '05 #9

### This discussion thread is closed

Replies have been disabled for this discussion.