473,403 Members | 2,323 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,403 software developers and data experts.

Itertools

The new itertools stuff is pretty cool. One thing that bothers me though is
that
there seems to be no way to copy an iterator. How does one work around this?
Is
there a trick that I am missing?

Without being able to 'split' an iterator, I can't
think of how to do some of the cool things you can do with lazy lists in
Haskell. It seems
quite often you want to be able to do this 'splitting', and if I
had more coffee I would post an example :) The ones I
can think of are also defined recursively, which is another kettle of fish.

But I hope that someone knows
what I am talking about and has thought of a way...
Cheers!

Jul 18 '05 #1
8 2943
On Tue, 29 Jul 2003 08:32:14 -0400, "Ronald Legere"
<le****@ll.mit.edu> wrote:
The new itertools stuff is pretty cool. One thing that bothers me though is
that
there seems to be no way to copy an iterator. How does one work around this?
Is
there a trick that I am missing?

Without being able to 'split' an iterator, I can't
think of how to do some of the cool things you can do with lazy lists in
Haskell. It seems
quite often you want to be able to do this 'splitting', and if I
had more coffee I would post an example :) The ones I
can think of are also defined recursively, which is another kettle of fish.

Iterables can be more far general than lazy lists: think of a socket
as an iterable churning out lines of text, for example. How would one
go about copying these general iterables?

You can take slices of iterables, though - can't remember the name,
islice? But since you don't know how to take copies of iterables in
general this may not be that useful.
But I hope that someone knows
what I am talking about and has thought of a way...
Cheers!


With my best regards,
G. Rodrigues
Jul 18 '05 #2
Just tested the code, amend as stated (remove the - lines, and add
the + lines):

--- START OF CODE ---

import copy

class IDup(object):
def __init__(self,iterin):
self.__iter = iterin
self.__iterno = 0
self.__iteritems = []
self.__hasstopped = None

- def registerIter():
+ def registerIter(self):
iterno = self.__iterno
self.__iterno += 1
self.__iteritems.append([])
+ return iterno

def getNext(self,iterno):
if self.__iteritems[iterno]:
iteritem = self.__iteritems[iterno].pop(0)
elif self.__hasstopped is not None:
raise self.__hasstopped
else:
try:
iteritem = self.__iter.next()
except StopIteration, e:
self.__hasstopped = e
raise
for id, i in enumerate(self.__iteritems):
if id <> iterno:
i.append(copy.deepcopy(iteritem))
return iteritem

class IDupped(object):
def __init__(self,idup):
self.__idup = idup
self.__iterno = idup.registerIter()

def next(self):
return self.__idup.getNext(self.__iterno)

+ def __iter__(self):
+ return self

def isplit(iterin,splitno=2):
idup = IDup(iterin)
iduppeds = []
for i in range(splitno):
iduppeds.append(IDupped(idup))
return tuple(iduppeds)

test = ["hello","how","are","you?"]
x, y = isplit(iter(test))

for i in x:
print i
for i in y:
print i

--- END OF CODE ---

This works as stated.

Heiko.
Jul 18 '05 #3
Untested code that does basically this, but beware of any traps (read
and understand the code before trying to use it!!!), uses Python 2.3
functionality, but could be amended:

--- START OF CODE ---

import copy

class IDup(object):
def __init__(self,iterin):
self.__iter = iterin
self.__iterno = 0
self.__iteritems = []
self.__hasstopped = None

def registerIter():
iterno = self.__iterno
self.__iterno += 1
self.__iteritems.append([])

def getNext(self,iterno):
if self.__iteritems[iterno]:
iteritem = self.__iteritems[iterno].pop(0)
elif self.__hasstopped is not None:
raise self.__hasstopped
else:
try:
iteritem = self.__iter.next()
except StopIteration, e:
self.__hasstopped = e
raise
for id, i in enumerate(self.__iteritems):
if id <> iterno:
i.append(copy.deepcopy(iteritem))
return iteritem

class IDupped(object):
def __init__(self,idup):
self.__idup = idup
self.__iterno = idup.registerIter()

def next(self):
return self.__idup.getNext(self.__iterno)

def isplit(iterin,splitno=2):
idup = IDup(iterin)
iduppeds = []
for i in range(splitno):
iduppeds.append(IDupped(idup))
return tuple(iduppeds)

test = ["hello","how","are","you?"]
x, y = isplit(iter(test))

for i in x:
print i
for i in y:
print i

--- END OF CODE ---

Pitfalls:

1) You may not use the original iterator after it has been split.
2) Your iterator must return values which are copyable. If you wish for
the values to be truly equal (even in ID) or they are not copyable, then
just remove the copy.deepcopy from the code.

HTH!

Heiko.
Jul 18 '05 #4
Ronald Legere wrote on 2003-07-29:
The new itertools stuff is pretty cool. One thing that bothers me
though is that there seems to be no way to copy an iterator. How
does one work around this? Is there a trick that I am missing?

Without being able to 'split' an iterator, I can't think of how to
do some of the cool things you can do with lazy lists in Haskell. It
seems quite often you want to be able to do this 'splitting', and if
I had more coffee I would post an example :) The ones I can think of
are also defined recursively, which is another kettle of fish.

Strange, the need seems to come up more freqeuntly since I implemented
it ;-). Or is it just me paying more attention to it? After a couple
of design iterations, I've got:

http://www.technion.ac.il/~cben/python/streams.py

[NEWS: The last changes were addition of __nonzero__ methods, clued by
discussion on c.l.py <wink>, and a `.force(count=None)` method to
force computing values.]

I concluded that it's best to separate the two abstractions, providing
ability to convert both ways:

- An iterable immutable (but lazily computed) linked list, with O(1)
tail access and ability to cons in front of it.

- An iterator object whose `.next()` method is necessarily
destructive, on another hand. Turns out the most flexible
implementation of the iterator is simply a pointer to a linked list;
this way you can split the iterator and even "unyield" values back
onto it. See the module for more details.

In general it's most clean to separate iterables from iterators.
Iterables should be changed by iterating.

Any feedback welcome. I'd like to make it as Pythonic as possible.
An perhaps make it standard. Particular open questions:

- Terminology. There are too many names for the same things ;-).

- The linked lists are called "streams" by me because they are lazy
and that's what such beasts are called in functional languages.
But I still have many refernces to them as linked lists,
especially w.r.t. the `Cons` class which happens to be the base
class of `Stream`. So `Stream` is defined as a lazy linked list
node. Should I try to squash refences to "linked lists", treating
them as "stream nodes which are already computed" to reduce
confusion with Python lists?

- car/cdr vs. first/rest vs. head/tail. I chose head/tail.
car/cdr would be opaque to people without lisp background.
first/rest are of different length :-).

- `Cons` still requires lisp background but I don't know any name
that would be recognizable to non-lispers anyway. And it's not
a bad name.

- Is `StreamIter` the best name?

- I called the operation for creating new iterator from same place
"forking". Alternative names: `split`, `copy`, `clone`, `dup`,
"lookahead", etc. What's best?

- Consing in front of a StreamIter's stream is called `.unyeild`.
The point is that it's the opposite of the ``yield`` statement
but the name is a brave new invention. Alternatives: `prev`
(antonym of `.next()`, probably misguided since it doesn't
return the previous value but rather takes it from you),
`pushback`, `unread`, `stuff`, etc.

- Functionality:

- Should the operations (`.head`/`.tail`, indexing) on the
underlying stream be proxied by `StreamIter` or should the
programmer have to spell out e.g. ``iterator.stream.head`` as now?

- What to do about __repr__ and __str__? The streams could well be
infinite and forcing lookahead for even a few values is
unacceptable as this could be expensive.

- What to do about __hash__ and __eq__?

- Should I weakly memoize the `Stream` for each given iterable?
That would reduce the risk of wrapping a destructive iterator
twice with distinct streams, so that each produced item is seen by
only one stream, with access-order-dependant interleaving,
violating the functional intent of the laziness. OTOH, this would
make the programmer less aware of the risk. Besides, I'm not sure
what to return on the second attempt to wrap same iterator after
some of it has already been consumed since the first time.

- Would some convenience operations for lookahead, e.g.
`.startswith()` be useful?

- Anything else anybody needs?

--
Beni Cherniavsky <cb**@tx.technion.ac.il>

Put a backslash at the evening to continue hacking onto the next day.

Jul 18 '05 #5
Beni Cherniavsky wrote:
Any feedback welcome. I'd like to make it as Pythonic as possible. An perhaps make it standard. Particular open questions: - The linked lists are called "streams" by me because they are lazy
and that's what such beasts are called in functional languages.
Stream is used differently in Tcl and Unix.
Minded separation Python from functional approach (deprecation of filter,
map;
doubtfulness of implementing tail recursion optimization) probably it is not
so
inspiring for regular python programmer.

Lazy "streams" are a good thing provided they can be easily used for
organizing data pathes and connections.
- `Cons` still requires lisp background but I don't know any name
that would be recognizable to non-lispers anyway. And it's not
a bad name.

Construct, Glue, Merge?
Taken readability any whole word is better.
- I called the operation for creating new iterator from same place
"forking". Alternative names: `split`, `copy`, `clone`, `dup`,
"lookahead", etc. What's best?
copy is already using construct in python. Why not use it?
- Anything else anybody needs?


If I got the intend of Stream correctly, I wish use them as Tcl streams,
so I'd need combining and filtering and i/o.

Just 0.02

Mike


Jul 18 '05 #6
Mike Rovner wrote on 2003-07-30:
Beni Cherniavsky wrote:
Any feedback welcome. I'd like to make it as Pythonic as possible.
An perhaps make it standard. Particular open questions:

- The linked lists are called "streams" by me because they are lazy
and that's what such beasts are called in functional languages.


Stream is used differently in Tcl and Unix. Minded separation
Python from functional approach (deprecation of filter, map;
doubtfulness of implementing tail recursion optimization) probably
it is not so inspiring for regular python programmer.

Point taken. But what do you call it then? Lazy linked lists has
only one single-word name that I'm aware of and that is "streams";
there is no name for it in Tcl and Unix becauit is almost never used
in them, as far as I know.
Lazy "streams" are a good thing provided they can be easily used for
organizing data pathes and connections.

Python already provides the minimal construct for connecting producers
and consumers of data: the iterator protocol. It is minimalistic and
becomes inconvenient when you want to use the same value more than one
time, for lookahead, backtracking or simply connecting the same source
to many consumers. All this is quite easy with the head/tail access
to streams; it's a very powerfull abstraction. Since streams are
implemented as linked lists, they don't leak memory for infinite
iterators, unless you keep a reference to a fixed position in the
stream. Over streams I've built an iterator which also provides the
same powers but more in line with Python's iterator protocol.
- `Cons` still requires lisp background but I don't know any name
that would be recognizable to non-lispers anyway. And it's not
a bad name.


Construct, Glue, Merge?
Taken readability any whole word is better.

`Construct` is too long. Compare with `str`, `int`, `dict` rather
than `string`, `integer` and `dictionary`.

How would `Glue` and `Merge` be meaningful here? The only
associasions of "glue" are what TeX puts between boxes and "glue
layers" between different languages or libraries. "Merge" sounds like
the process of taking several sequences of values and interlevaing
them in some way (like in merge sort; BTW, it needs a lookahead of 1
for each stream, so it should be convenient to implement with
streams). But this has nothing to do with what `Cons` does: creating
a new stream by prepending given head to given tail. `Cons` is the
only term I know of that unabigously refers to this operation, again
coming from functional languages <wink>.

There is another possibility: rename `Cons` to `Stream` and `Stream`
to `LazyStream`; `Stream` would then be a factory function creating
either depending on whether it got 1 or two arguments. Would this be
better?
- I called the operation for creating new iterator from same place
"forking". Alternative names: `split`, `copy`, `clone`, `dup`,
"lookahead", etc. What's best?


copy is already using construct in python. Why not use it?

Copy was adviced against by Erik Max Francis, see:

http://groups.google.com/groups?selm...%40alcyone.com

However, it really copies the iterator, it's just very fast. So I
think I will indeed name this `.copy()` (aliased as `.__copy__`).
- Anything else anybody needs?


If I got the intend of Stream correctly, I wish use them as Tcl streams,
so I'd need combining and filtering and i/o.

You can use a `Stream` or `StreamIter` over a file object to get
lookahead and mulitple passes over it, without requiring the file to
be seekable! That's one of the main uses. It also gives you ability
to "unread" any amount of data. Note that if you simply apply it to a
file the stream operates on a line at a time; you will need to wrap
the file with a custom generator if you wish to operate at character
level; working at token level might be the best compromise.

I'm not familiar with Tcl streams, so I don't understand the
"combining and filtering" reference - can you elaborate?

--
Beni Cherniavsky <cb**@tx.technion.ac.il>

Put a backslash at the evening to continue hacking onto the next day.

Jul 18 '05 #7
Beni Cherniavsky wrote:
Mike Rovner wrote on 2003-07-30:
I'm sorry for taking so long to answer (I had a project deadline to meet
;) - done).
Beni Cherniavsky wrote:

Point taken. But what do you call it then? Lazy linked lists has
only one single-word name that I'm aware of and that is "streams";
there is no name for it in Tcl and Unix becauit is almost never used
in them, as far as I know.


I made an intensive search on internet and convinced now that term 'streams'
is globaly associated with i/o. I still think using this term for universal
data
structure will be misleading. However I can't come with a good word
(my best is LazyList).
- `Cons` still requires lisp background but I don't know any
name that would be recognizable to non-lispers anyway. And
it's not a bad name.


Construct, Glue, Merge?
Taken readability any whole word is better.

`Construct` is too long. Compare with `str`, `int`, `dict` rather
than `string`, `integer` and `dictionary`.


But they all are (long-lived) built-ins. AFAIK now there is a tendency
to make language more readable (isinstance, enumerate to name a few
hmm recent additions)
How would `Glue` and `Merge` be meaningful here? The only
associasions of "glue" are what TeX puts between boxes and "glue
layers" between different languages or libraries. "Merge" sounds like
the process of taking several sequences of values and interlevaing
them in some way (like in merge sort; BTW, it needs a lookahead of 1
for each stream, so it should be convenient to implement with
streams). But this has nothing to do with what `Cons` does: creating
a new stream by prepending given head to given tail. `Cons` is the
only term I know of that unabigously refers to this operation, again
coming from functional languages <wink>.

There is another possibility: rename `Cons` to `Stream` and `Stream`
to `LazyStream`; `Stream` would then be a factory function creating
either depending on whether it got 1 or two arguments. Would this be
better?


No. In discussion about list.insert() and/or range() difference in behaivor
dependent on args was prononced A Bad Thing (IIRC).
- Anything else anybody needs?


If I got the intend of Stream correctly, I wish use them as Tcl
streams, so I'd need combining and filtering and i/o.

You can use a `Stream` or `StreamIter` over a file object to get
lookahead and mulitple passes over it, without requiring the file to
be seekable! That's one of the main uses. It also gives you ability
to "unread" any amount of data. Note that if you simply apply it to a
file the stream operates on a line at a time; you will need to wrap
the file with a custom generator if you wish to operate at character
level; working at token level might be the best compromise.

I'm not familiar with Tcl streams, so I don't understand the
"combining and filtering" reference - can you elaborate?


I didn't touch tcl for several years ;)
But the idea is pretty simple: to pass file object throw 'filter' to get
'another' file(-like) object.

import compress as Z
for line in tarfile.open('-','r',Z.open(filename)).extractiter()

instead of

stream = tarfile.open('-','r',Z.open(filename))
for elem in tarfile.open('-','r',Z.open(filename)).getmembers():
for line in stream.extract(elem):
...

or something like that. I admit that later example based on new 2.3 syntax
fulfill my (long being) expectations quite well.

Thanks,
Mike


Jul 18 '05 #8
According to Mike Rovner <mi**@bindkey.com>:
I made an intensive search on internet and convinced now that term 'streams'
is globaly associated with i/o. I still think using this term for universal
data
structure will be misleading. However I can't come with a good word
(my best is LazyList).


Once upon a time, in the days of 386BSD, certain people claimed to be
reinnovating Sys V streams for 386BSD, which were to be called "currents".

Somehow this episode stuck in my mind. (I think it was the sound of a
balloon bursting from too much hot air. ;-)
--
Ng Pheng Siong <ng**@netmemetic.com>

http://firewall.rulemaker.net -+- Manage Your Firewall Rulebase Changes
http://www.post1.com/home/ngps -+- Open Source Python Crypto & SSL
Jul 18 '05 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Dan Williams | last post by:
Is there any interest in adding a "Haskellish" take function to itertools? I know its easy enough to roll my own, I've been using: take = lambda iter, n: which works, but I was wondering if...
10
by: Jeremy Fincher | last post by:
Sometimes I find myself simply wanting the length of an iterator. For example, to collect some (somewhat useless ;)) statistics about a program of mine, I've got code like this: objs =...
6
by: Robert Brewer | last post by:
def warehouse(stock, factory=None): """warehouse(stock, factory=None) -> iavailable, iremainder. Iterate over stock, yielding each value. Once the 'stock' sequence is exhausted, the factory...
1
by: anton muhin | last post by:
Hello, everybody! Trying to solve the problem in the subj, I found that I miss some iterator-related tools. Mostly consequental application of the same function to some argument (if I'm not...
1
by: Steven Bethard | last post by:
Is there a reason that itertools.islice doesn't support None arguments for start and step? This would be handy for use with slice objects: >>> r = range(20) >>> s1 = slice(2, 10, 2) >>> s2 =...
18
by: Ville Vainio | last post by:
For quick-and-dirty stuff, it's often convenient to flatten a sequence (which perl does, surprise surprise, by default): ]]] -> One such implementation is at ...
21
by: Steven Bethard | last post by:
Jack Diederich wrote: > > itertools to iter transition, huh? I slipped that one in, I mentioned > it to Raymond at PyCon and he didn't flinch. It would be nice not to > have to sprinkle 'import...
41
by: rurpy | last post by:
The code below should be pretty self-explanatory. I want to read two files in parallel, so that I can print corresponding lines from each, side by side. itertools.izip() seems the obvious way to...
23
by: Mathias Panzenboeck | last post by:
I wrote a few functions which IMHO are missing in python(s itertools). You can download them here: http://sourceforge.net/project/showfiles.php?group_id=165721&package_id=212104 A short...
17
by: Raymond Hettinger | last post by:
I'm considering deprecating these two functions and would like some feedback from the community or from people who have a background in functional programming. * I'm concerned that use cases for...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.