Python 3000, zip, *args and iterators

So, as I understand it, in Python 3000, zip will basically be replaced
with izip, meaning that instead of returning a list, it will return an
iterator. This is great for situations like:

zip(*[iter1, iter2, iter3])

where I want to receive tuples of (item1, item2, item3) from the
iterables. But it doesn't work well for a situation like:

zip(*tuple_iter)

where tuple_iter is an iterator to tuples of the form
(item1, item2, item3) and I want to receive three iterators, one to the
item1s, one to the item2s and one to the item3s. I don't think this
is too unreasonable of a desire as the current zip, in a situation like:

zip(*tuple_list)

where tuple_list is a list of tuples of the form (item1, item2, item3),
returns a list of three tuples, one of the item1s, one of the item2s and
one of the item3s.

Of course, the reason this doesn't work currently is that the fn(*itr)
notation converts 'itr' into a tuple, exhausting the iterator:
>>> import itertools as it
>>> def g(x):
...     for i in xrange(x):
...         yield (i, i+1, i+2)
...     print "exhausted"
...
>>> zip(*g(4))
exhausted
[(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
>>> it.izip(*g(4))
exhausted
<itertools.izip object at 0x01157710>
>>> x, y, z = it.izip(*g(4))
exhausted
>>> x, y, z
((0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5))
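The same effect can be checked in Python 3, where zip already returns an iterator. This is only a minimal sketch: the `log` list is a stand-in (not from the post) for the print statement, so the moment of exhaustion can be inspected.

```python
def g(n, log):
    """Yield (i, i+1, i+2) tuples, then record that the generator ran dry."""
    for i in range(n):
        yield (i, i + 1, i + 2)
    log.append("exhausted")

log = []
columns = zip(*g(4, log))

# The star-unpacking has already drained g, before anything
# has been pulled from the zip iterator itself.
drained_before_iteration = (log == ["exhausted"])

x, y, z = columns
```

So a lazy zip does not help by itself: the call site builds the full argument tuple first.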

What I would prefer is something like:
>>> zip(*g(4))
<iterator object at ...>
>>> x, y, z = zip(*g(4))
>>> x, y, z
(<iterator object at ...>, <iterator object at ...>, <iterator object at ...>)

Of course, I can write a separate function that will do what I want
here[1] -- my question is if Python's builtin zip will support this in
Python 3000. It's certainly not a trivial change -- it requires some
pretty substantially backwards incompatible changes in how *args is
parsed for a function call -- namely that fn(*itr) only extracts as many
of the items in the iterable as necessary, e.g.
>>> def h(x, y, *args):
...     print x, y, args
...     print list(it.islice(args, 4))
...
>>> h(*it.count())
0 1 count(2)
[2, 3, 4, 5]
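For comparison, the lazy behaviour wished for above can be had today by passing the iterator itself rather than star-unpacking it. A minimal sketch in Python 3, assuming the caller is willing to change the signature of `h`:

```python
import itertools

def h(iterable):
    """Take the first two items eagerly and hand back the rest untouched."""
    rest = iter(iterable)
    x = next(rest)
    y = next(rest)
    return x, y, rest

x, y, rest = h(itertools.count())
first_four = list(itertools.islice(rest, 4))
```

Here `rest` is still the infinite counter, so only the items actually requested are ever produced. By contrast, `h(*itertools.count())` with a `*args` signature would try to build an endless tuple and never return.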

So I guess my real question is, should I expect Python 3000 to play
nicely with *args and iterators? Are there reasons (besides backwards
incompatibility) that parsing *args this way would be bad?
Steve
[1] In fact, with the help of the folks from this list, I did:
http://aspn.activestate.com/ASPN/Coo.../Recipe/302325
Jul 18 '05 #1

"Steven Bethard" <st************@gmail.com> wrote in message
news:3yGzd.289258$HA.962@attbi_s01...
So, as I understand it, in Python 3000, zip will basically be replaced
with izip, meaning that instead of returning a list, it will return an
iterator.


I think it worth repeating that Python 3 is as yet something of a
pipedream, as indicated by the joke name Python 3000 (that also being in
part a satire on Windows 2000, and the like). So, while Guido has said he
would like to make Python iterator-oriented in the way that it used to be
list-oriented, nothing is set in stone, certainly not the details.

Guido has also said that he would like there to be funding to pay him to
spend a year on its development. He wants to take that long so there will
be adequate discussion, thought, and testing so he can 'get it right' at
least in the sense of having everything work well together.

Terry J. Reedy

Jul 18 '05 #2
Terry Reedy wrote:
"Steven Bethard" <st************@gmail.com> wrote in message
news:3yGzd.289258$HA.962@attbi_s01...
So, as I understand it, in Python 3000, zip will basically be replaced
with izip, meaning that instead of returning a list, it will return an
iterator.
I think it worth repeating that Python 3 is as yet something of a
pipedream, as indicated by the joke name Python 3000 (that also being in
part a satire on Windows 2000, and the like).


True, true. And worth repeating.
So, while Guido has said he
would like to make Python iterator-oriented in the way that it used to be
list-oriented, nothing is set in stone, certainly not the details.


Right, though my understanding of PEP 3000[1] is that though "Python
3000" may never exist, the PEP is there as a road-map of where Python
as a language would like to go. I guess the point of my question is to
find out if this kind of nice interaction of *args and iterators is
something that's in the road-map. If it is, then maybe there are parts
of it that could be implemented in a way that's backwards compatible,
even if the full system wouldn't be available for some time. (Perhaps
something along the lines of "from __future__ import iter_args".)

Steve

[1] http://www.python.org/peps/pep-3000.html
Jul 18 '05 #3

"Steven Bethard" <st************@gmail.com> wrote in message
news:O5Jzd.566495$wV.471519@attbi_s54...
Terry Reedy wrote:
I think it worth repeating that Python 3 is as yet something of a
pipedream, as indicated by the joke name Python 3000
Right, though my understanding of PEP 3000[1] is that though "Python
3000" may never exist, the PEP is there as a road-map of where Python as
a language would like to go.
A major backwards compatibility break will not happen without a major
number change to Py3. And I expect it to happen -- the 'as yet' was
intentional. In fact, here is my New Year's prediction (with subjective
certainty > .5):

a. The PyPy project will succeed.
b. Python3 (actually, the reference implementation thereof) will be written
in Python3 (perhaps with 'draft' in Py2).
c. We will see it within 5 years.

We will see if I am any better than the tabloid 'psychics'.
I guess the point of my question is to find out if this kind of nice
interaction of *args and iterators is something that's in the road-map.
If it is, then maybe there are parts of it that could be implemented in a
way that's backwards compatible, even if the full system wouldn't be
available for some time. (Perhaps something along the lines of "from
__future__ import iter_args".)


You can certainly share your concerns with the PEP author. I believe that
there is also a PyWiki page that you can directly add to.

Terry J. Reedy

Jul 18 '05 #4
Terry Reedy wrote:
"Steven Bethard" <st************@gmail.com> wrote in message
news:O5Jzd.566495$wV.471519@attbi_s54...
I guess the point of my question is to find out if this kind of nice
interaction of *args and iterators is something that's in the road-map.
If it is, then maybe there are parts of it that could be implemented in a
way that's backwards compatible, even if the full system wouldn't be
available for some time. (Perhaps something along the lines of "from
__future__ import iter_args".)


You can certainly share your concerns with the PEP author. I believe that
there is also a PyWiki page that you can directly add to.


Yeah, I found the wiki page too[1]. Does anyone know if it's okay to
add things to this page? I had avoided doing so since it gives as its
description "This page lists features that GvR has mentioned as goals
for Python 3.0" which sounds like it's not intended for commentary by
the general Python community.

Maybe I should start a Python3.0Wishlist page?

Steve

[1]http://www.python.org/moin/Python3_2e0

P.S. I thought about posting to python-dev where GvR might hear directly
about this kind of thing, but it seems a little premature since most
predictions put Python 3.0 at least 3-5 years from now.
Jul 18 '05 #5
[Steven Bethard]
What I would prefer is something like:
>>> zip(*g(4))
<iterator object at ...>
>>> x, y, z = zip(*g(4))
>>> x, y, z
(<iterator object at ...>, <iterator object at ...>, <iterator object at ...>)
...
So I guess my real question is, should I expect Python 3000 to play
nicely with *args and iterators? Are there reasons (besides backwards
incompatibility) that parsing *args this way would be bad?
...
In fact, with the help of the folks from this list, I did:
http://aspn.activestate.com/ASPN/Coo.../Recipe/302325


* The answer to the first question is Yes. The point of Python 3000 is
building on what was learned and writing a simpler, cleaner language
without the encumbrance of backwards compatibility.

* However, IMHO, the proposed behavior doesn't qualify as "playing
nicely".

* Your excellent recipe provides a good basis for discussion and it
highlights some of the issues around the proposed behavior:

1. The current implementation's behavior is easy to learn, easy to
explain, and does what most folks expect (not folks who are pushing the
iterator and *arg protocols to the outer limits). In contrast, the
proposed recipe is somewhat complex and its implications are not
immediately obvious. The itertools.tee() component is of extra concern
because it invisibly introduces memory intensive characteristics into
an otherwise lightweight, low-overhead function.

2. It is instructive to look at Guido's reactions to other *args
proposals. His receptivity to a,b,*c=it wanes whenever someone then
requests support for a,*b,c=it. Likewise, he considers zip(*args) as a
transpose function to be an abuse of the *arg protocol. IOW,
supporting "odd" usages does not bode well for a proposal.

3. The recipe discussion and newsgroup posting present only toy
examples -- real use cases have not yet emerged. If some do emerge, I
suspect that each problem will have a better solution (using existing
tools) than the one being proposed. If so, then adopting the proposal
will have the negative effect of leading folks away from the correct
solution.
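The memory concern about itertools.tee() in point 1 can be made concrete. A minimal sketch in Python 3; the `numbered` generator and the `pulled` list are illustrative names, not part of the recipe:

```python
import itertools

def numbered(n, pulled):
    """Yield (i, -i) pairs and record every item pulled from the source."""
    for i in range(n):
        pulled.append(i)
        yield (i, -i)

pulled = []
left, right = itertools.tee(numbered(1000, pulled))

# Drain one branch completely before touching the other.
firsts = [pair[0] for pair in left]
pulled_after_left = len(pulled)

# The second branch is served from tee's internal buffer,
# so the source is not read a second time.
seconds = [pair[1] for pair in right]
```

After `left` is drained, all 1000 items have been read from the source and are held for `right`. The buffering only stays small when the branches are consumed at roughly the same pace.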
Raymond Hettinger
"Not everything that can be done, should be done."

Jul 18 '05 #6
Raymond Hettinger <py****@rcn.com> wrote:
...
"Not everything that can be done, should be done."


Or, to quote Scripture...:

"'Everything is permissible for me' -- but not everything is beneficial"
(1 Cor 6:12)...
Alex
Jul 18 '05 #7
Raymond Hettinger wrote:
[...]

"Not everything that can be done, should be done."


.... and not everything that should be done, can be done.

regards
Steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119
Jul 18 '05 #8
Raymond Hettinger wrote:
[Steven Bethard]
What I would prefer is something like:
>>> zip(*g(4))
<iterator object at ...>
>>> x, y, z = zip(*g(4))
>>> x, y, z
(<iterator object at ...>, <iterator object at ...>, <iterator object at ...>)

2. It is instructive to look at Guido's reactions to other *args
proposals. His receptivity to a,b,*c=it wanes whenever someone then
requests support for a,*b,c=it.


Yeah, I've seen his responses to those kinds of suggestions. I don't
think what I'm suggesting (at least in terms of *args) is quite as
extreme though -- I'm still only talking about *args in function
definitions. I'm just suggesting that in a function with a *args in the
def, the args variable be an iterator instead of a tuple. (This doesn't
entirely solve my zip problem of course, but it's the only *args change
I was suggesting.)
Likewise, he considers zip(*args) as a
transpose function to be an abuse of the *arg protocol.
Ahh, I didn't know that. Is there another (preferred) way to do this?
3. The recipe discussion and newsgroup posting present only toy
examples -- real use cases have not yet emerged.


Ok, I'll try to give you one of my use cases. It's a little
complicated, so sorry if my explanation goes on for a bit here.

Basically, I'm parsing one file format to another. The files can be
quite large, so it's important to use iterators wherever possible. My
conversion function is a generator that generates a (label,
feature_dict) pair for each line in the input file.

Now, two possible things can happen at this point (depending on
parameters from the user):

CASE 1: I output the (label, feature_dict) pairs as is, with code
something like:

for label, feature_dict in generator:
    write_instance(label, feature_dict)

This is, of course, the simple case.

CASE 2: I need to apply a windowing function to the iterables so that
each line includes not only its feature_dict's values, but also the
values of some of the surrounding feature_dicts. Note that I only want
to window the feature_dicts, not the labels. This gives me code
something like:

labels, feature_dicts = starzip(generator)
for label, feature_window in izip(labels, window(feature_dicts)):
    write_instance(label, combine_dicts(feature_window))

Note that I can't write the code like:

for label, feature_dict in generator:
    feature_dict = combine_dicts(window(feature_dict)) # WRONG!
    write_instance(label, feature_dict)

because window produces an iterable from an *iterable* of feature_dicts,
not from a single feature_dict. So basically what I've done here is to
"transpose" (to use your word) the iterators, apply my function, and
then transpose the iterators back.
Hopefully this gives a little better justification for starzip? If you
have a cleaner way to do this kind of thing, I'd welcome any suggestions
of course.
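To make the CASE 2 pipeline concrete, here is a minimal Python 3 sketch. None of the helpers are shown in the thread, so everything below is an assumption: `unzip_pairs` is a simplified tee-based stand-in for starzip (it is not the recipe's code), `window` yields each item with its predecessor, `combine_dicts` prefixes keys with their window position, and `convert` plays the role of the conversion generator.

```python
import collections
import itertools

def unzip_pairs(pairs):
    """Split an iterator of (a, b) pairs into two lazy iterators."""
    left, right = itertools.tee(pairs)
    return (a for a, _ in left), (b for _, b in right)

def window(iterable, size=2):
    """Yield, for each item, a tuple of it and up to size-1 predecessors."""
    recent = collections.deque(maxlen=size)
    for item in iterable:
        recent.append(item)
        yield tuple(recent)

def combine_dicts(dicts):
    """Merge a window of dicts, prefixing each key with its window position."""
    combined = {}
    for pos, d in enumerate(dicts):
        for key, value in d.items():
            combined["%d:%s" % (pos, key)] = value
    return combined

def convert(lines):
    """Hypothetical converter: one (label, feature_dict) pair per line."""
    for line in lines:
        label, word = line.split()
        yield label, {"word": word}

labels, feature_dicts = unzip_pairs(convert(["N dog", "V runs", "A fast"]))
out = [(label, combine_dicts(win))
       for label, win in zip(labels, window(feature_dicts))]
```

Only the feature_dicts are windowed; each label stays paired with the window that ends at its own line.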
If zip(*) is discouraged as a transpose function, maybe I should be
lobbying for adding a transpose function instead? (For now, of course,
it would go into itertools, but when iterators become the standard in
Python 3.0, maybe it could be moved into the builtins...)
Thanks for your comments!

Steve
Jul 18 '05 #9
[Steven Bethard] I'm just suggesting that in a function with a
*args in the def, the args variable be an iterator instead of
a tuple.
So people would lose the useful abilities to check len(args) or extract
an argument with args[1]?

Besides, if a function really wants an iterator, then its signature
should accept one directly -- no need for the star operator.
Likewise, he considers zip(*args) as a
transpose function to be an abuse of the *arg protocol.


Ahh, I didn't know that. Is there another (preferred) way to do this?

I prefer the abusive approach ;-) however, the Right Way (tm) is
probably nested list comps or just plain for-loops. And, if you have
Numeric, there is an obvious preferred approach.
So basically what I've done here is to
"transpose" (to use your word) the iterators, apply my function, and
then transpose the iterators back.


If you follow the data movements, you'll find that iterators provide no
advantage here. To execute transpose(map(f, transpose(iterator))), the
whole iterator necessarily has to be read into memory so that the first
function application will have all of its arguments present -- using
the star operator only obscures that fact.

Realizing that the input has to be in memory anyway, then you might as
well take advantage of the code simplification offered by indexing:
>>> def twistedmap(f, iterable):
...     data = list(iterable)
...     rows = range(len(data))
...     for col in xrange(len(data[0])):
...         args = [data[row][col] for row in rows]
...         yield f(*args)
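A Python 3 rendering of the same idea, with a small usage example. This is a sketch rather than the original code: the empty-input guard is an addition, and the sample rows are made up.

```python
def twistedmap(f, iterable):
    """Apply f to each column of an iterable of equal-length rows."""
    data = list(iterable)          # the whole input is held in memory
    if not data:                   # added guard: nothing to transpose
        return
    for col in range(len(data[0])):
        yield f(*[row[col] for row in data])

rows = [(1, 10), (2, 20), (3, 30)]
totals = list(twistedmap(lambda *column: sum(column), rows))
```

For equal-length rows this gives the same result as `map(f, *data)`, which makes the in-memory cost just as explicit.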

Raymond Hettinger

Jul 18 '05 #10
Raymond Hettinger wrote:
[Steven Bethard] I'm just suggesting that in a function with a
*args in the def, the args variable be an iterator instead of
a tuple.

So people would lose the useful abilities to check len(args) or extract
an argument with args[1]?


No more than you lose these abilities with any other iterators:

def f(x, y, *args):
    args = list(args) # or tuple(args)
    if len(args) == 3:
        print args[0], args[1], args[2]

True, if you do want to check argument counts, this is an extra step of
work. I personally find that most of my functions with *args parameters
look like:

def f(x, y, *args):
    do_something1(x)
    do_something2(y)
    for arg in args:
        do_something3(arg)

where having *args be an iterable would not be a problem.
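For reference, this is what the language does as it stands: whatever is unpacked at the call site, `args` arrives as a tuple, so len() and indexing work without an extra conversion step. A minimal Python 3 check, with an arbitrary generator as input:

```python
def f(x, y, *args):
    # args is always a tuple here, even though a generator was unpacked.
    return type(args), len(args), args[1]

kind, count, second = f(*(i * i for i in range(5)))
```

The generator yields 0, 1, 4, 9, 16; the first two values bind to `x` and `y`, and the remaining three land in the tuple.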
So basically what I've done here is to
"transpose" (to use your word) the iterators, apply my function, and
then transpose the iterators back.


If you follow the data movements, you'll find that iterators provide no
advantage here. To execute transpose(map(f, transpose(iterator))), the
whole iterator necessarily has to be read into memory so that the first
function application will have all of its arguments present -- using
the star operator only obscures that fact.


I'm not sure I follow you here. Looking at my code:

labels, feature_dicts = starzip(generator)
for label, feature_window in izip(labels, window(feature_dicts)):
    write_instance(label, combine_dicts(feature_window))

A few points:

(1) starzip uses itertools.tee, so it is not going to read the entire
contents of the generator in at once as long as the two parallel
iterators do not run out of sync

(2) window does not exhaust the iterator passed to it; instead, it uses
the items of that iterator to generate a new iterator in sync with the
original, so izip(labels, window(feature_dicts)) will keep the labels
and feature_dicts iterators in sync.

(3) the for loop just iterates over the izip iterator, so it should be
consuming (label, feature_window) pairs in sync.
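Points (1) to (3) can be checked by counting how far the source runs ahead of the loop. A minimal Python 3 sketch under stated assumptions: the two-way tee split stands in for starzip, and `lookahead_window` is a hypothetical window that reads `ahead` items early; neither is the code used in the thread.

```python
import collections
import itertools

def counting(iterable, pulled):
    """Pass items through, counting how many were pulled from the source."""
    for item in iterable:
        pulled[0] += 1
        yield item

def lookahead_window(iterable, ahead):
    """Yield one tuple per item: the item plus up to `ahead` following items."""
    it = iter(iterable)
    buf = collections.deque(itertools.islice(it, ahead))
    for item in it:
        buf.append(item)
        yield tuple(buf)
        buf.popleft()
    while buf:                      # flush the shorter windows at the end
        yield tuple(buf)
        buf.popleft()

def max_lag(n, ahead):
    """Largest gap between items read from the source and items written."""
    pulled = [0]
    source = counting(((i, {"f": i}) for i in range(n)), pulled)
    a, b = itertools.tee(source)
    labels = (pair[0] for pair in a)
    dicts = (pair[1] for pair in b)
    worst = 0
    pairs = zip(labels, lookahead_window(dicts, ahead))
    for done, (label, win) in enumerate(pairs, 1):
        worst = max(worst, pulled[0] - done)
    return worst
```

With 1000 input pairs and a lookahead of 2, the source is never more than 2 items ahead of the loop, so the tee buffer stays bounded by the window size rather than by the length of the input.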

I assume you disagree with one of these points or you wouldn't say that
"iterators provide no advantage here". Could you explain what doesn't
work here?

Steve
Jul 18 '05 #11
