469,290 Members | 1,882 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,290 developers. It's quick & easy.

key argument for max?

I was wondering if there's any plans to add a "key" argument to max
(and min) like was done for sort(ed)? I fairly often run into a
situation where I have something like:

counts = {}
for item in iterable:
counts[item] = counts.get(item, 0) + 1
_, max_item = max((count, item) for item, count in counts.items())

It would be nice to be able to write that last like like:

max(counts, key=counts.__getitem__)

If this seems like a good idea, maybe someone could point me in the
right direction and I could try to make a patch? I've never looked at
the Python sources before, and it's been a long time since I've
written any C, but if it's not too hard...

Steve
--
You can wordify anything if you just verb it.
- Bucky Katt, Get Fuzzy
Jul 18 '05 #1
6 8878
Steven Bethard <st************@gmail.com> wrote:
I was wondering if there's any plans to add a "key" argument to max
(and min) like was done for sort(ed)? I fairly often run into a
Unfortunately, no. I think that the excellent key= optional argument to
sort should be extended to every order-related Python tidbit; min and
max are examples, but so should the heapq module's functions.
Unfortunately, this idea wasn't liked by the python-dev consensus back
in 2.4's formative stages, and it's now too late for any functional
changes to 2.4. So, we can take it up again for 2.5 at some time.
situation where I have something like:

counts = {}
for item in iterable:
counts[item] = counts.get(item, 0) + 1
_, max_item = max((count, item) for item, count in counts.items())

It would be nice to be able to write that last like like:

max(counts, key=counts.__getitem__)
One crucial aspect that any key= should maintain is never to compare
items that happen to have equal keys. So, I think the semantics of that
line should rather be those of:

max((counts[item], i, item) for i, item in enumerate(counts))[-1]

i.e. inserting an index (arbitrary in this case, as counts is a
dictionary) that will never be equal so no two items are compared.

If you _want_ to compare items, when they have the same count, you
should say that explicitly -- the default should be that key= means no
item comparison, for uniformity. Unfortunately being explicit _and_
compact at the same time generally requires a lambda, such as:

max(counts, key=lambda item: counts[item], item)

If this seems like a good idea, maybe someone could point me in the
right direction and I could try to make a patch? I've never looked at
the Python sources before, and it's been a long time since I've
written any C, but if it's not too hard...


It's not too hard, but such a patch would not be accepted for Python
2.4, and it's a bit early to start making patches for 2.5, sigh.
Alex
Jul 18 '05 #2
Alex Martelli <aleaxit <at> yahoo.com> writes:

Steven Bethard <steven.bethard <at> gmail.com> wrote:

If this seems like a good idea, maybe someone could point me in the
right direction and I could try to make a patch? I've never looked at
the Python sources before, and it's been a long time since I've
written any C, but if it's not too hard...


It's not too hard, but such a patch would not be accepted for Python
2.4, and it's a bit early to start making patches for 2.5, sigh.


Sorry, I was unclear here. I did only intend this for 2.5 -- I usually figure
as soon as I've downloaded the beta, it's too late to consider any new
features. ;) Obviously no patches for 2.5 will go in until after 2.4 is
released (when is this planned for? end of the year?), but I'm not going
anywhere, so I figured I could work ahead a bit. =)

Steve
Jul 18 '05 #3
[Steven Bethard]
I was wondering if there's any plans to add a "key" argument to max
(and min) like was done for sort(ed)? I fairly often run into a

[Alex Martelli] Unfortunately, no. I think that the excellent key= optional argument to
sort should be extended to every order-related Python tidbit; min and
max are examples, but so should the heapq module's functions.
Unfortunately, this idea wasn't liked by the python-dev consensus back
in 2.4's formative stages, and it's now too late for any functional
changes to 2.4. So, we can take it up again for 2.5 at some time.


It should definitely be pursued for Py2.5. I'm the one who shot it
down the first time it came up and now wish that it had gotten a full
hearing.

The issue was political. Getting a key= argument for sort() was a big
win; however, proposing to propagate the idea could have jeopardized
everything by arousing the ire of the gathering anti-change movement.

Raymond Hettinger
P.S. The heapq module's new nsmallest() and nlargest() functions are
suitable for the key= argument; however, the other functions are not.
Unfortunately, the heap is not a collection object in its own right.
I don't see a way to attach an ordering function or a reversed=
argument for the whole heap. Such arguments would have to be passed
on every call to heapify(), heappop(), heappush(), and heapreplace().
Jul 18 '05 #4
Raymond Hettinger <py****@rcn.com> wrote:
...
Unfortunately, no. I think that the excellent key= optional argument to
sort should be extended to every order-related Python tidbit; min and
max are examples, but so should the heapq module's functions.
... It should definitely be pursued for Py2.5. I'm the one who shot it
down the first time it came up and now wish that it had gotten a full
hearing.
Yeah, I remember well.
The issue was political. Getting a key= argument for sort() was a big
win; however, proposing to propagate the idea could have jeopardized
everything by arousing the ire of the gathering anti-change movement.
You and I disagreed very intensely about it at the time, as you'll
recall. I'm not sure if you now really _do_ wish it had gotten a full
hearing, as you said in the first paragraph, or still think it was a
politically preferable move to avoid it, as you appear to imply here.
No matter, I guess, what's past is past. I'm sure you realize the way
you moved to prevent that hearing was the main reason I stepped away
from python-dev afterwards.

P.S. The heapq module's new nsmallest() and nlargest() functions are
suitable for the key= argument; however, the other functions are not.
Incidentally, those two functions are as good in usefulness as they are
badly misplaced in location. If one wants the "three smallest items",
the issue of what algorithm is used to get them should not be primary.
Unfortunately, I can't suggest a better location in today's stdlib.
Maybe that's an argument for AMK's idea that redoing the library should
be focus #1 of 2.5. The idea that useful functionality either doesn't
get in, or sneaks into some odd corner, does suggest that the library's
overall organization isn't all it could be.
Unfortunately, the heap is not a collection object in its own right.
I don't see a way to attach an ordering function or a reversed=
argument for the whole heap. Such arguments would have to be passed
on every call to heapify(), heappop(), heappush(), and heapreplace().


That one's easier to fix -- we can make a collections.heap type (module
collections feels so threadbare now, with just deque in it!-).
Alex
Jul 18 '05 #5
[Steven Bethard]
I was wondering if there's any plans to add a "key" argument to max
(and min) like was done for sort(ed)?
Someday maybe. In the meantime, here's a recipe:
from itertools import imap, izip, tee

def newmax(*args, **kwds):
"""Works just like max() but allows an optional key= argument.

If several keys are equal and maximum, may return any one of them.
"""
if 'key' not in kwds:
return max(*args)
func = kwds['key']

if len(args) == 1:
v = args[0]
else:
v = args
v1, v2 = tee(v)
return max(izip(imap(func, v1), v2))[-1]
If this seems like a good idea, maybe someone could point me in the
right direction and I could try to make a patch? I've never looked at
the Python sources before, and it's been a long time since I've
written any C, but if it's not too hard...


See dist/src/Python/bltinmodule.c
Add the keyword argument.
If it exists apply the function on each iteration before the
comparision step.

The only part that requires extra care is realizing that the key
function and the comparison step may call arbitrary Python code, so
don't assume that any mutable objects are still in the same state
before and after those calls.

Raymond Hettinger
Jul 18 '05 #6
Raymond Hettinger <py****@rcn.com> wrote:
...
from itertools import imap, izip, tee

def newmax(*args, **kwds):
"""Works just like max() but allows an optional key= argument.

If several keys are equal and maximum, may return any one of them.
"""
if 'key' not in kwds:
return max(*args)
func = kwds['key']
Errors should not pass silently, unless explicitly silenced. I don't
like the idea of somebody mistakenly calling newmax(foo, kye=bar) and
having to find out "downstream" that the mistaken 'kye=bar' was just
silently ignored. So, I would suggest rather using something like:

func = kwds.pop('key', None)
if kwds:
raise TypeError, (
'newmax() got an unexpected keyword argument %r' %
(kwds.keys()[0],))
if func is None:
return max(*args)
if len(args) == 1:
v = args[0]
else:
v = args
v1, v2 = tee(v)
return max(izip(imap(func, v1), v2))[-1]


Unfortunately, this does not replicate an important functionality you
gave to the key= argument of .sort: that items themselves are NEVER
compared, even when they have equal keys. So, for example, I can have
something like:
c=[2j, 3+1j, 3, 2, 1+3j]
sorted(c, key=abs)

[2j, 2, 3, (3+1j), (1+3j)]

but newmax(c, key=abs) would raise a TypeError.

It suffices, of course, to add 'count' to the names imported from
itertools, and change the function's last statement to:

return max(izip(imap(func, v1), count(), v2))[-1]

As a side effect, I think this makes
newmax(foo, key=bar) === sorted(foo, key=bar)[-1]
for any foo and bar. Not sure whether this stronger spec is worth
committing to, as opposed to 'may return any one of them'. But it sure
seems to me that the guarantee that comes with key=, that two items with
equal keys will never get compared to break the tie, _IS_ well worth
preserving for all uses of key=... -- it saves the day when items just
cannot be compared, and it avoids a performance trap when items _can_ be
compared but comparisons may be costly.
Alex
Jul 18 '05 #7

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

1 post views Thread by Steven Bethard | last post: by
5 posts views Thread by Booted Cat | last post: by
17 posts views Thread by Charles Sullivan | last post: by
11 posts views Thread by vbgunz | last post: by
4 posts views Thread by Patient Guy | last post: by
8 posts views Thread by A. Anderson | last post: by
4 posts views Thread by robert | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by suresh191 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.