Bytes | Developer Community

reduce() anomaly?

This seems like it ought to work, according to the
description of reduce(), but it doesn't. Is this
a bug, or am I missing something?

Python 2.3.2 (#1, Oct 20 2003, 01:04:35)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d1 = {'a':1}
>>> d2 = {'b':2}
>>> d3 = {'c':3}
>>> l = [d1, d2, d3]
>>> d4 = reduce(lambda x, y: x.update(y), l)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'
>>> d4 = reduce(lambda x, y: x.update(y), l, {})
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'
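[Editor's note: the failure above is because dict.update() mutates in place and returns None, so from the second step onward the lambda receives None as its first argument. A variant whose lambda returns the accumulator works — a sketch in modern Python, where reduce lives in functools:]

```python
from functools import reduce  # a builtin in Python 2; in functools since Python 3

d1, d2, d3 = {'a': 1}, {'b': 2}, {'c': 3}
l = [d1, d2, d3]

# dict.update() returns None, so the lambda must return the
# accumulator itself; the {} start value also keeps d1 unmutated.
d4 = reduce(lambda x, y: (x.update(y), x)[1], l, {})
# d4 is now {'a': 1, 'b': 2, 'c': 3}
```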

- Steve.
Jul 18 '05
Ville Vainio wrote:
...
I also think that reduce, sum, map and filter (and lots of others,
__builtins__ has *too much stuff*) should be removed from builtins,
but that will probably take some time (1997 years?). LC's and genexps
will take care of most of that stuff. And people can always do:
There is no way in which LC's and genexps can "take care of" CONSUMING
iterables and iterators -- they PRODUCE them (and itertools also do
some production, and, mostly, composable transformation). map and
filter, sure; sum, reduce, min, max, can never be replaced by any
LC nor any genexp. I think we need a separate module for _consumers_
of iterables/iterators, and I have repeatedly posted about it. But,
sure, I'd gladly agree that there's no real call for any of these to
be a built-in.

from functional import *
# map, filter, reduce, curry, ... (I want lots of these :)
I'd rather put map/filter/reduce in a 'legacy' module -- warts and
all, e.g. map's horrible habit of None-padding shorter sequences,
EEK -- and have a designed-from-scratch functional module without
the warts. What about starting one as a sourceforge project, as I
mentioned elsewhere?
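[Editor's note: the "None-padding" wart in question — in Python 2, map() with multiple sequences padded the shorter one with None. Python 3 dropped that behaviour; a sketch of the contrast using itertools.zip_longest, which reproduces the old padding explicitly:]

```python
from itertools import zip_longest

a, b = [1, 2, 3], [10, 20]
# zip stops at the shorter input...
pairs = list(zip(a, b))             # [(1, 10), (2, 20)]
# ...whereas Python 2's map(None, a, b) None-padded the shorter
# sequence -- the behaviour zip_longest reproduces today.
padded = list(zip_longest(a, b))    # [(1, 10), (2, 20), (3, None)]
```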

There are also tons of functions that should be in sys, math or
whatever:

reload, repr, divmod, max, min, hash, id, compile, hex...
Probably, yes. What functions, as opposed to types, do you
think should absolutely be in the builtins rather than in a separate
module, and (unless it's obvious) why? Of course __import__ had
better stay or we won't be able to get anything imported, for
example:-). The attribute-related functions hasattr, getattr,
setattr, and delattr, seem very fundamental (but I'd want the
same status for the item-functions currently over in operator),
as well as (at least some of them -- delattr isn't -- but I do
think that trying to discriminate too finely would make things
too hard to learn) very frequently used in code. What else?
iter, len, pow [for the crucial 3-arguments case], range (or some
preferable variant that returns an iterator), and zip seem pretty
fundamental; chr and ord might be suitable as str methods (in
the case of chr, a staticmethod, no doubt). Many functions that
are quite handy for interactive use, such as locals, globals,
dir, vars, ..., are not all that frequently used in programs --
so they might live in a module that the interactive mode gets
automatically, rather than being built-ins.

Having beginners learn 'import' before they do raw_input (or
output, which should also be a function, not a statement) may
not agree with somebody's didactics, so, we should consider how
to deal with those.

All of this would be perfect for the mailing list on Python 3.0
if the latter existed. Posting it to c.l.py makes it unlikely
Guido will ever consider the discussion's results, anyway. The
problem is that, with 3.0 at least 2 years away, there seems to
be little incentive to actually go and make such a list, so that
its archives may come in handy in the future.

What's your pet deprecation candidate? I have always thought
`backticks` as repr has got to be the most useless feature around.


Pretty bad, yes. 'apply' at least, while useless, doesn't make
Python's syntax any more complicated, while `...` does.
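[Editor's note: both deprecation candidates discussed here were in fact removed in Python 3, and both have one-for-one replacements — a sketch:]

```python
x = [1, 'two']
# Python 2's backticks were repr() in disguise: `x` meant repr(x)
r = repr(x)

def f(a, b, c):
    return a + b + c

args = (1, 2, 3)
# apply(f, args) is spelled f(*args) since the * call syntax arrived
result = f(*args)
```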
Alex

Jul 18 '05 #101
Douglas Alan wrote:
...
The argument that some programmers might be too lazy to want to learn
powerful, simple, and elegant features that can be taught in seconds,
Programmers may be quite ready to learn anything whose claimed "power"
IS actually there. But this complex of thread started with somebody
asking for good use cases for reduce, and NONE has been shown.

Programmers need not have any interest in spending what may well be
more than 'seconds' in learning something that will never give them
any *practical* return. Computer "scientists", maybe, have to be, by
some definition of "scientist". Fortunately, *programmers* are still
allowed to be very pragmatical types instead.

Besides, if you weren't exposed at all to LISP (or a LISP-like
language) while getting a CS degree, it wasn't a very good CS
program! They're going to teach you AI techniques in a different


Aha, showing your true colors: I see you have now moved from a
"CS 101 course" (which is given to majors in MANY disciplines)
to a "CS degree". Quite a different thing. Around here (for the
same length of course, say 3 years), graduates in "informatical
engineering", for example, may well not know what a Turing machine
is, nor be able to name any "AI technique" of practical use (they
may have learned, e.g. alpha-beta pruning, which historically did
originate within AI, but may be more practical today to learn in
completely different contexts) -- but they're likely to know more
than graduates in "informatics" (computer science) about, e.g.,
statistics, and/or how to organize a security-audit. (I have to
phrase it in terms of likelihood because most majors do offer quite
a bit of choice in terms of what exact courses you can take).

I wouldn't be surprised to find an "ingegnere informatico" (3-years
degree, i.e. BS-equivalent) who doesn't understand 'reduce' at
first; and once he or she has grasped it, I _would_ be surprised
not to get challenged with an "OK, now, what IS it GOOD for?".

When Plato was asked the same question about Geometry, he had a
slave give the querant a gold coin then throw him out of the school:
caring about USEFULNESS was OH so icky to Greek philosophers! I
would not be surprised if a similar stance (maybe not quite as
overt) lingers in several academic environments. Fortunately,
though, it seems to me that Engineering faculties have managed
to stay pretty free of it (yep, I'm an engineer, and proud of it,
and proud that my daughter has chosen an engineering major too --
my son chose Economics, a lovely calling to be sure, and is having
a much harder time selecting the courses that concentrate on the
_useful_ stuff ["how do I make a cool million before I'm 30" kind
of stuff:-)] from those which are the Economics equivalent of
"Suppose cows were spheres of uniform density"...:-).
Alex

Jul 18 '05 #102

"Patrick Maupin" <pm*****@speakeasy.net> wrote in message
news:65**************************@posting.google.c om...
"Terry Reedy" wrote:
I think it a big mistake (that should be repaired in 3.0) that the result start value was made optional, leading to unnecessary empty-seq exceptions. I also think the order is wrong and should also be fixed. I believe it was placed last, after the seq of update values, so that it could be made optional (which it should not be). But it should
instead come before the seq, to match the update function (result,
seq-item) arg order. This reversal has confused people, including
Guido.
That makes some sense, but you'd _certainly_ have to move and/or
rename the function in that case.
I agree on moving. I might rename reduce as induce simultaneously.
Although this is not quite your preferred form, perhaps those in such a rush to remove reduce() should consider this slight enhancement to
sum(), to mollify those who don't want to see the functionality disappear, but who could care less about the name of the function which provides the functionality (e.g. sum(mylist,1,operator.mul) is slightly
counterintuitive).
The problem with sum(seq, [base=0, [op = add]]) is that it *still* has
the result base case and the seq params in what I consider to be the
wrong order (unless the optional update function has the order of its
parameters reverses from reduce) -- and for the same reason of
optionality.
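[Editor's note: the enhancement under discussion — sum() growing an optional base and operation — could be sketched like this; ext_sum is a hypothetical name, not any proposed API, and note the base still trails the sequence, the ordering Terry objects to:]

```python
from functools import reduce
import operator

def ext_sum(seq, base=0, op=operator.add):
    # hypothetical signature sum(seq, [base], [op]) from the thread
    return reduce(op, seq, base)

# the "slightly counterintuitive" sum(mylist, 1, operator.mul):
product = ext_sum([2, 3, 4], 1, operator.mul)
```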

I need to write a coherent summary of my views on the ++s and --s of
reduce.
functions will be removed at all cost. As soon as the implementers
start porting their _own_ code to 3.0, I believe we'll start getting useful feedback, either of the form "itertools et al. has everything
I need", or "boy, this SUCKS! I really need map!"
I presume that part of why Guido wants a year to do 3.0 is just so he
can test alternatives with ports of his own code.
(I'm curious, though, why you included "apply" in this list. I've
personally never needed it since the * enhancement to the call
syntax.)

I presume you mean as opposed to delete it altogether. Because apply
can be used with map while * and ** cannot and at least one person has
said he uses that.

Terry J. Reedy


Jul 18 '05 #103
Douglas Alan wrote:
...
That is non-trivial to most, based on my experience in explaining it
to other people (which for the most part have been computational
physicists, chemists, and biologists).
I find this truly hard to believe. APL was a favorite among
physicists who worked at Johns Hopkins Applied Physics Laboratory
where I lived for a year when I was in high school, and you wouldn't


Interesting. I worked for many years in an environment where physicists
doing research could freely choose between APL and Fortran (IBM Research),
and while there was a hard-core of maybe 10%-15% of them who'd _never_
leave APL for any reason whatsoever, an overwhelmingly larger number
of physicists were at least as keen on Fortran. I have no hard data on
the subject, but it appears to me that Fortran has always been way more
popular than APL among physicists as a whole.
thing. In fact, people seemed to like reduce() and friends -- people
seemed to think it was a much more fun way to program, rather than
using boring ol' loops.


....while most physicists I worked with were adamant that they wanted
to continue coding loops and have the _compiler_ vectorize them or
parallelize them or whatever. Even getting them to switch to Linpack
etc from SOME of those loops was a battle at first, though as I recall
the many advantages did eventually win them over.

Anyway, computational scientists using Python should be using Numeric
(if they aren't, they're sadly misguided). Numeric's ufuncs ARE the
right way to do the equivalent of APL's +/ (which is quite a different
beast than ANYusercodedfunction/ would be...), and show clear and
obvious advantages in so doing:

[alex@lancelot tmp]$ timeit.py -c -s'import operator; xs=range(999)'
'x=reduce(operator.add, xs)'
1000 loops, best of 3: 290 usec per loop

[alex@lancelot tmp]$ timeit.py -c -s'xs=range(999)' 's=sum(xs)'
10000 loops, best of 3: 114 usec per loop

[alex@lancelot tmp]$ timeit.py -c -s'import Numeric as N;
xs=N.arrayrange(999)' 'x=N.add.reduce(xs)'
100000 loops, best of 3: 9.3 usec per loop

Now *THAT* is more like it: 10+ times FASTER than sum, rather than
2+ times SLOWER! Of course, you do have to use it right: in this
snippet, if you initialize xs wrongly...:

[alex@lancelot tmp]$ timeit.py -c -s'import Numeric as N; xs=range(999)'
'x=N.add.reduce(xs)'
100 loops, best of 3: 2.1e+03 usec per loop

....then you can say goodbye to performance, as you see. But when
used skilfully, Numeric (or its successor numarray, I'm sure -- I
just don't have real experience with the latter yet) is just what
numerically heavy computations in Python require.
Alex

Jul 18 '05 #104
Dave Brueck wrote:
...
Why, pray-tell, would you want an OO program to do:

results = [ func(x) for x in sequence ]

... instead of ...

results = sequence.map(func) ??


Because I find the first much more readable (and IMO the "an OO program to
do" bit is irrelevent from a practical point of view).


I entirely agree with both points. They're even clearer when the
contrast is between, e.g.:

results = [ x+23 for x in sequence ]

and:

results = sequence.map(lambda x: x+23)

where using the HOF approach forces you to understand (and read) lambda too.
Alex

Jul 18 '05 #105
ei****@metacarta.com wrote:
...
that kind of array handling. But the point here is that "reduce" is
fundamental: x/i5 (where x is multiplication-sign and i is iota) is
a lot like reduce(int.__mul__, range(1,6)), it's just "readable" if
I disagree about "is a lot like" -- I think Numeric Python's

mul.reduce(arrayrange(1, 6))

is much closer to APL (sure, spelling "iota 5" as "arrayrange(1, 6)"
may be alien, but the point is that .reduce in Numeric, like / in APL,
is a property of OPERATORS [ufuncs, in Numeric], NOT a generic FP
tool like Python's built-in reduce).
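[Editor's note: for readers without Numeric at hand, the generic-FP spelling being contrasted with APL's operator-property / can be reproduced today — a sketch, using operator.mul rather than int.__mul__:]

```python
from functools import reduce
import operator

# the product of 1..5 (APL's multiply-reduce over iota 5):
factorial_5 = reduce(operator.mul, range(1, 6))
```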
of poking around.) On the other hand, that readability does assume
you're thinking in terms of throwing arrays around, which can be
an... *odd* way of looking at things, though of course when it fits,
it's very nice.


That's the reason I'm happier to have it in a separate package, be
it Numeric or its successor numarray, than built into the language --
just like, e.g., regular expressions are a separate module in Python
but built into languages such as Perl or Ruby. Keeping the core
language simpler and other modules/packages rich is a strategy which
I think Python uses to excellent effect.
Alex

Jul 18 '05 #106
Robin Becker wrote:
...
on oop in this thread nobody has pointed out that we could have

sequence.sum()
sequence.reduce(operator.add[,init])

or even

sequence.filter(func) etc etc

and similar. That would make these frighteningly incomprehensible ;)
concepts seem less like functional programming. Personally I wouldn't
like that to happen.


Maybe nobody "pointed it out" because it's utterly absurd to even
conceive of somehow magically adding all of those methods to *EVERY*
iterator under heavens?! E.g., today, this works just fine:
>>> class BAH(object):
...     def __init__(self): self.seq = [1, 5, 3, 7, 19]
...     def __iter__(self): return self
...     def next(self):
...         try: return self.seq.pop()
...         except IndexError: raise StopIteration
...
>>> sum(BAH())
35

having instances of class BAH automagically spout such methods as
..sum(), etc, would be an absurdity.

Numeric's design -- considering that *functions* able to do .reduce
and the like are very special and should expose those as methods
(they're known as "ufuncs" in Numeric), taking the array as argument,
is a design which looks quite sound to me. So, for example,
Numeric.sum(x, axis) boils down (after some error checks &c) to
Numeric.add.reduce(x, axis) [summing, of course, IS by far important
enough that having to spell it out as add.reduce would be silly].
Alex
Jul 18 '05 #107
Douglas Alan wrote:
...
How's that? I've never used a programming language that has sum() in
it. (Or at least not that I was aware of.) In fact, the *Python* I
use doesn't even have sum() in it! I've used a number of languages
Never used a modern Fortran? Or never bothered to _learn_ it properly
"because it's a language for peasants"?-)
that have reduce(). If I didn't already know that it exists, I
wouldn't even think to look in the manual for a function that adds up
a sequence of numbers, since such a function is so uncommon and
special-purpose.


Right -- that's why the Fortran standard committee bothered to include
it, the Numeric Python designers bothered to define sum specifically
as an alias for add.reduce, and so on -- because it's so uncommon.

Indeed, that's why 'sum' is a commonly used word in English -- exactly
because nobody's ever DOING anything like that -- while, of course,
'reduce' is totally obvious -- why, _anybody_ who's ever taken
Chemistry 101 knows it means "to remove oxygen"!
Alex

Jul 18 '05 #108
In article <9s*******************@news1.tin.it>,
Alex Martelli <al***@aleax.it> wrote:
How's that? I've never used a programming language that has sum() in
it. (Or at least not that I was aware of.) In fact, the *Python* I
use doesn't even have sum() in it! I've used a number of languages


Never used a modern Fortran? Or never bothered to _learn_ it properly
"because it's a language for peasants"?-)


For that matter, Haskell, the third thing I get from googling
"functional programming language", has had sum since it was first
created (version 1.0 report 1990), despite its being easily replaceable
with reduce (or I guess in Haskell terms foldl).

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #109
Douglas Alan wrote:
...
Well, that's the argument you seem to be making -- that reduce() is
superfluous because a sum() and max() that work on sequences were
added to the language.
"added"? 'max' worked that way since well before I met Python. But
to be consistent with your other arguments, no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...

I can already see what's going to happen with sum(): Ultimately,
people will realize that they may want to perform more general types
of sums, using alternate addition operations.
Not gonna happen - this _might_ happen if Python was a
design-by-committee language, but it's not.


According to Alex Martelli, max() and min() are likely to be extended
in this fashion. Why not sum() next?


I was just presenting my personal opinion regarding the fact that
the max and min functions should have an optional key= argument,
simply because the .sort method now (2.4) has one too. If I implied
that anything I feel should be done is automatically "likely" to be
accepted by Guido, I apologize for mis-communicating; he throws stones
at far more of my ideas than he embraces.

To you, perhaps. Not me, and not a lot of other people.


Well, perhaps you can explain your confusion to me? What could
possibly be unintuitive about a function that is just like sum(), yet
it allows you to specify the addition operation that you want to use?


Addition, as you have remarked elsewhere, is intrinsically suitable
for being applied to multiple operands; one is therefore naturally
disposed to think in terms of "add them all up", "sum them all", and
the like, NOT in terms of "iteratively perform
total = operator.add(total, item)
for all items in the sequence of numbers". But, reduce as it exists
in Python today cannot be clearly defined EXCEPT by such an iteration
with some generic "callable which accepts two arguments" specified
in lieu of the operator.add. Given this total generality, even though
hardly ever does it make sense to USE it, reduce becomes quite complex.
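[Editor's note: that iterative definition can be written out in full — a sketch; my_reduce is an illustrative name, showing exactly the iteration (and the empty-sequence trap) Alex describes:]

```python
def my_reduce(func, iterable, *maybe_init):
    # reduce IS this iteration, with any callable accepting two
    # arguments standing in place of operator.add
    it = iter(iterable)
    if maybe_init:
        total = maybe_init[0]
    else:
        try:
            total = next(it)  # first item doubles as the start value
        except StopIteration:
            raise TypeError("reduce() of empty sequence with no initial value")
    for item in it:
        total = func(total, item)
    return total
```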

You've even argued in this thread, incorrectly, that reduce could be
eliminated if only all binary operators were able to accept arbitrary
numbers of arguments. This, plus your insistence that 'reduce' is
"just like" APL's / (which does NOT accept arbitrary functions on its
left -- just specific operators such as +), indicate a _misconception_
of reduce on your part. I'm sure you don't hold it consciously, but
these miscommunications indicate that even you, reduce's paladin, do
NOT properly grasp reduce "intuitively".

Having (e.g.) add accept a sequence argument (like max does), or, for
explicitness and potentially performance, grow an add.reduce attribute
just like in Numeric, would give no particular problems. I'd still
want (just like Numeric does) to have sum as a name for add.reduce,
because it's by far the most common case and avoids getting into the
issue of what the (expletive delete) does "reducing" have to do with
anything that "add.reduce" does (it's really a curve ball compared
with the many meanings of "reduce" in English, after all). But, most
importantly, such a design would avoid the veritable traps to which
the current, too-general 'reduce' subjects the poor learner who's
quite reasonably going to believe that all that generality MUST be
good for something if it's been designed into a built-in. We've seen
quite a few such inappropriate uses on this thread, after all.

As I said, I'm not particularly good at guessing what Guido will or
won't bless, but I suspect that a patch to the operator module to
grow appropriate attributes on its functions, or even just make them
accept multiple arguments, might be well-received (to avoid wasting
work one might want a PEP or prePEP about that first, of course).

Of course, you can get the same effect by defining a class with your
own special __add__ operator, and then encapsulating all the objects
you want to add in this new class, and then using sum(), but that's a
rather high overhead way to accomplish the same thing that reduce()
lets you do very easily.


In practice, I judge it quite unlikely that people will go out of their
way to abuse sum as I've seen them abuse reduce -- reduce lets you
abuse it very easily, as you mention, while sum needs you to really
jump through hoops to do so.
Alex

Jul 18 '05 #110
On Wed, 12 Nov 2003 13:19:15 -0500, Douglas Alan <ne****@mit.edu>
wrote:
I would say that if you didn't get introduced to at least the concept
of functional programming and a hint of how it works, then something
was seriously wrong with your CS education.
|>oug


I've been lurking this thread for a while, but missed the beginning.
Can I ask where you got your CS education (school and year)?
From the conversation, it seems that right or wrong, your education
exceeds those in this group. Where should I send my children
such that they receive a better CS education that I?
--dang
Jul 18 '05 #111

"Douglas Alan" <ne****@mit.edu> wrote in message
news:lc************@gaffa.mit.edu...
Even without any algebra, any kid can tell you that 1 + 2 is the same as 2 + 1. Replace 1 and 2 by a and b and you get the same result.
Yes, but they are still two *different* ways to get to that
result. Starting with a and adding b to it, is not the same thing as
starting with b and adding a to it.
I disagree. Number addition (like several other functions) is a bag
(multiset) reduction operator that reduces a (possibly empty)
collection of similar items to one of the same type. Abstractly, the
reduction is simultaneous. One appropriate notation for such
operators is a simple labelled loop (in 2D) containing the items to be
collected together. In linear text, such loops can be represented by
matched symbols, such as ( ), [ ], { }, which then represent the
intersection of a closed loop with a 'too-narrow' text band (which
thereby chops off the top and bottom of the loop). When operands
scattered in the loop interior are projected down to a 1+D band for
text representation, they gain an ordering that is an artifact of that
limited representation.

(+ item item ...) and sum(item, item, ...) are both labelled loops
with items arbitrarily but necessarily ordered for linear text
representation.
It is only the commutative law of arithmetic,
as any good second grade student can tell you,
that guarantees that the result will be the same.


Well, second graders are misled by the representation of an n-ary bag
operation with infix notation that implies binarity and order
relevance. Learning about reduce should help one unlearn these
implications ;-).

Even with infix notation, there is a choice between making the default
interpretation be order relevance or order irrelevance. If
mathematicians had made the second choice, there would be no need for
a 'commutative law'. One would say that a+b == b+a because that is the
way it normally is, while a-b != b-a because of the OrderRelevance Law
of subtraction.

> On the other hand, not all mathematical groups are abelian,
> and consequently, a + b != b + a for all mathematical groups.

Yes, there are order-relevant binary-only operations. Number addition
is not one of them.

[Having written the above, I am less enthralled with the overloading
of '+' and operator.add to also mean ordered concatenation of
sequences. The fact that operator.add becomes O(n**2) (usually
unnecessarily) instead of O(n) for such overloadings, to the point
that sum() specially excludes such overloadings for strings,
reinforces this doubt. Perhaps there should be at least be an
operator.cat that would generalize ''.join(strseq).]
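[Editor's note: Terry's bracketed point in code — a sketch. Reducing strings with + re-copies the growing result on every step (quadratic overall), which is precisely why sum() special-cases strings away and ''.join() is the idiom:]

```python
from functools import reduce
import operator

parts = ['ab', 'cd', 'ef']
# O(n**2) overall: each + builds a fresh, longer string
concat = reduce(operator.add, parts, '')
# O(n): one pass, one allocation
joined = ''.join(parts)
```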

Terry J. Reedy


Jul 18 '05 #112
Douglas Alan wrote:
Alex Martelli <al***@aleax.it> writes:
no disagreement, reduce is in line with that philosophy sum is a
shortcut and as others have said is less general.
'sum' is _way simpler_: _everybody_ understands what it means to sum a
bunch of numbers, _without_ necessarily having studied computer science.
Your claim is silly. sum() is not *way* simpler than reduce(), and
anyone can be explained reduce() in 10 seconds: "reduce() is just like


As you hedged elsewhere, anyone can be EXPLAINED it, but not anyone
will UNDERSTAND the explanation instantly. Now THAT hedging is way
silly, and of course Shakespeare did it best:

"""
Glendower: I can call spirits from the vasty deep.

Hotspur: Why, so can I, or so can any man; But will they come when you do
call for them?
"""
sum(), only with reduce() you can specify whatever addition function
you would like."
This explanation is misleading by overqualification (that "addition" is
the overqualification). A non-overqualified explanation would have to
mention that ANY callable that can accept two arguments, and return a
result suitable to be its first argument next, can be passed; and it would
have to touch on what happens when the sequence is empty, and the optional
third argument. Yes, you COULD cram it into 10 seconds -- people won't
UNDERSTAND what you're saying, of course, but clearly you don't care
about such trifles, as you have abundantly shown.

The claim, made by somebody else, that _every_ CS 101 course teaches the
functionality of 'reduce' is not just false, but utterly absurd:
'reduce', 'foldl', and similar higher-order functions, were not taught to
me back when _I_ took my first university exam in CS [it used Fortran as
the main language],


Then you weren't taught Computer Science -- you were taught Fortran
programming. Computer Science teaches general concepts, not specific
languages.


CS teaches general concepts and in all decent CS 101 course it starts
doing so with concrete examples (because THAT is how most people learn
best), ideally with an executable computer language so you can TRY things.
It is quite likely that higher-order functions won't be taught yet in
CS 101, or that only a very few specific cases will be shown and reduce
(or foldr or however you want to call it) will not be one of them.

Your false and utterly absurd claim about the content of _every_ CS 101
course has been totally demolished by now -- by many people, including
CS majors. I'm sure you'd be happy to "undo" it, but you've invested
so much emotional energy on it and made it a lynchpin of your argument
in favour of 'reduce' that I understand that's just too hard. Still,
by standing on such rotten foundations, your argument is reduced (heh)
to near-absurdity...

they were not taught to my son in _his_ equivalent course [it used
Pascal], and are not going to be taught to my daughter in _her_
equivalent course [it uses C].


Then your children were done a great disservice by receiving a poor
education. (Assuming, that is, that they wanted to learn Computer
Science, and not Programming in Pascal or Programming in C.)


They wanted to learn enough computer science to get on with their
real interests in life -- telecom engineering and financial economics
respectively. They may well take second or third courses on CS issues
in the future. But within the confines of a Bachelor (3-years degree)
in such disciplines, heavily different from computer science but able to
use several CS conceptual and practical tools effectively for other
purposes, I think they were served perfectly well by "Foundations
of Informatics" (aka CS 101 -- we don't number things like that here,
and "Informatics" is strongly preferred over "CS", and in particular
it's the preference of local computer scientists) covering a lot of
important concepts AND practical exercises, but not ALL higher-order
functions. Personally I think that they wasted effort in teaching
about pointers, but I guess that's debatable.

Python's purpose is not, and has never been, to maximize the generality
of the constructs it offers.


Whoever said anything about "maximizing generality"? If one's mantra


Your only argument FOR reduce is its maximal generality (and the falsehood
that it's taught in every CS 101 course).
elegant features understood by anyone who has studied Computer Science
If a CS 101 course (passed with top marks) counts as having "studied
Computer Science", then 'reduce' is not such a feature. It's not very
elegant either, btw, but that's a subjective opinion -- the fact that
CS 101 courses mostly do NOT teach reduce/foldl is a FACT. So, by
this reasoning of yours, but based on realities rather than falsehoods,
reduce should go away.
*utterly* wrong manner. You should be assuming that your audience are
the smart people that they are, rather than the idiots you are
assuming them to be.
They're smart _and pragmatical_, NOT interested in (very arguable...)
elegance _for its own sake_, for the most part. So, they ask for GOOD
use cases for reduce, if they're taught it -- and... none can be found
throughout the Standard Python library and everywhere else I've checked
into real programs (written by me or my clients, or downloaded from
the net). Every single use of reduce I've found would be faster,
cleaner, simpler with sum, an explicit loop, or other ways yet.

Keep the language small and simple.

Provide only one way to do an operation.


It is not true these principles are rare among computer languages --
they are quite common. Most such language (like most computer
languages in general) just never obtained any wide usage.


URLs please (or non-URL references if you must). I had never seen
these two principles written down for a general-purpose language
before I met C -- and haven't, since, except for Python. I'll be
quite interested in being able to quote other languages explicitly
adopting them as design criteria.

place at the right time, it had a lightweight implementation, was


The lightweight implementation clearly follows from the two
principles above -- C had very-lightweight, fast, portable
compilers early on for just the same reason.
Alex

Jul 18 '05 #113
Alex Martelli:
... no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...


Well, you must admit that writing penultimate(a) has it all over
a.sorted()[-2] ;-)

--

Emile van Sebille
em***@fenx.com

Jul 18 '05 #114
On Fri, 14 Nov 2003 18:07:44 GMT, Dang Griffith
<no*****@noemail4u.com> wrote:
On Wed, 12 Nov 2003 13:19:15 -0500, Douglas Alan <ne****@mit.edu>
wrote:
I would say that if you didn't get introduced to at least the concept
of functional programming and a hint of how it works, then something
was seriously wrong with your CS education.
|>oug


I've been lurking this thread for a while, but missed the beginning.
Can I ask where you got your CS education (school and year)?
From the conversation, it seems that right or wrong, your education
exceeds those in this group. Where should I send my children
such that they receive a better CS education that I?
--dang

Never mind -- I saw more posts and summarily reached a conclusion,
reducing the alternatives to one.
--dang
Jul 18 '05 #115

"Alex Martelli" <al***@aleax.it> wrote in message
news:WQ*******************@news1.tin.it...
Terry Reedy wrote:
If the builtins are reduced in 3.0, as I generally would like, I would be fine with moving apply, map, filter, and a repaired version of
reduce to a 'fun'ctional or hof module. But the argument of some
seems to be that this batteries-included language should specifically exclude even that.
A functional module would be neat.


Good. I wrongly thought you might be in the abolish camp.
A great way to enhance the chance that there will be one
would be starting one today (e.g. on sourceforge),
ideally with both pure-Python and C-helped (or pyrex, etc)
implementations,
and get it reasonably-widely used, debugged, optimized.
Interesting idea. Maybe I will do something with it.
Also, I advocate that 3.0 should have a module or package (call it
"legacy" for example) such that, if you started your program with
some statement such as:

from legacy import *

compatibility with Python 2.4 or thereabouts would be repaired as
much as feasible, to ease running legacy code, at the expense
of performance, 'neatness' and all such other great things if needed
(e.g., no "repaired" versions or anything -- just compatibility).
Running old code in new interpreters seems not to be much of a problem
yet because of the current policy of not deleting features or (with
pretty minor or obscure exceptions) changing semantics. But if any
deprecations take effect before 3.0, legacy could also be added
before.

I think there should also be an official 'retroxy' module to help run
(a feasible subset of) pythonx.y code in older interpreters. A
retro23 module would define True, False, sum, enumerate and perhaps
other stuff. Accompanying instructions could discuss which older
versions a new module like sets will work with. I am sure this has
been invented privately several times.
One reasonably popular counterproposal to that was to have it as

from __past__ import *

by analogy with today's "from __future__".
which Guido has strongly rejected since it would mean keeping old
stuff in the interpreter forever -- unless it were a synonym for
'import legacy' -- in which case your spelling is better.
I'd also like to make it
easy to get this functionality with a commandline switch, like is
the case today with -Q specifically for _division_ legacy issues.
-L to mean 'import legacy' on startup should be possible.
But mostly, each time I mention that on python-dev, I'm rebuffed with remarks about such issues being way premature today. Somebody's
proposed starting a new list specifically about 3.0, to make sure
remarks and suggestions for it made today are not lost amid more
day-to-day python-dev traffic, but I don't think anything's been
done about that yet.


Perhaps Guido is trying to avoid pressure to start before he and the
PSF are ready.

I personally wish 3.0 were out now or the next release. That is what
I more or less expected 2+ years ago when I suggested that the
division switch be delayed until 3.0 -- or rather, suggested that the
first release after 2 years make the switch *and* be called 3.0. I
was not asking to have to import the new division rules for years and
years;-).

Terry
Jul 18 '05 #116
Alex Martelli <al***@aleax.it> writes:
Terry Reedy wrote:
...
personally argued that reduce() should be removed from the language,
but I do agree that it does not have to be part of "core" Python,
and could easily be relegated to a module.)


If the builtins are reduced in 3.0, as I generally would like, I would
be fine with moving apply, map, filter, and a repaired version of
reduce to a 'fun'ctional or hof module. But the argument of some
seems to be that this batteries-included language should specifically
exclude even that.


A functional module would be neat. A great way to enhance the chance
that there will be one would be starting one today (e.g. on sourceforge),
ideally with both pure-Python and C-helped (or pyrex, etc) implementations,
and get it reasonably-widely used, debugged, optimized.


I would suggest: don't move the implementation to C too fast, or move
too much of the code. With pypy on the horizon, this may never be needed (?).

Thomas
Jul 18 '05 #117

"Alex Martelli" <al***@aleax.it> wrote in message
news:IT********************@news2.tin.it...
I'd rather put map/filter/reduce in a 'legacy' module -- warts and
all, e.g. map's horrible habit of None-padding shorter sequences,
EEK -- and have a designed-from-scratch functional module without
the warts.
What a liberating thought. I sometimes forget that the only thing I
am really stuck with when programming in Python is the core syntax (in
the Ref Manual) and that I am free to wrap or replace anything else
(in the Lib Manual). For a project I hope to start soon, I have been
worried about how to work around or explain away the deficiencies of
Python's reduce(). Instead, I should just write the version I want
for my purposes (especially since speed does not matter). So your
voluminous writing on this has benefited at least one person.
Probably, yes. What functions, as opposed to types, do you
think should absolutely be in the builtins rather than in a separate
module, and (unless it's obvious) why?
I have thought some about this also. Type objects could be put/left
in types, but as constructors, they are too useful and basic not to
have them immediately available (unless builtins were reduced to nearly
nothing).
Of course __import__ had better stay or we won't
be able to get anything imported, for example:-).
Does it have to be in builtins for import statements to work?
Direct use of __import__ seems relatively rare.
The attribute-related functions hasattr, getattr,
setattr, and delattr, seem very fundamental
(but I'd want the same status for the item-functions
currently over in operator),
as well as
(at least some of them -- delattr isn't -- but I do think that trying to discriminate too finely would make things too hard to learn)
very frequently used in code.
I had to re-lineate this to parse apart the three thoughts. A static
frequency-of-use analysis for some large code collection would be
interesting. I might write something eventually.
iter
    I group this with type constructor objects, even though not one itself
len
    unless made a method, keep in
pow [for the crucial 3-arguments case]
    very specialized and rarely used? move to math as pow3
range (or some preferable variant that returns an iterator)
    how about iter(3), etc, currently an arg type error?
    think of 3 as the sentinel value.
    is not typing '0,' worth the wartiness of optional first arg?
and zip seem pretty fundamental
    agree except for pow

Terry J. Reedy
Jul 18 '05 #118
Alex:
Anyway, computational scientists using Python should be using Numeric
(if they aren't, they're sadly misguided).
Or they are like me and work in a subfield where (specialized)
database searches and graph theory are more useful. It was strange
at SciPy with all those people using NumPy and doing CFD and
our little group of computational life sciences people being rather
bored.
Indeed, that's why 'sum' is a commonly used word in English -- exactly
because nobody's ever DOING anything like that -- while, of course,
'reduce' is totally obvious -- why, _anybody_ who's ever taken
Chemistry 101 knows it means "to remove oxygen"!


While in Chemistry 102 they learn that it means to add electrons
and can occur in a non-oxygen environment ;)

http://reference.allrefer.com/encycl...oxidreduc.html

Andrew
da***@dalkescientific.com
Jul 18 '05 #119

"Alex Martelli" <al***@aleax.it> wrote in message
news:8Y*******************@news1.tin.it...
Dave Brueck wrote:
...
results = [ func(x) for x in sequence ]
... instead of ...
results = sequence.map(func) ??
Because I find the first much more readable>
I entirely agree with both points.
For this pair, I like the second better. Different aesthetics.
They're even clearer when the contrast is between, e.g.:
results = [ x+23 for x in sequence ]
and:
results = sequence.map(lambda x: x+23)
where using the HOF approach forces you
to understand (and read) lambda too.


Here I might take the first. 'lambda' is something I feel 'stuck'
with.

Would the hypothetical
results = sequence.map(func x: x+23)
be any better?
How about a very hypothetical (post ``==repr deprecation)
results = sequence..map(`x+23`)
?

Terry J. Reedy
Jul 18 '05 #120

"Alex Martelli" <al***@aleax.it> wrote in message
news:%N********************@news2.tin.it...
to be consistent with your other arguments, no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...


By my definition of "more general", a function that returns all order
statistics of a list is trivially more general than one that returns
any fewer, including just one. It conveys a superset of the
information conveyed by less general functions and answers a superset
of the questions they answer. If you add a control parameter
specifying which order statistics to return, then less general
functions have a restricted domain.

So I have no idea what your intent is in appearing to challenge this or
how you think it contributes to your overall argument. Is your idea of
'more general' so different?

Terry J. Reedy

Jul 18 '05 #121
Alex Martelli:
"added"? 'max' worked that way since well before I met Python. But
to be consistent with your other arguments, no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...


Even better (tongue-in-cheek) would be a function to get the
t largest (or s largest to t largest) without necessarily comparing
between elements whose values are guaranteed out of range

Just looked through Knuth for that. There are various ways
listed, but no neat name to use for that function. :)
Look under 'select k largest' and 'select kth largest'. Most
of the hits are in the exercises. First one is for Hoare, and
there's a reference to Dodgson (Carroll) and lawn tennis.

Andrew
da***@dalkescientific.com
Jul 18 '05 #122
Dang Griffith <no*****@noemail4u.com> writes:
I would say that if you didn't get introduced to at least the concept
of functional programming and a hint of how it works, then something
was seriously wrong with your CS education.
I've been lurking this thread for a while, but missed the beginning.
Can I ask where you got your CS education (school and year)? From
the conversation, it seems that right or wrong, your education
exceeds that of those in this group. Where should I send my children such
that they receive a better CS education than I?


I took some sort of one-day class at the public library on BASIC when
I was about 12. Then later in high school, I was handed a book on APL
by a friend's father. Then my mother sent me off to Harvard Summer
School, where I first took an official CS-101 class (and a class on
Astronomy), and then I went to MIT for undergraduate school, where I
also took their CS-101.

So, in a way, I had four different CS-101's, and in *all* of them I
was exposed to concepts of functional programming, except for the
BASIC case.

Despite me having studied at Harvard and MIT, it is my belief that you
don't have to have attended either of these places to receive a decent
CS education. You just have to be taught by people who don't think
that something as simple as reduce() is beyond your abilities. In
fact, perhaps it was in a small part because I was introduced to
concepts such as reduce() at a young age that I was able to become
smart enough to get into such a fine school. Would you deny such an
opportunity to those who learn to program in Python?

|>oug
Jul 18 '05 #123
> You just have to be taught by people who don't think
that something as simple as reduce() is beyond your abilities.
Nobody in this entirely too-long thread has asserted such a thing. You have
inferred from people saying that it's non-obvious or less readable than other
constructs that they mean it's difficult to teach or that people are too stupid
to learn it. Nothing could be farther from the truth.

It's not that reduce is hard to learn - it's just that there are simpler and
clearer constructs that do the same thing and are more readable to most people.
That's it!
fact, perhaps it was in a small part because I was introduced to
concepts such as reduce() at a young age that I was able to become
smart enough to get into such a fine school. Would you deny such an
opportunity to those who learn to program in Python?


<retching noises> Ok, I'm done with this thread. Have fun,
Dave
Jul 18 '05 #124
"Dave Brueck" <da**@pythonapocrypha.com> writes:
You just have to be taught by people who don't think
that something as simple as reduce() is beyond your abilities.
Nobody in this entirely too-long thread has asserted such a thing.
There *have* been postings that have asserted that reduce() is
difficult to learn. But it is not. I learned it when I was in high
school with no trouble and no background. I've already posted a
perfectly good description of reduce() that anyone should be able to
learn in under a minute. And if despite this, someone finds reduce()
difficult to learn, then perhaps this is precisely the kind of
thing that they *should* be learning, so that such things will no
longer be difficult for them.
You have inferred from people saying that it's non-obvious or less
readable than other constructs that they mean it's difficult to
teach or that people are too stupid to learn it. Nothing could be
farther from the truth.
Neither is sum() obvious until you know what it does. Perhaps it is
just a synonym for add? Or maybe it produces summaries of some
sort. What happens if you give it an empty list? In the time that
you can answer these questions for yourself, you can have figured out
the more general tool, reduce(), and once you have, then it is just as
obvious and readable as sum(), only it has other uses as well.
It's not that reduce is hard to learn - it's just that there are
simpler and clearer constructs that do the same thing and are more
readable to most people. That's it!


Well, bah! There are precious few constructs in this world that are
clearer and more readable than

reduce(add, seq)

|>oug
Jul 18 '05 #125
Douglas Alan wrote:
BW Glitch <bw******@hotpop.com> writes:

The mantra "there should be only one obvious way to do it" apparently
implies that one should remove powerful, general features like
reduce() from the language, and clutter it up instead with lots of
specific, tailored features like overloaded sum() and max().
That's not what _I_ understand. There's room for powerful features,
but when these features give trouble to the people who just want
something done, then another approach should be taken.

If there's a problem with people not understanding how to sum numbers
with reduce(), then the problem is with the documentation, not with
reduce(), and the documentation should be fixed. It is quite easy to
make this fix. Here it is:


You are assuming way too much. Documentation is meant to clarify what a
function does, not to be a tutorial on what a function is. That's why your
CS101 professor should have taught you that the name of a function
should state what the function does. sum() states clearly that it is
meant to sum something (a connection is made with numbers, because
that's what most people associate with sums). reduce(), OTOH, states
nothing; it reduces. But what? Why should I care for a function that
isn't clear?
FAQ
---
Q: How do I sum a sequence of numbers?

A: from operator import add
   reduce(add, seq)

Problem solved.

A: A guru.

Q: What do you call a user who reads the manual?

;)
And I'm not talking about stupid people. I'm talking about the
microbiolgist/chemist/physics/etc who is programming because of
need.

If a microbiologist cannot understand the above, they have no business
being a microbiologist.


Why should he? Because he couldn't care less about a function called
reduce()? Because he just wants to do his job, regardless of how a CS
expert does it? Because he didn't learn LISP as his first language?
I learned reduce() in high school, and it
didn't take me any longer than the 60 seconds I claim it will take
anyone with a modicum of intelligence.


Please, get your facts straight.

Do you know that people learn in different ways? Just because you
learned something one way doesn't mean everyone else should learn it
(and understand it) the same way you do.
If so, clearly this mantra is harmful, and will ultimately result
in Python becoming a bloated language filled up with "one obvious
way" to solve every particular idiom. This would be very bad, and
make it less like Python and more like Perl.
You feel ok? Perl's mantra is "More Than One Way To Do It"...

If both the mantras cause a language to have general features
replaced with a larger number of specialized features that accomplish
less, then both mantras are bad.


Mantras are cute names for design decisions (I know, mantra doesn't
mean that, but it does in this context). When designing something, certain
compromises must be made. Guido van Rossum (GvR or BDFL) made some
decisions early on, and they have stayed with Python ever since. Good or bad,
we have to stick with the decisions. Why? Because each decision
represents a solution from many to a single problem. If we start making
exceptions to the decision, the decision should be rethought. But more
importantly, if it works, don't fix it.

Unless you go by the engineering mantra...
I can already see what's going to happen with sum(): Ultimately,
people will realize that they may want to perform more general
types of sums, using alternate addition operations. (For instance,
there may be a number of different ways that you might add together
vectors -- e.g., city block geometry vs. normal geometry. Or you
may want to add together numbers using modular arithmetic, without
worrying about overflowing into bignums.) So, a new feature will
be added to sum() to allow an alternate summing function to be
passed into sum(). Then reduce() will have effectively been put
back into the language, only its name will have been changed, and
its interface will have been changed so that everyone who has taken
CS-101 and knows off the top of their head what reduce() is and
does, won't easily be able to find it.
I don't get you. There's a reason why special functions can be
overloaded (__init__, __cmp__, etc.; I don't use others very
often). That would allow for this kind of special
treatment. Besides, GvR would not accept your scenario.

There are often multiple different ways to add together the same data
types, and you wouldn't want to have to define a new class for each
way of adding. For instance, you wouldn't want to have to define a
new integer class to support modular arithmetic -- you just want to
use a different addition operation.
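A sketch of that claim (in modern Python, where reduce lives in functools): summing integers modulo some n needs only a different two-argument function, not a new integer class. The modulus 7 here is an arbitrary illustration.

```python
from functools import reduce  # a builtin in the Python of this thread

def add_mod7(x, y):
    # "addition" in the integers modulo 7 -- same ints, different operation
    return (x + y) % 7

nums = [5, 6, 3, 4]
plain = sum(nums)                    # ordinary sum: 18
modular = reduce(add_mod7, nums, 0)  # stays reduced mod 7: 4
```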


Define a class then. What you are describing is a specific case which
most people don't face in everyday problems. If you have a set of tools,
try to use the best combination for the given task. If all you have is a
hammer, everything starts looking like a nail. ;) If you have to program
in Java, everything starts looking like a class.
Also, whytf do you mention so much CS101?

Because anyone who has studied CS, which should include a significant
percentage of programmers, will know instantly where to look for the
summing function if it is called reduce(), but they won't necessarily
know to look for sum(), since languages don't generally have a
function called sum(). And everyone else will not know to look for
either, so they might as well learn a more powerful concept in the
extra 30 seconds it will take them.


False. Programmers come from many sources. Restricting yourself to CS
people is bad enough. As Alex Martelli mentioned in another post of this
thread, power should be demonstrable. It's quite amusing that no one has
shown an example of the power of reduce() (by itself, not its side
effects).

[snip]
IMHO, you think that the only people that should make software
is a CS major.

Did I say that?


You didn't. But I infer that from what you've said in this thread.
>To me, there is never *one* obviously "right way" to do anything
Never? I doubt this very much. When you want to add two numbers in a
programming language, what's your first impulse? Most likely it is
to write "a + b".
Or b + a. Perhaps we should prevent that, since that makes two
obviously right ways to do it!
Even without any algebra, any kid can tell you that 1 + 2 is the same
as 2 + 1. Replace 1 and 2 by a and b and you get the same result.

Yes, but they are still two *different* ways to to get to that result.
Starting with a and adding b to it, is not the same thing as starting
with b and adding a to it. It is only the commutative law of
arithmetic, as any good second grade student can tell you, that
guarantees that the result will be the same. On the other hand, not
all mathematical groups are abelian, and consequently, a + b != b + a
in some mathematical groups.


This might be the case IF we were talking about Matlab/Octave, which are
languages designed for mathematics.

This would be the case of concatenating strings ('Hello ' + 'World! ' <>
'World! ' + 'Hello '), but you should expect that to happen.
C'mon -- all reduce() is is a generalized sum or product. What's
there to think about? It's as intuitive as can be. And taught in
every CS curiculum. What more does one want out of a function?
|>oug
It wasn't obvious for me until later. reduce() is more likely to be
used for optimization. IIRC, some said that optimization is the root
of all evil.

I don't know what you are getting at about "optimization". Reduce()
exists for notational convenience--i.e., for certain tasks it is easier
to read, write, and understand code written using reduce() than it
would be for the corresponding loop--and understanding it is no more
difficult than understanding that a summing function might let you
specify the addition operation that you'd like to use, since that's
all that reduce() is!


Optimization was from someone else in the thread whom I might have
misunderstood.

It's still not obvious. You need to understand 1) whattf reduce()
does and how, 2) what the function passed in does and 3) what the array
is about.
Just because it's _obvious_ to you, it doesn't mean it's obvious to
people who self taught programming.

It was obvious to me when I was self-taught and I taught myself APL in
high-school. It also seemed obvious enough to all the physicists who
used APL at the lab where I was allowed to hang out to teach myself
APL.


Well, I learned TI-BASIC, C, C++ and then went on to learn Scheme,
Prolog and Python. Most of these during the last 6 years. Have I missed
something because I didn't know how to use reduce()? No. And more
importantly, what is a real-life scenario where I should go with
reduce() instead of a loop? From what you've said, the only reason to
use reduce() is to show mere mortals that I know how to use reduce().
Besides that, Martelli has pointed out that it provides no advantage
whatsoever. And readability suffers because of the high abuse of this function.

--
Andres Rosado

-----BEGIN TF FAN CODE BLOCK-----
G+++ G1 G2+ BW++++ MW++ BM+ Rid+ Arm-- FR+ FW-
#3 D+ ADA N++ W OQP MUSH- BC- CN++ OM P75
-----END TF FAN CODE BLOCK-----

A plague upon all their houses!
-- Scourge (BW)

Jul 18 '05 #126
Douglas Alan wrote:
Q: How do I sum a sequence of numbers?
In three lines:

x = 0
for element in seq:
    x += element

In two lines:
A: from operator import add
reduce(add, seq)


Unless seq is empty, in which case you need this:

from operator import add
reduce(add, seq, 0)

In one line:

x = sum(seq)
And that is why 'sum' is a worthwhile part of the Python standard library
and 'reduce' is not: using 'sum' is significantly shorter than using a for
loop, while using 'reduce' is not.
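The empty-sequence caveat checks out directly (modern Python shown; reduce moved to functools in 3.0):

```python
from functools import reduce
from operator import add

assert sum([]) == 0            # sum defines the empty case
empty_failed = False
try:
    reduce(add, [])            # two-argument reduce cannot
except TypeError:
    empty_failed = True
assert empty_failed
assert reduce(add, [], 0) == 0                      # the initializer repairs it
assert reduce(add, [1, 2, 3], 0) == sum([1, 2, 3])  # both give 6
```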
--
Rainer Deyke - ra*****@eldwood.com - http://eldwood.com
Jul 18 '05 #127
Alex Martelli <al***@aleax.it> writes:
But to be consistent with your other arguments, no doubt you'd argue
for a .sort() followed by [-1] as "more general" than max...
That would have a larger big-O time growth value -- most likely
O(n * log n) vs. O(n), for reasonable implementations. And while I
wouldn't sweat a factor of 2 for a feature of a RAD or scripting
language, I would be more concerned about moving to a larger big-O
value.
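The complexity point, sketched: a reasonable max() is one pass over the data, while the sort-based spelling does O(n log n) work to answer the same question.

```python
def linear_max(seq):
    # single O(n) pass -- roughly what a sensible max() does internally
    it = iter(seq)
    best = next(it)
    for item in it:
        if item > best:
            best = item
    return best

data = [7, 2, 9, 4]
assert linear_max(data) == max(data) == sorted(data)[-1] == 9
```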
Well, perhaps you can explain your confusion to me? What could
possibly be unintuitive about a function that is just like sum(),
yet it allows you to specify the addition operation that you want
to use?

Addition, as you have remarked elsewhere, is intrinsically suitable
for being applied to multiple operands; one is therefore naturally
disposed to think in terms of "add them all up", "sum them all", and
the like, NOT in terms of "iteratively perform
total = operator.add(total, item) for all items in the sequence of
numbers". But, reduce as it exists in Python today cannot be clearly
defined EXCEPT by such an iteration with some generic "callable which
accepts two arguments" specified in lieu of operator.add. Given this
total generality, even though hardly ever does it make sense to USE
it, reduce becomes quite complex.
Your proposed extension to max() and min() has all the same problems.
But reasonable programmers don't abuse this generality, and so there
is little reason for the reasonable programmer to devote any
head-space to it. A programming language should be made to make life
as easy as possible for the reasonable programmer, not to assume that
all programmers are latent unreasonable programmers and try to force
this urge to be stifled. Don't take my word for it -- ask Paul
Graham. I believe he was even invited to give the Keynote Address at
a recent PyCon.

A tutorial for new programmers doesn't have to go into any of the
nuances of how one might abuse reduce() because that's not what
tutorials are for. It can merely say that reduce() is for applying a
binary operation such as "+" or "*" to a sequence of numbers, give a
couple examples on how to do it, and leave it at that. If the student
comes to a point where they want or need to know more details, they
can check the reference manual or experiment with it.
You've even argued in this thread, incorrectly, that reduce could be
eliminated if only all binary operators were able to accept arbitrary
numbers of arguments. This, plus your insistence that 'reduce' is
"just like" APL's / (which does NOT accept arbitrary functions on its
left -- just specific operators such as +), indicate a _misconception_
of reduce on your part. I'm sure you don't hold it consciously, but
these miscommunications indicate that even you, reduce's paladin, do
NOT properly grasp reduce "intuitively".
Just what is it that I don't grasp again? I think my position is
clear: I have no intention to abuse reduce(), so I don't worry myself
with ways in which I might be tempted to.
Having (e.g.) add accept a sequence argument (like max does), or, for
explicitness and potentially performance, grow an add.reduce attribute
just like in Numeric, would give no particular problems. I'd still
want (just like Numeric does) to have sum as a name for add.reduce,
because it's by far the most common case
So, now you *do* want multiple obviously right ways to do the same
thing?
and avoids getting into the issue of what the (expletive deleted)
"reducing" has to do with anything that "add.reduce" does
(it's really a curve ball compared with the many meanings of
"reduce" in English, after all).
The English word "reduce" certainly has multiple meanings, but so
does "sum". It can be a noun or a verb, for instance. It can mean
"summary" or "gist" in addition to addition. It also can be confusing
by appearing to be just a synonym for "add". Now people might have
trouble remembering what the difference between sum() and add() is.

In Computer Science, however, "reduce" typically only has one meaning
when provided as a function in a language, and programmers might as
well learn that sooner than later.
But, most importantly, such a design would avoid the veritable traps
to which the current, too-general 'reduce' subjects the poor learner
who's quite reasonably going to believe that all that generality
MUST be good for something if it's been designed into a built-in.
We've seen quite a few such inappropriate uses on this thread, after
all.


That's very easy to fix:

FAQ
---
Q. Should I ever pass a function with side effects into reduce() or
map()?

A. No.

(Unless the side-effect is just printing out debugging information,
or saving away statistics, or something like that.)

|>oug
Jul 18 '05 #128
Douglas Alan
Describing reduce() in 10 seconds is utterly trivial to anyone with an
IQ above 100, whether or not they have ever used sum():

"To add a sequence of numbers together:

reduce(add, seq)

To multiply a sequence of numbers together:

reduce(mul, seq)

... Any two-argument function can be used in place of add, mul, sub, or
div and you'll get the appropriate result. Other interesting
examples are left as an exercise for the reader."


This isn't enough to describe what 'reduce' does. For
example, after reading this someone could think that
reduce(add, seq) is a macro which expands to
seq[0] + seq[1] + seq[2] + ...
and that reduce(mul, seq) is the same as
seq[0] * seq[1] * seq[2] * ...

For all the examples you gave, this is correct. However,
reduce(pow, seq) is not the same as
seq[0] ** seq[1] ** seq[2] ** ...
because the order of operations is different.
>>> import operator
>>> reduce(operator.pow, [2, 3, 4])
4096
>>> 2**3**4
2417851639229258349412352L
>>> (2**3)**4
4096


I would argue that a "better" way to describe 'reduce'
is by defining it through its implementation, as

def reduce(f, seq):
    val = seq[0]
    for x in seq[1:]:
        val = f(val, x)
    return val

or by explaining it as
reduce(f, (a, b)) is the same as f(a, b)
reduce(f, (a, b, c)) is the same as f(f(a, b), c)
reduce(f, (a, b, c, d)) is the same as f(f(f(a, b), c), d)
and reduce(f, seq) is the same as
f(...f(f(f(f(seq[0], seq[1]), seq[2]), seq[3]), seq[4]), ....seq[n-1])
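That expansion can be verified mechanically; using a pairing function makes the left-to-right grouping visible (functools.reduce in modern Python):

```python
from functools import reduce

def f(x, y):
    return (x, y)  # pairing function: the result shows the grouping

a, b, c, d = 1, 2, 3, 4
assert reduce(f, (a, b)) == f(a, b) == (1, 2)
assert reduce(f, (a, b, c)) == f(f(a, b), c) == ((1, 2), 3)
assert reduce(f, (a, b, c, d)) == f(f(f(a, b), c), d) == (((1, 2), 3), 4)
```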

As I pointed out, any approach assumes the student knows
what it means to pass a function around. This is not
something that a beginning programmer (the proverbial CS
101 student) necessarily knows or is taught in that course.
(I know that you'll disagree with that statement.)

As others have pointed out, you must at that time explain
when/why 'reduce' is useful, otherwise it is not going to be
remembered, or at best put away on the mental top-shelf
only to be pulled out on rare occasion. For that matter, are
you prepared to answer the question 'why is it called "reduce"?' ?

You've already given an example of when it's useful. You
said that you use it all the time, as in

def longer(x, y):
    if len(y) > len(x): return y
    else: return x

def longest(seq):
    return reduce(longer, seq)

It took me a while but I figured out why it works and
why it's clever. It must have taken so long because of
my substandard CS education. It appears though that
many others in the Python world have suffered from a
poor CS education because there are only six uses of
'reduce' in the top-level of the standard library, (And
several are rather confusing.)

Of those, csv.py has a tricky way to write
    sum( (x[1] for x in items) )
and difflib.py has a way to write
    sum( (x[-1] for x in self.get_matching_blocks()) )
and profile.py could also be written as the simpler
    def get_time_timer(timer=timer, sum=sum):
        return sum(timer())
so 1/2 of those reduce cases are disguised 'sum's.

The other three are in csv.py and look something like
    quotechar = reduce(lambda a, b, quotes = quotes:
                       (quotes[a] > quotes[b]) and a or b,
                       quotes.keys())
and are identical in intent to your use case -- select the
object from a list which maximizes some f(obj).

This suggests the usefulness of a new function or two,
perhaps in itertools, which applies a function to a list
of values and returns the first object which has the
maximum value, as in

def imax(seq, f = lambda x: x):
    seq = iter(seq)
    obj = seq.next()
    maxobj, maxval = obj, f(obj)
    while 1:
        try:
            obj = seq.next()
        except StopIteration:
            return maxobj
        val = f(obj)
        if val > maxval:
            maxobj, maxval = obj, val
and an equivalent imin. Note that this is "better" than
your code because f(x) is only computed once per
object or N times total, while you compute it 2*(N-1)
times. In some cases the f(x) may be the most
expensive term in the calculation.

Then your 'longest' could be implemented as

def longest(seq):
    return imax(seq, len)
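For readers on newer Pythons: the sketch above uses the 2.x iterator protocol (seq.next()). The same idea phrased with the builtin next() looks like this, and still calls f only once per element:

```python
def imax(seq, f=lambda x: x):
    # return the element of seq maximizing f(x); f is called N times total
    it = iter(seq)
    maxobj = next(it)            # raises StopIteration on empty input
    maxval = f(maxobj)
    for obj in it:
        val = f(obj)
        if val > maxval:
            maxobj, maxval = obj, val
    return maxobj

def longest(seq):
    return imax(seq, len)

assert longest(['ab', 'cde', 'f']) == 'cde'
```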

and the example code from csv can be rewritten as
the simpler (IMO, of course):

quotechar = imax(quotes.keys(),
                 lambda a, quotes = quotes: quotes[a])

I tried looking for places in the std. lib which are better
expressed as a 'reduce' but that was a hard search.
Instead, I looked for places which compute a sum by
using an explicit loop, to see if they are better written in
some alternate form. I looked for places which used the
words 'sum', 'total', or 'tot', as well as places where
there was a '= 0' followed by a '+' within the next few
lines. There were very few, and the paucity suggests
that 'sum' isn't needed all that often. Then again, I'm
not one who suggested that that be a builtin function ;)

In fact, I only found three places.

Here's the two most relevant, from tarfile.py:
chk = 256
for c in buf[:148]: chk += ord(c)
for c in buf[156:]: chk += ord(c)

I wondered what it would look like using some of
the (all too many -- what happened to "should only be
one obvious way"!) variants:

# Using 'reduce'
chk = 256
chk = reduce(lambda x, y: x + ord(y), buf[:148], chk)
chk = reduce(lambda x, y: x + ord(y), buf[156:], chk)

# using the new 'sum' and a map
chk = 256
chk += sum(map(ord, buf[:148]))
chk += sum(map(ord, buf[156:]))

# using sum and list comprehensions
chk = 256
chk += sum([ord(c) for c in buf[:148]])
chk += sum([ord(c) for c in buf[156:]])

# using sum and the proposed generator expression
chk = 256
chk += sum( (ord(c) for c in buf[:148]) )
chk += sum( (ord(c) for c in buf[156:]) )
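All of the spellings above should agree; a quick sanity check on a made-up buf (the real one in tarfile.py is a 512-byte tar header block) confirms it:

```python
# made-up stand-in for the tar header block used by tarfile.py
buf = ''.join(chr(i % 256) for i in range(512))

# existing loop form
chk_loop = 256
for c in buf[:148]: chk_loop += ord(c)
for c in buf[156:]: chk_loop += ord(c)

# sum + map, and sum + generator expression
chk_map = 256 + sum(map(ord, buf[:148])) + sum(map(ord, buf[156:]))
chk_genexp = 256 + sum(ord(c) for c in buf[:148]) + sum(ord(c) for c in buf[156:])
assert chk_loop == chk_map == chk_genexp
```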

Personally, I think the existing code is the clearest
of the bunch, and the functional forms are too complicated.

In summary:
- I disagree with you that your text is the clearest way
to explain the 'reduce' function, since it is open to an
alternate interpretation and it doesn't help the people who
learn by understanding how things are implemented either
in code or in math.

- The use case you gave suggests that an alternate function,
which I named 'imax' (and 'imin'), is more useful, in part
because it does fewer operations

- In any case, for Python code the 'reduce' function is very
rarely used, so should not be something frequently pulled
from a Pythonista's toolbox and not taught in an intro. CS
course based on Python.

Andrew
da***@dalkescientific.com
Jul 18 '05 #129
Me:
def imax(seq, f = lambda x: x):


The name, btw, is wrong. It should be 'obj_with_max_value',
'maxobj', or somesuch, since 'imax' should return the maximum
value of the iterator, which happens to be identical to what
max does.

The idea is the same.

Andrew
da***@dalkescientific.com
Jul 18 '05 #130
"Terry Reedy" wrote:
"Patrick Maupin" wrote:
(I'm curious, though, why you included "apply" in this list. I've
personally never needed it since the * enhancement to the call
syntax.)

I presume you mean as opposed to delete it altogether. Because apply
can be used with map while * and ** cannot and at least one person has
said he uses that.


That makes sense. It also fully supports the idea of pulling out
some of these features into an iterable-consumer module. Originally,
apply() provided functionality which would have been exceedingly
difficult to code in Python (I'd say "impossible", except someone
would post an example using eval(), or a stupefyingly long example
function which works on any size list up to 10,000 elements, or
whatever... :), but now it's just a one-line helper function for map().

Regards,
Pat
Jul 18 '05 #131
Douglas Alan:
In Computer Science, however, "reduce" typically only has one meaning
when provided as a function in a language, and programmers might as
well learn that sooner than later.
There's also a __reduce__ and __reduce_ex__ in the pickle
protocol. See http://www.python.org/peps/pep-0307.html .
It's by far the most mentioned 'reduce' in the Python standard
library. Does it have anything to do with the 'reduce' function?
Not that I can tell. But I can't find an explanation for that
choice of a name.
FAQ
---
Q. Should I ever pass a function with side effects into reduce() or
map()?

A. No.

(Unless the side-effect is just printing out debugging information,
or saving away statistics, or something like that.)


Or raising an exception on error. And a true functional programmer
would say:

Q: Should I ever pass a function with side effects?
A: No.

;)
Andrew
da***@dalkescientific.com
Jul 18 '05 #132
Douglas Alan <ne****@mit.edu> writes:
Well, bah! There are precious few constructs in this world that are
clearer and more readable than

reduce(add, seq)


I asked my non-programmer girlfriend what she thinks your construct
does - she didn't know. She immediately understood what sum(seq) does.

--
Ville Vainio http://www.students.tut.fi/~vainio24
Jul 18 '05 #133
Alex Martelli <al***@aleax.it> writes:
EEK -- and have a designed-from-scratch functional module without
the warts. What about starting one as a sourceforge project, as I
mentioned elsewhere?
At least there is

http://sourceforge.net/projects/xoltar-toolkit
[stuff that should remain builtins]
iter, len, pow [for the crucial 3-arguments case], range (or some
preferable variant that returns an iterator), and zip seem pretty
+enumerate
are quite handy for interactive use, such as locals, globals,
dir, vars, ..., are not all that frequently used in programs --
so they might live in a module that the interactive mode gets
automatically, rather than being built-ins.
locals and globals seem to have a natural place in builtins IMHO.
All of this would be perfect for the mailing list on Python 3.0
if the latter existed. Posting it to c.l.py makes it unlikely
Guido will ever consider the discussion's results, anyway. The


How about the wiki at python.org?

--
Ville Vainio http://www.students.tut.fi/~vainio24
Jul 18 '05 #134
On 13 Nov 2003 22:40:35 +0200, Ville Vainio
<vi********************@spamtut.fi> wrote:
Douglas Alan <ne****@mit.edu> writes:

[reduce]
>> If someone can't understand this quickly, then they shouldn't be
>> programming!

> Again, it's not "can't", it's whether they need to or not.


If you don't want to learn a cool concept that will only take you 60
seconds to learn, then you shouldn't be programming! Or you can stick
to loops.


As far as reduce goes, ppl will undoubtedly take a look at the
description, understand it in well under 60 seconds, can't think of
any use for the feature during the next 60 seconds (that wouldn't be
clearer with explicit iteration), and forget it soon after turning the
page. I didn't forget it, just wondered why such an oddball feature
was a builtin. Obviously reduce can rock someone's world, but life is
too short to bother if it doesn't rock yours.
and powerful feature with a slew of specific, tailored features. If
reduce() can be relegated to a library or for the user to implement
for himself, then so can sum(). If the language is to only have one,
it should be reduce().


I also think that reduce, sum, map and filter (and lots of others,
__builtins__ has *too much stuff*) should be removed from builtins,
but that will probably take some time (1997 years?). LC's and genexpes
will take care of most of that stuff. And people can always do:

from funtional import *
# map, filter, reduce, curry, ... (I want lots of these :)

There are also tons of functions that should be in sys, math or
whatever:

reload, repr, divmod, max, min, hash, id, compile, hex...

What's your pet deprecation candidate? I have always thought
`backticks` as repr has got to be the most useless feature around.

I think from a pragmatic point of view, what should be in the built in
core, a built in module, or a standard library, should be
determined by how often they are used or depended on in
programs and modules. You could probably do a statistical analysis
to determine this by searching through the libraries, and a fairly
large library of end use applications.

Moving things closer to the core may increase the efficiency of the
most used items, and moving things further away would simplify and
better organize the language. It's always a trade off to some degree
isn't it?

Also, being in the core is not necessarily the fastest way. Modules
that access compiled and optimized DLLs can be faster when related
functions are grouped so they can share code and data internally.

_Ron Adam

Jul 18 '05 #135
Andrew Dalke wrote:
...
The other three are in csv.py and look something like
quotechar = reduce(lambda a, b, quotes = quotes:
                       (quotes[a] > quotes[b]) and a or b,
                   quotes.keys())
and are identical in intent to your use case -- select the
object from a list which maximizes some f(obj).
I have already suggested, in a post on this thread's direct ancestor 9 days
ago, the 100%-equivalent substitution in pure, simple, faster Python:

quotechar = None
for k in quotes:
    if not quotechar or quotes[k]>quotes[quotechar]:
        quotechar = k

or, for those who consider fast, simple, obviously correct code too boring,

quotechar = max([ (v,k) for k,v in quotes.iteritems() ])[-1]

which is more concise and at least as clever as the original.
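In Python 3 spelling (iteritems becomes items), the one-liner checks out on a toy quotes dict:

```python
# toy stand-in for the candidate-quote statistics dict in csv.py
quotes = {'"': 5, "'": 2}

# pick the key with the largest count via decorate-max-undecorate
quotechar = max([(v, k) for k, v in quotes.items()])[-1]
assert quotechar == '"'
```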

This suggests the usefulness of a new function or two,
perhaps in itertools, which applies a function to a list
Nope -- itertools is not about CONSUMERS of iterators, which this one would
be. All itertools entries RETURN iterators.
of values and returns the first object which has the
maximum value, as in
Given that the sort method of lists now has an optional key= argument, I
think the obvious approach would be to add the same optional argument to min
and max with exactly the same semantics. I.e., just as today:

x = max(somelist)
somelist.sort()
assert x == somelist[-1]

we'd also have

x = max(somelist, key=whatever)
somelist.sort(key=whatever)
assert x == somelist[-1]
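As it happens, the proposed invariant did eventually become checkable exactly as written: max() and min() grew a key= argument in Python 2.5, so on any modern Python:

```python
somelist = ['bb', 'a', 'ccc']

x = max(somelist, key=len)   # key= was added to max/min in Python 2.5
somelist.sort(key=len)
assert x == somelist[-1] == 'ccc'
```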

a perfectly natural extension, it seems to me. I've found such total and
extreme opposition to this perfectly natural extension in private
correspondence about it with another Python committer that I've so far
delayed proposing it on python-dev -- for reasons that escape me it would
appear to be highly controversial rather than perfectly obvious.
def longest(seq):
    return imax(seq, len)
That would be max(seq, key=len) in my proposal.
lines. There were very few, and the paucity suggests
that 'sum' isn't needed all that often. Then again, I'm
not one who suggested that that be a builtin function ;)


Right, that was my own responsibility. I did identify about
10 spots in the standard library then (including substitutions
for reduce); that's more than the typical built-in has, even though
the tasks handled by the standard library are heavily slanted
to string and text processing, networking &c, and (of course)
"pure infrastructure", rather than typical application tasks.
Alex

Jul 18 '05 #136
Alex:
I have already suggested, in a post on this thread's direct ancestor 9 days ago, the 100%-equivalent substitution in pure, simple, faster Python:
I've only been skimming this thread. Now upon reread I see you also
looked at the standard library. No fair -- I'm supposed to be the only
one who does that! :)
Nope -- itertools is not about CONSUMERS of iterators, which this one would be. All itertools entries RETURN iterators.
Yup. There's no good, existing, natural place for such a function, that
I can tell.
Given that the sort method of lists now has an optional key= argument, I
think the obvious approach would be to add the same optional argument to min and max with exactly the same semantics. I.e., just as today:
That follows Moshe's comment that a "Decorate Max Undecorate",
in parallel to Decorate Sort Undecorate. Nice.
we'd also have

x = max(somelist, key=whatever)
somelist.sort(key=whatever)
assert x == somelist[-1]
Is 'key' also an iterable? ...
That would be max(seq, key=len) in my proposal.
Ahh, it's the function used to get the keys. What about 'getkey'
as a name?

I looked through the back thread but didn't see your function
definition. Is it backwards compatible to the existing max
function?... You know, I never realized how strange max was
in the first place, so that
max(1, 2)
and
max( (1, 2) )
are both valid. It must assume that if there's only one argument
then the latter form is appropriate. Then your form must say
that if there's only one non-keyword argument then it's of
the form

max(seq, **args)

But
max(a, b, key=len)
seems reasonable as well. Then again, so does
max(seq, len)
a perfectly natural extension, it seems to me. I've found such total and
extreme opposition to this perfectly natural extension in private
correspondence about it with another Python committer that I've so far
delayed proposing it on python-dev -- for reasons that escape me it would
appear to be highly controversial rather than perfectly obvious.
My guess is that reaction is based on the need to have both forms
max(a, b)
and
max(seq)
and allow a key function to be passed in, and make it feel
natural without running into problems when the two might be
mixed up.
even though
the tasks handled by the standard library are heavily slanted
to string and text processing, networking &c, and (of course)
"pure infrastructure", rather than typical application tasks.


Yeah, I worry about that bias error when I do my analysis.
I really should also scan some of the other (proprietary) code
bases I have access to, but then it leads to claims of being
irreproducible. Besides, despite being a scientific programmer,
I mostly do "string and text processing, networking &c."

Andrew
da***@dalkescientific.com
Jul 18 '05 #137
On Sun, 16 Nov 2003 15:15:45 GMT, Alex Martelli <al*****@yahoo.com> wrote:
Andrew Dalke wrote:
...
The other three are in csv.py and look something like
quotechar = reduce(lambda a, b, quotes = quotes:
                       (quotes[a] > quotes[b]) and a or b,
                   quotes.keys())
and are identical in intent to your use case -- select the
object from a list which maximizes some f(obj).

IMO the use cases in csv.py derive from a particular
problem, and don't mean much beyond that, except as a trigger for ideas. I.e., any
ideas have to stand on their own, without reference to the csv use cases.
(BTW I'm guessing one could do better with "for q in data.split(candidate_quote): ..." and
looking at q[0] and q[-1] and maybe q.strip()[0] and q.strip()[-1] as appropriate
to gather the statistics for the two candidate quotes. This would also let you check
for escaped quotes. But this is another rabbit trail here ;-)

I have already suggested, in a post on this thread's direct ancestor 9 days
ago, the 100%-equivalent substitution in pure, simple, faster Python:

quotechar = None
for k in quotes:
    if not quotechar or quotes[k]>quotes[quotechar]:
        quotechar = k

or, for those who consider fast, simple, obviously correct code too boring,

quotechar = max([ (v,k) for k,v in quotes.iteritems() ])[-1]

which is more concise and at least as clever as the original.

This suggests the usefulness of a new function or two,
perhaps in itertools, which applies a function to a list
Nope -- itertools is not about CONSUMERS of iterators, which this one would
be. All itertools entries RETURN iterators.

Ok, but what about returning an iterator -- e.g., funumerate(f, seq) -- that supplies f(x),x pairs
like enumerate(seq) supplies i,x?

[I'd suggest extending enumerate, but I already want to pass optional range parameters there,
so one could control the numeric values returned, e.g., enumerate(seq,<params>) corresponding to
zip(xrange(<params>),seq))]. [BTW, should xrange() default to xrange(0,sys.maxint,1)?]
of values and returns the first object which has the
maximum value, as in
Given that the sort method of lists now has an optional key= argument, I

This is a new one on me:

>>> seq.sort(key=lambda x:x)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sort() takes no keyword arguments

Do you mean the comparison function? Or is there something else now too?
I'm beginning to infer that key= is actually a keyword arg for a _function_
to get a "key" value from a composite object (in which case ISTM "getkeyvalue" or "valuefunc"
would be a better name). But IMO "key" suggests it will be used on elements x like x[key],
not passing a definition key=whatever and then using key(x) to get values.
think the obvious approach would be to add the same optional argument to min
and max with exactly the same semantics. I.e., just as today:

x = max(somelist)
somelist.sort()
assert x == somelist[-1]

we'd also have

x = max(somelist, key=whatever)
somelist.sort(key=whatever)
assert x == somelist[-1]

I think I like it, other than the name. Maybe s/key/valuefunc/ ;-)

a perfectly natural extension, it seems to me. I've found such total and
extreme opposition to this perfectly natural extension in private
correspondence about it with another Python committer that I've so far
delayed proposing it on python-dev -- for reasons that escape me it would
appear to be highly controversial rather than perfectly obvious.
def longest(seq):
    return imax(seq, len)
That would be max(seq, key=len) in my proposal.


That's a nice option for max (and min, and ??), but ISTM that it would
also be nice to have a factory for efficient iterators of this kind.
It would probably be pretty efficient then to write

maxlen, maxitem = max(funumerate(len,seq))

or

def longest(seq):
    return max(funumerate(len,seq))[-1]

and it would be available as infrastructure for other efficient loops in
addition to being tied in to specific sequence processors like max and min.

Of course we can roll our own:

>>> def funumerate(fun, seq):
...     for x in seq: yield fun(x), x
...
>>> seq = 'ab cde f gh ijk'.split()
>>> seq
['ab', 'cde', 'f', 'gh', 'ijk']
>>> max(funumerate(len,seq))
(3, 'ijk')
>>> min(funumerate(len,seq))
(1, 'f')
>>> max(funumerate(len,seq))[-1]
'ijk'
>>> longest(seq)
'ijk'
lines. There were very few, and the paucity suggests
that 'sum' isn't needed all that often. Then again, I'm
not one who suggested that that be a builtin function ;)


Right, that was my own responsibility. I did identify about
10 spots in the standard library then (including substitutions
for reduce); that's more than the typical built-in has, even though
the tasks handled by the standard library are heavily slanted
to string and text processing, networking &c, and (of course)
"pure infrastructure", rather than typical application tasks.


Regards,
Bengt Richter
Jul 18 '05 #138
Alex:
Given that the sort method of lists now has an optional key= argument, I
Bengt Richter
This is a new one on me:

>>> seq.sort(key=lambda x:x)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sort() takes no keyword arguments
Do you mean the comparison function? Or is there something else now too?
Quoting from Brett Cannon's most excellent
python-dev Summary for 2003-10-01 through 2003-10-15

=============================================
Decorate-sort-undecorate eye for the list.sort guy
--------------------------------------------------
Raymond Hettinger suggested adding built-in support for the
decorate-sort-undecorate (DSU) sorting idiom to list.sort (see the
Python Cookbook recipe at
http://aspn.activestate.com/ASPN/Coo...n/Recipe/52234 which is
recipe 2.3 in the dead tree version or Tim Peters' intro to chapter 2
for details on the idiom). After a long discussion on the technical
merits of various ways to do this, list.sort gained the keyword
arguments 'key' and 'reverse'.

'key' takes in a function that accepts one argument and returns what the
item should be sorted based on. So running ``[(0,0), (0,2),
(0,1)].sort(key=lambda x: x[1])`` will sort the list based on the second
item in each tuple. Technically the sort algorithm just runs the item
it is currently looking at through the function and then handles the
sorting. This avoids having to actually allocate another list. If
'key' and 'cmp' are both provided then 'key' is called on the items to
be sorted and the function's return values are then passed to 'cmp'.

'reverse' does what it sounds like based on whether its argument is true
or false.

list.sort also became guaranteed to be stable (this includes 'reverse').

A discussion of whether list.sort should return self came up and was
*very* quickly squashed by Guido. The idea of having a second method,
though, that did sort and returned a copy of the sorted list is still
being considered.

Contributing threads:
`decorate-sort-undecorate
<http://mail.python.org/pipermail/python-dev/2003-October/038652.html>`
`list.sort
<http://mail.python.org/pipermail/python-dev/2003-October/038772.html>`
=============================================
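The summary's 'key' example, plus 'reverse', can be run as-is on any Python with the 2.3+ sort additions:

```python
pairs = [(0, 0), (0, 2), (0, 1)]

# sort on the second item of each tuple, as in the summary's example
pairs.sort(key=lambda x: x[1])
assert pairs == [(0, 0), (0, 1), (0, 2)]

# reverse=True sorts descending; stability is preserved
pairs.sort(key=lambda x: x[1], reverse=True)
assert pairs == [(0, 2), (0, 1), (0, 0)]
```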
I'm beginning to infer that key= is actually a keyword arg for a _function_ to get a "key" value from a composite object (in which case ISTM "getkeyvalue" or "valuefunc" would be a better name). But IMO "key" suggests it will be used on elements x like x[key], not passing a definition key=whatever and then using key(x) to get values.


I concur. (Ooops, should have said "Me too!" :)

That would be max(seq, key=len) in my proposal.


That's a nice option for max (and min, and ??), but ISTM that it would
also be nice to have a factory for efficient iterators of this kind.
It would probably be pretty efficient then to write

maxlen, maxitem = max(funumerate(len,seq))


With generator expressions this is

maxlen, maxitem = max( ((len(x), x) for x in seq) )

It still has the (slight) problem that it assumes x is comparable
when there are multiple items with the same length. Nearly
all of these quicky versions make that assumption. The
quicky will-work-for-any-item version would look more like

maxlen, pos, maxitem = max( ((len(x), i, x) for i, x in enumerate(seq)) )
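The comparability caveat matters even more in Python 3, where dicts (for example) are not orderable at all; inserting the index keeps the tuple comparison from ever reaching the items themselves:

```python
seq = [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}]  # two items tied on len()

# 2-tuple form: the len() tie falls through to comparing the dicts
raised = False
try:
    max((len(x), x) for x in seq)
except TypeError:        # dicts are unorderable in Python 3
    raised = True
assert raised

# 3-tuple form: ties are broken by the index; items are never compared
maxlen, pos, maxitem = max((len(x), i, x) for i, x in enumerate(seq))
assert (maxlen, pos) == (2, 1) and maxitem == {'c': 3, 'd': 4}
```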
Andrew
da***@dalkescientific.com
Jul 18 '05 #139
"Andrew Dalke" <ad****@mindspring.com> writes:
list.sort also became guaranteed to be stable (this includes 'reverse').
I don't see the reason for that. It seems like a needless restriction
on implementers.
A discussion of whether list.sort should return self came up and was
*very* quickly squashed by Guido. The idea of having a second method,
though, that did sort and returned a copy of the sorted list is still
being considered.


Changing the behavior of list.sort would be really bad, but I
certainly favor adding a new method (maybe list.nsort). There should
also be list.nuniq which takes a sorted list and strips duplicate
elements.
Jul 18 '05 #140
In article <KL*****************@newsread2.news.pas.earthlink. net>,
"Andrew Dalke" <ad****@mindspring.com> wrote:
That would be max(seq, key=len) in my proposal.


That's a nice option for max (and min, and ??), but ISTM that it would
also be nice to have a factory for efficient iterators of this kind.
It would probably be pretty efficient then to write

maxlen, maxitem = max(funumerate(len,seq))


With generator expressions this is

maxlen, maxitem = max( ((len(x), x) for x in seq) )

It still has the (slight) problem that it assumes x is comparable
when there are multiple items with the same length. Nearly
all of these quicky versions make that assumption. The
quicky will-work-for-any-item version would look more like

maxlen, pos, maxitem = max( ((len(x), i, x) for i, x in enumerate(seq)) )


The new sort(key=...) option works even when the underlying objects are
incomparable. I'd expect the same of max(key=...)

So (supposing such a change ever happens) you could instead write
maxitem = max(seq, key=len)
maxlen = len(maxitem)

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #141

"Paul Rubin" <http://ph****@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...
"Andrew Dalke" <ad****@mindspring.com> writes:
list.sort also became guaranteed to be stable (this includes
'reverse').
I don't see the reason for that. It seems like a needless restriction on implementers.


Which is why Guido long resisted making such a guarantee, in spite of
many requests. (The previous list.sort was not stable.) However,
timsort is stable, tested for about a year, and so fast for lists both
random and with various types of order, and so close to optimal in
terms of number of comparisons, that Guido was willing to 'annoint' it
as list.sort for the indefinite future. If a generally
useful non-stable sort were discovered or proposed in the future, it could
be added under a different name.

Terry J. Reedy
Jul 18 '05 #142
"Terry Reedy" <tj*****@udel.edu> writes:
Which is why Guido long resisted making such a guarantee, in spite of
many requests. (The previous list.sort was not stable.) However,
timsort is stable, tested for about a year, and so fast for lists both
random and with various types of order, and so close to optimal in
terms of number of comparisons, that Guido was willing to 'annoint' it
as list.sort for the indefinite future. If a generally
useful non-stable sort were discovered or proposed in the future, it could
be added under a different name.


I think that decision is a little too CPython-centric. Some other
Python implementation (maybe even Jython) might like to implement the
list.sort method by calling some other sort routine (like the one
already in the library for that language) rather than by porting
timsort, even if timsort is better. Also, someone might want to
implement an indexable class that isn't a list (maybe it's a disk file
or something) and want to give it a sort method that cannot work by
sucking the keys into memory and calling timsort (maybe there are too
many keys to fit in memory, so external methods must be used). It may
turn out to work best to use some nonstable sorting method, but then
the behavior will be inconsistent with list.sort and that makes more
cruft that the application programmer has to remember.

Most sorting applications don't care about stability. I think if a
new sorting method is going to get added so there's separate methods
for guaranteed-stable and possibly-nonstable sorting, it's best to let
the new method be the stable one (maybe list.ssort) and leave the
existing one alone.

Of course then you want variants that return self instead of None, so
you have list.sort, list.ssort, list.nsort, and list.nssort. It gets
out of hand.

Maybe the best way to do this kind of thing is with keyword arguments
rather than new methods.
Jul 18 '05 #143
Paul Rubin:
"Andrew Dalke" <ad****@mindspring.com> writes: [actually, I was quoting from the python-dev summary of Brett C's.

Paul: Changing the behavior of list.sort would be really bad, but I
certainly favor adding a new method (maybe list.nsort). There should
also be list.nuniq which takes a sorted list and strips duplicate
elements.


Do you want unix-style unique where repeats are merged into
one, so that 1 2 2 1 -> 1 2 1 or do you want it to return
items which only occur once, and you don't care about the
order (as in dict.fromkeys([1, 2, 2, 1]).keys() which can
return [1, 2] or [2, 1]) or do you want the order of the keys
in the final list to maintain the same order as the initial list,
so that [2, 1, 1, 2] -> [2, 1] always?
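The three flavors really do give different answers; a sketch of each (helper names are my own):

```python
def merge_repeats(seq):
    # unix-style uniq: collapse runs of equal adjacent items
    out = []
    for x in seq:
        if not out or out[-1] != x:
            out.append(x)
    return out

def unique_ordered(seq):
    # keep the first occurrence of each item, preserving input order
    seen, out = set(), []
    for x in seq:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

assert merge_repeats([1, 2, 2, 1]) == [1, 2, 1]
assert unique_ordered([2, 1, 1, 2]) == [2, 1]
# order-insensitive dedup; sorted() here only makes the result deterministic
assert sorted(dict.fromkeys([1, 2, 2, 1]).keys()) == [1, 2]
```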

Andrew
da***@dalkescientific.com
Jul 18 '05 #144
Paul Rubin:
I think that decision is a little too CPython-centric. Some other
Python implementation (maybe even Jython) might like to implement the
list.sort method by calling some other sort routine (like the one
already in the library for that language)
That's not a problem with Jython, since Java has a stable sort, see
http://java.sun.com/j2se/1.4.2/docs/...html#sort(java
..util.List)

There was a OCaml implemenation of a Python variant, called
Vyper. OCaml has a stable sort
http://caml.inria.fr/devtools/doc_oc...sic/Array.html

C++'s STL include a stable_sort
http://www.codeguru.com/cpp/stlguide/stable_sort.shtml

C# doesn't appear to have a stable sort.

WWPyPyD? (What Would PyPy do?) I don't know.
Also, someone might want to
implement an indexable class that isn't a list (maybe it's a disk file
or something) and want to give it a sort method that cannot work by
sucking the keys into memory and calling timsort (maybe there are too
many keys to fit in memory, so external methods must be used).
There are a couple of misconceptions here:

- sort is a method on lists. It only works on lists. Unlike STL,
which distinguishes between a container and an algorithm, there
is no way to apply sort directly to anything which isn't a list. Your
case (using sort on "a disk file or something") cannot occur.

- All the keys must fit in memory. This is a consequence of being
a list. (The keys may depend on other resource to do the
comparison.) C++ is different in this regard as well.

- The sort only rearranges references so there's no way to use
sort to directly sort the values on disk as you could in C++.
It may
turn out to work best to use some nonstable sorting method, but then
the behavior will be inconsistent with list.sort and that makes more
cruft that the application programmer has to remember.
It may have, but now with the restriction on what the sort must do
there's been a redefinition of what 'best' means for Python. No
implementation of Python is now allowed to do so, so the application
programmer won't have to remember it. All it does is make it
slightly harder for a C# programmer to write a Python implementation.
And I assume there's plenty of 3rd party libraries which can help out.
Most sorting applications don't care about stability.
Really? I prefer my sorts to be stable since it best agrees
with what a person would do, assuming infinite patience.
It's nice because it's well defined and platform independent --
which is important because I've seen bugs crop up in some
programs which assumed a sort (3C) under IRIX will give
the same order as under Solaris -- C's sort is not stable.
I think if a
new sorting method is going to get added so there's separate methods
for guaranteed-stable and possibly-nonstable sorting, it's best to let
the new method be the stable one (maybe list.ssort) and leave the
existing one alone.


Humbug. You want an extra method (and possibly a couple of
independent algorithms which can be used on your disk-based
but list-like data structures) which for the most widely used version
of Python (CPython) will not do anything different and for which
the second most widely used version (Jython) is a trivial change,
all so that *someday* someone who implements a Python in a
language which doesn't have a fast stable sort doesn't have
to write one (copying out of any standard text book), find a
3rd party version, or translate one?

Why not worry about something more complex, like regular
expressions. CPython and Jython almost certainly have
differences in edge cases, and what will the version of Python
written in Prolog use?

Andrew
da***@dalkescientific.com
Jul 18 '05 #145
In article <61*****************@newsread2.news.pas.earthlink. net>,
Andrew Dalke wrote:
[snip]
- sort is a method on lists. It only works on lists. Unlike STL,
which distinguishes between a container and an algorithm, there
is no way to apply sort directly to anything which isn't a list. Your
case (using sort on "a disk file or something") cannot occur.


Surely he was talking about implementing "list-alikes"...? I.e.
sequences polymorphically equivalent to lists, or at least wrt.
sequence-ness and sorting...? The stability requirement does make this
sort of thing a tad more difficult. (Not that I find it very
problematic; I'm just trying to interpret the meaning of the post.)

--
Magnus Lie Hetland "In this house we obey the laws of
http://hetland.org thermodynamics!" Homer Simpson
Jul 18 '05 #146
All in all, I agree with you. You have pointed out precisely
what has been making Python more attractive than, say, Perl.

In <lc************@gaffa.mit.edu>, you, Douglas Alan, wrote:
The argument that some programmers might be too lazy to want to learn
powerful, simple, and elegant features that can be taught in seconds,
is no good reason to remove such features from Python and bloat Python
by replacing them with a plethora of less powerful, less elegant
features.
In <lc************@gaffa.mit.edu>, you also wrote:
I like Python because it is a very simple, generic, easy-to-learn
programming language, where I didn't have to learn much of anything
new. It was just like all the other myriad of programming
languages I have programmed in, only less cluttered than most, and
"light-weight" in its implementation and accessibility.
So do I.

As to sum(), when learning string addition (concatenation),
one may wonder why sum() does not handle it:
>>> sum(['a', 'b', 'c'])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'

while general reduce() does it just as _expected_:
>>> reduce(str.__add__, ['a', 'b', 'c'])
'abc'

It may be sum() that is more difficult to learn...

For this particular problem, it is better to use
''.join(['a', 'b', 'c']), you know.
However, it is important for Python to have an easy and generic
way to do things. If we have to read the manual through to solve
anything, what is the point to use Python instead of Perl (or Ruby,
to some extent)?
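To make the contrast above concrete, both spellings produce the same string; only the join is the idiomatic (and linear-time) one. A small sketch, noting that on modern Python 3 reduce must be imported from functools, whereas in the 2.3 sessions quoted here it was a builtin:

```python
from functools import reduce  # builtin in Python 2, moved to functools in Python 3

parts = ['a', 'b', 'c']

# The generic way: fold + over the sequence.
via_reduce = reduce(str.__add__, parts)

# The idiomatic way: join on the (empty) separator.
via_join = ''.join(parts)

assert via_reduce == via_join == 'abc'
```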
I have nothing against learning new stuff, by the way, but when I
want to learn new stuff, I'm not particularly interested in the
idiosyncrasies of Python -- I'd spend my effort learning something a
bit more exotic, like OCaml, or Haskell, that might stretch my brain a
bit. Learning a bunch of idiosyncratic detail is just tedious. This
is one of the most important reasons that Perl sucks so badly.
Again in <lc************@gaffa.mit.edu> And as I have pointed out, it goes against the principle of simplicity
and expressiveness to remove an easy to use and easy to learn simple
and powerful feature with a slew of specific, tailored features. If
reduce() can be relegated to a library or for the user to implement
for himself, then so can sum(). If the language is to only have one,
it should be reduce().


I agree with you.

However, it may be better to give reduce() some nice notation.
Someone (sorry, but I do not remember) suggested such one:
[s: s+c for c in ['a', 'b', 'c']]
or
[s='': s+c for c in ['a', 'b', 'c']]
though it may not be the nicest.
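For reference, the suggested notation would presumably desugar to an ordinary reduce call with an explicit seed. A hypothetical correspondence, sketched:

```python
from functools import reduce  # builtin in Python 2

# [s='': s+c for c in ['a', 'b', 'c']]  (the proposed notation)
# would correspond to:
result = reduce(lambda s, c: s + c, ['a', 'b', 'c'], '')
assert result == 'abc'
```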

-- SUZUKI Hisao

Jul 18 '05 #147
Magnus Lie Hetland:
Surely he was talking about implementing "list-alikes"...?


Yes, I think you're right about that, and I misinterpreted
his statement.

That being the case, an alternative is to have 'sort'
implement the Python-required semantics, and have
'fsort' or somesuch ('f' for 'fast'?) implement the
appropriate data-structure-specific one.

Then again, is the change that "all Python list-alikes must
implement stable sort" or that "Python native lists shall
now and forever implement stable sort"?

The official statement from the BDFL is
http://mail.python.org/pipermail/pyt...er/038773.html
] OK, I pronounce on this: Python's list.sort() shall be stable.

That's a statement only that list.sort shall be stable and
not that all .sort() methods must be stable.

BTW, there was a *HUGE* amount of discussion about
sort and stability and keys and DSU on the python-dev list.
See the archive
http://mail.python.org/pipermail/pyt...ead.html#38772
and look for "decorate-sort-undecorate" as well as subjects
with the word 'sort' in them.
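The decorate-sort-undecorate idiom mentioned above is easy to sketch: pair each record with its sort key and original index, sort the tuples, then strip the decoration. The index makes the result stable even if the underlying sort is not (hypothetical record data):

```python
records = [('apple', 3), ('pear', 1), ('apple', 2)]

# Decorate: (key, original_index, record) triples.
decorated = [(rec[0], i, rec) for i, rec in enumerate(records)]
# Sort: ties on the key are broken by the original index, never by the record.
decorated.sort()
# Undecorate: keep only the records.
undecorated = [rec for key, i, rec in decorated]

# Equal keys ('apple') keep their original relative order.
assert undecorated == [('apple', 3), ('apple', 2), ('pear', 1)]
```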

Andrew
da***@dalkescientific.com
Jul 18 '05 #148
SUZUKI Hisao <su*******@oki.com> writes:
However, It may be better to give reduce() some nice notation.


I think the idea is to hide the thing somewhere far away from
builtins, not to *increase* its prominence in the language by
introducing new syntax.

--
Ville Vainio http://www.students.tut.fi/~vainio24
Jul 18 '05 #149
SUZUKI Hisao wrote:
...
As to sum(), when learning string addition (concatenation),
one may wonder why sum() does not handle it:
I originally had sum try to detect the "summing strings" case and
delegate under the covers to ''.join -- Guido vetoed that as too
"clever" (which has a BAD connotation in Python) and had me forbid
the "summing strings" case instead, for simplicity.
>>> sum(['a', 'b', 'c'])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'


whereupon, one hopes, the user checks sum's doc:
>>> print sum.__doc__
sum(sequence, start=0) -> value

Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start'. When the sequence is empty, returns start.
and if one tries anyway:
>>> sum(['a','b','c'],'')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]

the error message should direct the user to the proper way to sum
strings. Having more than one "obvious way to do it" was avoided.

while general reduce() does it just as _expected_:
>>> reduce(str.__add__, ['a', 'b', 'c'])
'abc'


Well, if what you "expect" is the following performance...:

[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,range(999))'
'reduce(str.__add__, x)'
1000 loops, best of 3: 1.82e+03 usec per loop
[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,range(999))' "''.join(x)"
10000 loops, best of 3: 68 usec per loop

i.e., a slowdown by about 2700% for a 999-item sequence,

[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,range(1999))'
'reduce(str.__add__, x)'
100 loops, best of 3: 5e+03 usec per loop
[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,range(1999))' "''.join(x)"
10000 loops, best of 3: 143 usec per loop

growing to 3500% for a 1999-item sequence, and so on without bounds,
then no doubt ``reduce does it just as expected'' by YOU.

Most people, however, EXPECT sensible performance, not slow-downs by
factors of tens or hundreds of times, when they use constructs that are
considered "normal and supported" in the language and its built-ins.

This makes reduce a terrible performance trap just waiting to catch the
unwary. It SEEMS to work all right, but in fact it's doing nothing of
the kind, nor can it -- it's defined to iteratively run N repetitions
of whatever function you pass as the first argument, therefore it can
never have O(N) performance when used to add up a sequence of strings,
but always, necessarily O(N squared). It's _conceivable_ (although it
currently appears unlikely that Guido will ever countenance it) that
'sum' can be specialcased (to use faster approaches to summation when it
is dealing with sequences, not numbers) to give the O(N) performance
most people (sensibly) DO expect; that just depends on loosening its
current specs, while maintaining the concept of "sum of a sequence".
No such avenue is open for 'reduce': it will always be a terrible
performance trap just waiting to pounce on the unwary.
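The quadratic behaviour is visible without any benchmark: each concatenation copies the whole accumulated string so far. A sketch that merely counts copied characters under that model (an illustration of the argument, not a measurement):

```python
def chars_copied_by_reduce(n):
    """Characters copied when concatenating n one-char strings left to right."""
    total = 0
    length = 0
    for _ in range(n):
        length += 1
        total += length  # each step rebuilds the whole accumulated string
    return total

# n*(n+1)/2 copies: doubling n roughly quadruples the work,
# while ''.join copies each character exactly once (n copies total).
assert chars_copied_by_reduce(1000) == 500500
assert chars_copied_by_reduce(1000) > 3.9 * chars_copied_by_reduce(500)
```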

It may be sum() that is more difficult to learn...
I have enough experience teaching both built-in functions, by now,
that I can rule this hypothesis out entirely.

For this particular problem, it is better to use
''.join(['a', 'b', 'c']), you know.
Yes, and sum's error message tells you so; if you DON'T know,
you learn immediately.
However, it is important for Python to have an easy and generic
way to do things. If we have to read the manual through to solve
anything, what is the point to use Python instead of Perl (or Ruby,
to some extent)?
Why do you need to read the manual after getting an error message
that tells you to """ use ''.join(seq) instead """? As for the
importance of "easy and generic", I would agree -- I'd FAR rather
have 'sum' be able to handle _well_ sums of any kinds -- but Guido
doesn't, so far. If you have arguments that might convince him, make
them. But 'reduce' just isn't a candidate, as the "easy" part simply
cannot apply.

However, it may be better to give reduce() some nice notation.


"Syntax sugar causes cancer of the semicolon". The nicer the
notation, the more superficially attractive you make it to use
a construct with unacceptable performance implications, the
crazier and more terrible the performance trap you're building
for the poor unwary users. I think it's an appalling direction
to want to move the language in.
Alex

Jul 18 '05 #150
