reduce() anomaly?

This seems like it ought to work, according to the
description of reduce(), but it doesn't. Is this
a bug, or am I missing something?

Python 2.3.2 (#1, Oct 20 2003, 01:04:35)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d1 = {'a':1}
>>> d2 = {'b':2}
>>> d3 = {'c':3}
>>> l = [d1, d2, d3]
>>> d4 = reduce(lambda x, y: x.update(y), l)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'
>>> d4 = reduce(lambda x, y: x.update(y), l, {})
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'

- Steve.
Jul 18 '05
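
A likely explanation for the traceback in the question: dict.update() changes the dictionary in place and returns None, so after the first step reduce() passes None along as the accumulator. The following is a minimal sketch of that behaviour and one possible workaround, written for Python 3 (where reduce lives in functools; the session above uses the Python 2.3 builtin). The helper name merge is hypothetical.

```python
from functools import reduce  # a builtin on Python 2, as in the session above

d1, d2, d3 = {'a': 1}, {'b': 2}, {'c': 3}
dicts = [d1, d2, d3]

# dict.update() mutates its receiver and returns None, so a lambda that
# returns x.update(y) hands None to the next step of reduce().
assert {}.update(d1) is None

def merge(acc, d):
    # Hypothetical helper: update the accumulator, then return it explicitly.
    acc.update(d)
    return acc

# Start from a fresh dict so that none of the input dicts is modified.
d4 = reduce(merge, dicts, {})
print(d4)
```

Passing {} as the initial value matters here: without it, d1 itself would become the accumulator and be modified.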
On 19 Nov 2003 17:20:10 -0800, Paul Rubin wrote:
"A.M. Kuchling" <am*@amk.ca> writes:
[the Unix command] uniq doesn't report an error if its input isn't
sorted.


Maybe it should. If its behavior on unsorted input isn't specified,
you shouldn't assume it will act in any particular way.


The specification on the man page for GNU uniq seems clear on this:

DESCRIPTION
Discard all but one of successive identical lines from INPUT
(or standard input), writing to OUTPUT (or standard output).

It doesn't care if the input is sorted or unsorted; it describes
behaviour on successive identical lines, not the total set of all lines.

--
\ "The Bermuda Triangle got tired of warm weather. It moved to |
`\ Alaska. Now Santa Claus is missing." -- Steven Wright |
_o__) |
Ben Finney <http://bignose.squidly.org/>
Jul 18 '05 #201
Aahz wrote:
Huh?!?! uniq has always to my knowledge only worked on sorted input.
Reading the man page on two different systems confirms my knowledge.


uniq doesn't care whether the input is sorted or not. All it does is
collapse multiple consecutive duplicate lines into a single line. Using
uniq in conjunction with sort is certainly a common mode, but it's
hardly required.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \
\__/ There are no dull subjects. There are only dull writers.
-- H.L. Mencken
Jul 18 '05 #202
Ben Finney <bi****************@and-benfinney-does-too.id.au> writes:
The specification on the man page for GNU uniq seems clear on this:

DESCRIPTION
Discard all but one of successive identical lines from INPUT
(or standard input), writing to OUTPUT (or standard output).

It doesn't care if the input is sorted or unsorted; it describes
behaviour on successive identical lines, not the total set of all lines.


OK, that's reasonable behavior.
Jul 18 '05 #203
In article <sl*******************************@iris.polar.local>,
Ben Finney <bi****************@and-benfinney-does-too.id.au> wrote:
On 19 Nov 2003 17:20:10 -0800, Paul Rubin wrote:
"A.M. Kuchling" <am*@amk.ca> writes:
[the Unix command] uniq doesn't report an error if its input isn't
sorted.


Maybe it should. If its behavior on unsorted input isn't specified,
you shouldn't assume it will act in any particular way.


The specification on the man page for GNU uniq seems clear on this:

DESCRIPTION
Discard all but one of successive identical lines from INPUT
(or standard input), writing to OUTPUT (or standard output).

It doesn't care if the input is sorted or unsorted; it describes
behaviour on successive identical lines, not the total set of all lines.


It doesn't care if the input is sorted ascending or
descending, and it doesn't go to a lot of extra trouble to
check whether the order is monotonic.

Regards. Mel.
Jul 18 '05 #204
Me:
And I've used what can now be written as

unique_keys = dict.from_keys(lst).keys()

Paul Rubin:
This from_keys operation must be something new, and consing up a
dictionary like that is a woeful amount of overhead. But I guess
it would do the job.


Hmm, I misspelled it -- should be 'fromkeys' without the underscore.
Got confused with 'has_key'....

It's a new class method for dict

fromkeys(...)
dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v.
v defaults to None.

I said 'what can now be written as' meaning that before the
class method I would write it as

d = {}
for x in lst:
    d[x] = 1
unique_keys = d.keys()

I don't know what you mean by 'consing up a dictionary'
taking up a 'woeful amount of overhead'. It's a bit of overhead,
but not woefully so. (And what does 'consing' mean? It's
a Lisp thing, yes? A 'cons cell' is the first / rest pair?)

BTW, another option is

unique_keys = list(sets.Set(lst))

but I still haven't internalized the sets module enough to
remember it trippingly.

For the case where I want a list of unique keys and where
I don't care about the resulting order then either of these
should work nicely.

Andrew
da***@dalkescientific.com
Jul 18 '05 #205
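
The two options from this post can be sketched in Python 3 as follows. This assumes the built-in set type, which later took over the role of sets.Set; the sample list is made up for illustration.

```python
lst = ['b', 'a', 'b', 'c', 'a']  # made-up sample data

# dict.fromkeys() builds a dict whose keys are the items of lst
# (every value is None), so each item appears once as a key.
unique_via_dict = list(dict.fromkeys(lst))

# The built-in set type stands in for sets.Set from the post.
unique_via_set = list(set(lst))

# Compare sorted copies: the order of items in a set is not
# something to rely on.
print(sorted(unique_via_dict), sorted(unique_via_set))
```

Both approaches need hashable items, which matches the "list of unique keys" use case described above.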
"Andrew Dalke" <ad****@mindspring.com> writes:
I don't know what you mean by 'consing up a dictionary'
taking up a 'woeful amount of overhead'. It's a bit of overhead,
but not woefully so. (And what does 'consing' mean? It's
a Lisp thing, yes? A 'cons cell' is the first / rest pair?)
Sorry, yeah, Lisp jargon. A cons cell is a pair but "consing" has a
more generalized informal meaning of allocating any kind of storage on
the heap, which will probably have to be garbage collected later.
Consing up an object means building it up dynamically.

The way I usually uniq a list goes something like (untested):

def uniq(list):
    p = 0
    for i in xrange(1, len(list)):
        if list[i] != list[p]:
            p += 1
            list[p] = list[i]
    del list[p+1:]

So it just scans through the list once and then does a single del
operation.
BTW, another option is

unique_keys = list(sets.Set(lst))

but I still haven't internalized the sets module enough to
remember it trippingly.


Yes, more consing :)
Jul 18 '05 #206
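
The in-place uniq from this post is marked as untested. A Python 3 rendering might look like the sketch below; it assumes range in place of xrange and renames the parameter to items so that it does not shadow the built-in list.

```python
def uniq(items):
    """Collapse runs of equal adjacent items in place (like Unix uniq)."""
    p = 0  # index of the last item kept so far
    for i in range(1, len(items)):
        if items[i] != items[p]:
            p += 1
            items[p] = items[i]
    del items[p + 1:]  # drop the leftover tail in a single operation

data = sorted(['b', 'a', 'b', 'c', 'a'])  # made-up sample, sorted first
uniq(data)
print(data)
```

As with the shell command, only adjacent duplicates are removed, so the input is sorted here before the call.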

"Paul Rubin" <http://ph****@NOSPAM.invalid> wrote in message
news:7x************@ruckus.brouhaha.com...
Sorry, yeah, Lisp jargon. A cons cell is a pair but "consing" has a
more generalized informal meaning of allocating any kind of storage on
the heap, which will probably have to be garbage collected later.
Consing up an object means building it up dynamically.
This 'generalized informal' meaning is new to me also ;-).
The way I usually uniq a list goes something like (untested):

def uniq(list):
    p = 0
    for i in xrange(1, len(list)):
        if list[i] != list[p]:
            p += 1
            list[p] = list[i]
    del list[p+1:]

So it just scans through the list once and then does a single del
operation.


I believe this requires list to be sorted, generally O(nlogn) if not
already so, while hash solution (sets, dict) is O(n) (+O(n) temporary
space). So either method can be 'best' in particular situation.

Terry J. Reedy
Jul 18 '05 #207
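
The trade-off described in this post can be illustrated with a small Python 3 sketch. The function name adjacent_uniq is hypothetical, and itertools.groupby is used here as a stand-in for the hand-written scan.

```python
from itertools import groupby

def adjacent_uniq(items):
    # Keep one item per run of equal neighbours: a single O(n) pass.
    return [key for key, _run in groupby(items)]

lst = ['flirty', 'curty', 'flirty', 'curty']

# Unsorted input: no two equal items are adjacent, so nothing is dropped.
print(adjacent_uniq(lst))

# Sorting first (O(n log n)) makes duplicates adjacent, then one pass suffices.
print(adjacent_uniq(sorted(lst)))

# Hash-based: O(n) expected time plus O(n) temporary space.
# sorted() is applied only to make the printed order predictable.
print(sorted(set(lst)))
```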
Erik Max Francis <ma*@alcyone.com> writes:
Aahz wrote:
Huh?!?! uniq has always to my knowledge only worked on sorted input.
Reading the man page on two different systems confirms my knowledge.

uniq doesn't care whether the input is sorted or not. All it does is
collapse multiple consecutive duplicate lines into a single line. Using
uniq in conjunction with sort is certainly a common mode, but it's
hardly required.


curty@einstein:~$ less uniq.txt
flirty
curty
flirty
curty

curty@einstein:~$ uniq uniq.txt
flirty
curty
flirty
curty

curty@einstein:~$ sort uniq.txt | uniq
curty
flirty

Maybe my uniq is unique.

curty@einstein:~$ man uniq

NAME
uniq - remove duplicate lines from a sorted file
                                     ******

Jul 18 '05 #208
Erik Max Francis <ma*@alcyone.com> writes:
The tweak I made to your sample file wasn't sorted. It just had two
identical adjacent lines. The modified sample again was:

max@oxygen:~/tmp% cat > uniq.txt
flirty
curty
curty
flirty
^D
max@oxygen:~/tmp% uniq uniq.txt
flirty
curty
flirty

You don't really think the sequence [flirty, curty, curty, flirty] is
sorted, do you?


Well, you did do _something_ to the sample for which you fail to find
a more descriptive word than "tweak". I certainly do think that the
proper word for the modified sample is "sorted"; yes, you sorted the
file on the word "curty", by which I mean that you performed "an
operation that segregates items into groups according to a specified
criterion" (WordNet). You segregated the item "curty" into a group
and therefore sorted the file by what we will now refer to as the
"curty criterion". That you didn't apply the "flirty criterion" to
your sort (or simply apply an alphabetical criterion) does not
demonstrate anything other than your cleverly disguised reticence to
employ the "sort" command. ;-)
Jul 18 '05 #209
On 21 Nov 2003 12:46:44 +0100, Curt wrote:
Erik Max Francis <ma*@alcyone.com> writes:
You don't really think the sequence [flirty, curty, curty, flirty] is
sorted, do you?
Well, you did do _something_ to the sample for which you fail to find
a more descriptive word than "tweak".


He contrived an example that demonstrated his point. You seem to be
fascinated with finding some definition of "sort" that can be bent to
this.
I certainly do think that the
proper word for the modified sample is "sorted"; yes, you sorted the
file on the word "curty", by which I mean that you performed "an
operation that segregates items into groups according to a specified
criterion" (WordNet).


This is ridiculous.

What makes you think he applied "the curty criterion", presuming there
can be some meaningful definition of that? Why could he not, perhaps,
have "sorted" it based on the arrangement of monitor toys he could see?
Or on the output of /dev/random ?

Are you saying that *any* list which has had *any* conceivable criterion
applied to its generation, must therefore have been "sorted"?
None of this is the case.

The contrived example showed that 'uniq' *does* have an effect on a list
which has not been sorted (i.e. not sorted into the default order one
would assume when hearing the term "sorted", i.e. alphanumeric
ascending).

Writhing to attempt to force the term "sorted" to apply to an unsorted
list is rather psychotic.

--
\ "Any sufficiently advanced bug is indistinguishable from a |
`\ feature." -- Rich Kulawiec |
_o__) |
Ben Finney <http://bignose.squidly.org/>
Jul 18 '05 #210
"Terry Reedy" <tj*****@udel.edu> wrote in message news:<sI********************@comcast.com>...
The above was a minimal 'concept' proposal to test the aesthetics of
something structurally different from current 'lambda's. I think I
would make all identifiers params by default, since I believe this to
be more common, and 'tag' non-locals, perhaps with one of the
reserved, as yet unused symbols. Example: lambda x: x + y == `x + @y`
or `x+y@`. Since expressions cannot assign, no global declaration is
needed.


A pretty simplistic way to do something similar would be this alias:

@ <=> lambda X=None, Y=None, Z=None:

Thusly:

seq.sort(lambda x, y: cmp(y, x)) =>
seq.sort(@cmp(Y, X))

map(lambda x: x[::-1], seq) =>
map(@X[::-1], seq)

It could replace lambda for most cases. Half of the
function definition wouldn't be "useless trash" anymore,
and redundancy in describing the parameter names would
go away (lambdas don't usually need descriptive parameter
names).

But @ isn't exactly a descriptive symbol (though "lambda" or "def"
don't scream out "Here's a function definition!" either)
and the 3 implicit arguments don't really fit with
Python's philosophy of explicitness.... And from what
I've gathered, Guido would rather remove lambda than
introduce syntax that encourages small anonymous functions
(in a weird way to boot). Oh well.
Jul 18 '05 #211
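
The @ shorthand in this post is a proposal only. For comparison, the two lambda forms it would abbreviate can be run under Python 3 roughly as below. Python 3 dropped cmp() and comparison-function sorting, so this sketch assumes functools.cmp_to_key and an arithmetic stand-in for cmp(y, x); the sample data is made up.

```python
from functools import cmp_to_key

seq = [3, 1, 2]  # made-up sample data

# Python 2: seq.sort(lambda x, y: cmp(y, x))
# (y > x) - (y < x) evaluates to -1, 0 or 1, like cmp(y, x) did.
seq.sort(key=cmp_to_key(lambda x, y: (y > x) - (y < x)))
print(seq)  # sorted in descending order

words = ["abc", "xyz"]  # made-up sample data
# map() returns an iterator in Python 3, hence the list() call.
reversed_words = list(map(lambda x: x[::-1], words))
print(reversed_words)
```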
Ben Finney <bi****************@and-benfinney-does-too.id.au> writes:
On 21 Nov 2003 12:46:44 +0100, Curt wrote:
Erik Max Francis <ma*@alcyone.com> writes:
You don't really think the sequence [flirty, curty, curty, flirty] is
sorted, do you?
Well, you did do _something_ to the sample for which you fail to find
a more descriptive word than "tweak".
He contrived an example that demonstrated his point. You seem to be
No, he didn't contrive an example. Please don't invent things. He took
my perfectly good and reasonable example of a file containing redundant
entries and "tweaked" it in order to make the entries of type "curty"
contiguous.
fascinated with finding some definition of "sort" that can be bent to
this.
I did not bend the definition of sort--I got it out of the WordNet dictionary
and quoted one of its senses directly.
I certainly do think that the
proper word for the modified sample is "sorted"; yes, you sorted the
file on the word "curty", by which I mean that you performed "an
operation that segregates items into groups according to a specified
criterion" (WordNet).


This is ridiculous.


No, it isn't.
What makes you think he applied "the curty criterion", presuming there
can be some meaningful definition of that? Why could he not, perhaps,
He grouped the "curty" entries in my sample, is what makes me think he
applied the "curty criterion", and this is a perfectly meaningful definition
of the "curty criterion", to wit: "take the sample and change the order
of the lines so that the items of type "curty" are grouped". The fact
that he grouped the latter in the middle of the list changes nothing. That
was a conscious decision on his part; he could have put the curty items at
the top, or at the bottom, or at any position in the list of his choosing,
if the list was long enough.
have "sorted" it based on the arrangement of monitor toys he could see?
Or on the output of /dev/random ?
Are you saying that *any* list which has had *any* conceivable criterion
applied to its generation, must therefore have been "sorted"?


I'm saying exactly what I said, i.e. that any arbitrary list upon which
one performs an operation which segregates items into groups according
to a specified criterion is sorted. If you have a list of cows and
cats and dogs, and perform an operation on said list which groups all
the cows together, you have sorted that list "by cows". This appears
to me to be a standard definition; I have yet to see an argument from
you that would dissuade me from believing this to be true, but I would
love to see it if it exists.


Jul 18 '05 #212
Curt wrote:
I'm saying exactly what I said, i.e. that any arbitrary
list upon which one performs an operation which
segregates items into groups according to a specified
criterion is sorted. If you have a list of cows and cats
and dogs, and perform an operation on said list which
groups all the cows together, you have sorted that list
"by cows". This appears to me to be a standard
definition; I have yet to see an argument from you
that would dissuade me from believing this to be
true, but I would love to see it if it exists.


Curt, you have a point that in conventional English usage, "sorted" does not
necessarily mean "ordered". For example, if you hand me a bag of mixed fruit
and ask me to sort the fruit, I won't need to ask you what order to put them
in. I will merely separate out the apples, oranges, bananas, pears, and what
have you.

But you must realize that in the software world, when someone says "sorted"
they usually do mean "sorted and ordered". That's why people have been
having trouble understanding your point.

Even so, you are quite mistaken about the uniq command. It really has
nothing to do with the input being sorted at all. Consider this file:

up
down
up
up
down
down
down
up
up
up
down
down

This file is not sorted by any definition of the word. Agreed? Now run uniq
on it and you'll get this result:

up
down
up
down
up
down

As you can see, uniq simply removes consecutive duplicate lines, nothing
more and nothing less. It does this whether the file is ordered, sorted, or
unsorted.

-Mike
Jul 18 '05 #213
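
The behaviour described in this post (drop a line only when it equals the line immediately before it) can be sketched as a small Python 3 generator. This is an illustration of the idea, not the actual uniq implementation, and the name uniq_lines is hypothetical.

```python
def uniq_lines(lines):
    """Yield each line unless it equals the line immediately before it."""
    previous = object()  # sentinel that compares unequal to any line
    for line in lines:
        if line != previous:
            yield line
        previous = line

# The up/down file from the post, one word per line.
sample = "up down up up down down down up up up down down".split()
print(list(uniq_lines(sample)))
```

No sorting happens anywhere in the function; it only ever looks at the current line and the one before it.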
Curt wrote:
Well, you did do _something_ to the sample for which you fail to find
a more descriptive word than "tweak". I certainly do think that the
proper word for the modified sample is "sorted"; yes, you sorted the
file on the word "curty", by which I mean that you performed "an
operation that segregates items into groups according to a specified
criterion" (WordNet).


It seems at this point you're conceding that the file is not globally
sorted, and so retracting your original claim. If uniq does something
meaningful and predictable, according to its documentation, on
non-sorted text, then obviously it does not require sorted input as you
once claimed. All despite your goalpost shifting.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \
\__/ Golf is a good walk spoiled.
-- Mark Twain
Jul 18 '05 #214
Curt wrote:
No, he didn't contrive an example. Please don't invent things.


I posted another example, totally unrelated to your flirty/curty
nonsense, that demonstrated that uniq could do something meaningful with
a totally unsorted file. Please read things.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \
\__/ Golf is a good walk spoiled.
-- Mark Twain
Jul 18 '05 #215
On Fri, 21 Nov 2003 13:52:45 -0800, Erik Max Francis wrote:
Curt wrote:
Well, you did do _something_ to the sample for which you fail to find
a more descriptive word than "tweak". I certainly do think that the
proper word for the modified sample is "sorted"; yes, you sorted the
file on the word "curty", by which I mean that you performed "an
operation that segregates items into groups according to a specified
criterion" (WordNet).

It seems at this point you're conceding that the file is not globally
sorted, and so retracting your original claim. If uniq does something
My original demonstration removed _all_ the duplicates. To remove all
the duplicates from my sample with "uniq", it must be sorted in either
ascending or descending alphabetical order.

My sample file in its modified state is not "globally sorted", but it is
partially sorted (the one implies the other, I'm afraid), and "uniq" worked
solely on that part of the file which you "tweaked" in this way. Seems okay
to me!
meaningful and predictable, according to its documentation, on
non-sorted text, then obviously it does not require sorted input as you
once claimed. All despite your goalpost shifting.


I haven't shifted goalposts, whatever that means. You shifted the
emphasis of my example by removing only one of the duplicates.

I concede that "uniq" does something "meaningful" on unsorted input, if
that input contains identical, successive lines. I do not concede,
however, that you may never be required to sort its input in order to
arrive at that state, which is exactly what my demonstration "proved".

If I've just shifted goalposts, well, you can all shoot me at dawn, if
you feel it's a capital offense. Don't worry, we'll give you the real
bullet.

Jul 18 '05 #216
On Fri, 21 Nov 2003 13:54:03 -0800, Erik Max Francis wrote:
Curt wrote:
No, he didn't contrive an example. Please don't invent things.

I posted another example, totally unrelated to your flirty/curty
Oh yes, you did. Was that the contrived example he was referring to?

I actually thought he was alluding to your contrived example which was
a variation on the theme of my contrived example. Why did I think that?

Let's go back to the context of the whole shebang which you've cut.

He quotes you, then me, then speaks out himself:

You:
"You don't really think the sequence [flirty, curty, curty, flirty]
is sorted, do you?"

Me:
"Well, you did do something to the sample for which you fail to find
a more descriptive word than "tweak"."

Him:
"He contrived an example that demonstrated his point."

Then I say the following thing, which you truncated:

"No, he didn't contrive an example. Please don't invent things. He
took my perfectly good and reasonable example of a file containing
redundant entries and "tweaked" it in order to make the entries of type
"curty" contiguous."

Well, the whole thing is not as clearly a case of bad reading as you say
it is.
nonsense, that demonstrated that uniq could do something meaningful with
a totally unsorted file. Please read things.

Jul 18 '05 #217
Curt wrote:
I haven't shifted goalposts, whatever that means. You shifted the
emphasis of my example by removing only one of the duplicates.

I concede that "uniq" does something "meaningful" on unsorted
input, if that input contains identical, successive lines. I do not
concede, however, that you may never be required to sort its
input in order to arrive at that state, which is exactly what my
demonstration "proved".

If I've just shifted goalposts, well, you can all shoot me at dawn,
if you feel it's a capital offense. Don't worry, we'll give you the
real bullet.


Curt...

It is unfortunate that both the program name and the one-line description of
uniq are misleading:

uniq - remove duplicate lines from a sorted file

Well, yes: If a file is sorted, then uniq removes all duplicate lines, and
the resulting file contains only unique lines (thus the name 'uniq').

But the fact remains that what uniq actually does has nothing to do with
sorting. It removes consecutive duplicate lines. That's all.

If I flip a coin 100 times and record the results, will there be consecutive
"heads" or "tails"? Most likely. Is that file sorted? Of course not--it's
completely random. Will uniq remove the consecutive duplicates? Yes.

It's really that simple.

-Mike
Jul 18 '05 #218
On Thu, 20 Nov 2003 17:18:27 -0800, Erik Max Francis <ma*@alcyone.com> wrote:
Ben Finney wrote:
Sadly, the 'NAME' section does:


Yes, I quoted that earlier and commented on it. It's not universal in
uniq man pages. The Solaris 8 man page, for instance, doesn't have this
bug.

Just looked at the version that comes with msys:

----
[11:26] ~>uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Discard all but one of successive identical lines from INPUT (or
standard input), writing to OUTPUT (or standard output).

-c, --count prefix lines by the number of occurrences
-d, --repeated only print duplicate lines
-D, --all-repeated print all duplicate lines
-f, --skip-fields=N avoid comparing the first N fields
-i, --ignore-case ignore differences in case when comparing
-s, --skip-chars=N avoid comparing the first N characters
-u, --unique only print unique lines
-w, --check-chars=N compare no more than N characters in lines
-N same as -f N
+N same as -s N
--help display this help and exit
--version output version information and exit

A field is a run of whitespace, then non-whitespace characters.
Fields are skipped before chars.

Report bugs to <bu***********@gnu.org>.
----

Where in the past have I seen an option for writing line repetitions as a count message? I.e.,

hoo
hee
hee
haw
haw
haw
hoopla

becomes

hoo
hee
*** above line repeated 1 more time ***
haw
*** above line repeated 2 more times ***
hoopla

(Useful for collapsing error log content
such as a traceback from exceeding recursion depth ;-)

Whether to make a message for a single repeat or just allow two-in-row might be nice
to differentiate in e.g., a -g vs -G option.

Regards,
Bengt Richter
Jul 18 '05 #219
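
The repeat-count output asked about in this post is not tied here to any particular existing tool, but the idea can be sketched in Python 3. The function name collapse_repeats is hypothetical, and the message wording is copied from the example in the post.

```python
from itertools import groupby

def collapse_repeats(lines):
    """Yield the first line of each run, then a note if the run repeats."""
    for line, run in groupby(lines):
        extra = sum(1 for _ in run) - 1  # repeats beyond the first line
        yield line
        if extra:
            plural = "time" if extra == 1 else "times"
            yield "*** above line repeated %d more %s ***" % (extra, plural)

log = ["hoo", "hee", "hee", "haw", "haw", "haw", "hoopla"]
print("\n".join(collapse_repeats(log)))
```

A threshold parameter could cover the suggested -g vs -G distinction (whether a single repeat gets a message), but that is left out of this sketch.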
Curt wrote:
Oh yes, you did. Was that the contrived example he was referring to?

I actually thought he was alluding to your contrived example which was
a variation on the theme of my contrived example. Why did I think
that?

Let's go back to the context of the whole shebang which you've cut.


This is exceptionally tiring. I don't really care what the other poster
said. You're concentrating on the one example I gave that you insist
involves sorting (which does not involve sorting the file, it involves
rearranging two lines which still leave the file in an unsorted state).

The fact is I posted an entirely different example -- which you have
completely ignored -- which demonstrated the functionality of uniq as
documented and has absolutely, unequivocally, undeniably, nothing at all
to do with sorting.

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \
\__/ I'm trying to forget / But I can't act as if we never met
-- Chante Moore
Jul 18 '05 #220
Erik Max Francis <ma*@alcyone.com> writes:
Curt wrote:
Oh yes, you did. Was that the contrived example he was referring to?

I actually thought he was alluding to your contrived example which was
a variation on the theme of my contrived example. Why did I think
that?

Let's go back to the context of the whole shebang which you've cut.

This is exceptionally tiring. I don't really care what the other poster
said. You're concentrating on the one example I gave that you insist


You don't care what the other poster said. I did, however, because I was
responding to his post. Get it? I made a reasonable assumption in the
context of his article; you insulted me gratuitously and dishonestly because
you cut that context which made it clear that my remark was in good faith.

Jul 18 '05 #221
"Michael Geary" <Mi**@DeleteThis.Geary.com> writes:
Curt...

It is unfortunate that both the program name and the one-line description of
uniq are misleading:

uniq - remove duplicate lines from a sorted file

The GNU/Linux man page is not the only document in the world that's misleading
about "uniq".

http://www.gnu.org/manual/textutils-...xtutils_7.html

7.2 uniq: Uniquify files

uniq writes the unique lines in the given `input', or standard input
if nothing is given or for an input name of `-'. Synopsis:

uniq [option]... [input [output]]

By default, uniq prints the unique lines in a sorted file, i.e.,
                                              ******
discards all but one of identical successive lines. Optionally, it can
instead show only lines that appear exactly once, or lines that appear
more than once.

The input must be sorted. If your input is not sorted, perhaps you
                  ******
want to use sort -u.

http://publib16.boulder.ibm.com/pser...cmds5/uniq.htm

Description

The uniq command deletes repeated lines in a file. The uniq command
reads either standard input or a file specified by the InFile
parameter. The command first compares adjacent lines and then removes
the second and succeeding duplications of a line. Duplicated lines
must be adjacent. (Before issuing the uniq command, use the sort
                                                            ****
command to make all duplicate lines adjacent.)

http://www.tldp.org/LDP/abs/html/textproc.html

uniq

This filter removes duplicate lines from a sorted file. It is often
                                           ******
seen in a pipe coupled with sort.
                            ****

cat list-1 list-2 list-3 | sort | uniq > final.list
# Concatenates the list files,
# sorts them,
# removes duplicate lines,
# and finally writes the result to an output file.

=======================================================================

It seems to me that this utility was developed to "uniquify" files; that
is, remove every and all duplicate lines in order that every line be
"unique". I can only assume that this was the primary, fundamental use
case in the mind of the author when he developed the program, and that's
why he called it "uniq". If you find this misleading, maybe you should
file a bug report. ;-)

Anyway, I'm having lunch with him next Friday so I'll ask him what he
had in mind and let you know if he remembers just exactly what that was.


Jul 18 '05 #222
Alex Martelli <al***@aleax.it> writes:
That would have a larger big O time growth value -- most likely
O(n * log n) vs. O(n), for reasonable implementations. And while I
If the sequence is carefully randomized, yes. If the sequence has
any semblance of pre-existing order, the timsort is amazingly good
at exploiting it, so that in many real-world cases it DOES run as
O(N).
C'mon -- to make robust programs you have to assume the worst-case
scenario for your data, not the best case. I certainly don't want to
write a program that runs quickly most of the time and then for opaque
reasons slows to a crawl occasionally. I want it to either run
quickly all of the time or run really slowly all of the time (so that
I can then figure out what is wrong and fix it).
wouldn't sweat a factor of 2 for a feature of a RAD or scripting
language, I would be more concerned about moving to a larger big O
value.
Me too! That's why I'd like to make SURE that some benighted soul
cannot code:
onebigstring = reduce(str.__add__, lotsofstrings)
The idea of aiming a language at trying to prevent people from doing
stupid things is just inane, if you ask me. It's not just inane,
it's offensive to my concept of human ability and creativity. Let
people make their mistakes, and then let them learn from them. Make a
programming language to be a fluid and natural medium for expressing
their concepts, not a straitjacket for citing canon in the
orthodox manner.

Furthermore, by your argument, we have to get rid of loops, since
an obvious way of appending strings is:

result = ""
for s in strings: result += s
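
The reduce() call and the loop discussed here both build the result by repeated concatenation, which can copy the growing string at every step; the usual alternative is str.join. A minimal Python 3 sketch comparing the three forms follows. The sample data is made up, operator.add stands in for str.__add__, and actual timings depend on the interpreter.

```python
from functools import reduce
import operator

strings = ["spam", "eggs", "ham"]  # made-up sample data

# Repeated pairwise concatenation via reduce().
via_reduce = reduce(operator.add, strings, "")

# The explicit loop from the post.
via_loop = ""
for s in strings:
    via_loop += s

# join() sees all the pieces at once and can build the result in one go.
via_join = "".join(strings)

print(via_reduce == via_loop == via_join)
```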
Your proposed extension to max() and min() has all the same problems.
Not at all. Maybe you have totally misunderstood "my proposed
extension"?
You are correct, sorry -- I misunderstood your proposed extension.
But max() and min() still have all the same problems as reduce(), and
so does sum(), since the programmer can provide his own comparison
and addition operations in a user-defined class, and therefore, he can
make precisely the same mistakes and abuses with sum(), min(), and
max() that he can with reduce().
But reasonable programmers don't abuse this generality, and so there
So, you're claiming that ALL people who were defending 'reduce' by
posting use cases which DID "abuse this generality" are
unreasonable?
In this small regard, at least, yes. So, here reduce() has granted
them the opportunity to learn from their mistakes and become better
programmers.
this urge to be stifled. Don't take my word for it -- ask Paul
Graham. I believe he was even invited to give the Keynote Address at
a recent PyCon.
However, if you agree with Paul Graham's theories on language
design, you should be consistent, and use Lisp.
I don't agree with *everything* that anyone says. But if there were a
version of Lisp that were as tuned for scripting as Python is, as
portable as Python is, came with as many batteries installed, and had
anywhere near as large a user-base, I probably *would* be using Lisp.
But there isn't, so I don't.

And Python suits me fine. But if it continues to be bloated with a
large number of special-purpose features, rather than a small number of
general and expressive features, there may come a time when Python
will no longer suit me.
If you consider Python to be preferable, then there must be some
point on which you disagree with him. In my case, I would put
"simplicity vs generality" issues as the crux of my own
disagreements with Dr. Graham.
Bloating the language with lots of special-purpose features does not
match my idea of simplicity. To the extent that I have succeeded in
this world, it has always been by understanding how to simplify things
by moving to a more general model, meaning that I have to memorize and
understand less. Memorizing lots of detail is not something I am
particularly good at, and is one of the reasons why I dislike Perl so
much. Who can remember all that stuff in Perl? Certainly not I. I
suppose some people can, but this is why *I* prefer Python -- there is
much less to remember.

Apparently you would have it so that for every common task you might
want to do there is one "right" way to do it that you have to
remember, and the language is packed to the brim with special features
to support each of these common tasks. That's not simple! That's a
nightmare of memorizing special cases, and if that were the future of
Python, then that future would be little better than Perl.
Just what is it that I don't grasp again? I think my position is
clear: I have no intention to abuse reduce(), so I don't worry myself
with ways in which I might be tempted to.
Yet you want reduce to keep accepting ANY callable that takes two
arguments as its first argument, differently from APL's / (which does
NOT accept arbitrary functions on its left);
That's because I believe that there should be little distinction
between features built into the language and the features that users
can add to it themselves. This is one of the primary benefits of
object-oriented languages -- they allow the user to add new data types
that are as facile to use as the built-in data types.
and you claimed that reduce could be removed if add, mul, etc, would
accept arbitrary numbers of arguments. This set of stances is not
self-consistent.
Either solution is fine with me. I just don't think that addition
should be placed on a pedestal above other operations. This means that
you have to remember that addition is different from all the other
operations, and then when you want to multiply a bunch of numbers
together, or xor them together, for example, you use a different
idiom, and if you haven't remembered that addition has been placed on
this pedestal, you become frustrated when you can't find the
equivalent of sum() for multiplication or xor in the manual.
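
For readers following the argument, here is a minimal sketch of the asymmetry described above. It assumes a modern Python 3, where reduce() lives in functools rather than being a builtin as it was in the Python 2.3 of this thread; the sample list is made up for illustration.

```python
from functools import reduce  # a builtin in Python 2.3, a library function in Python 3
import operator

numbers = [3, 5, 6]  # illustrative data, not from the thread

total = sum(numbers)                        # addition has a dedicated builtin
product = reduce(operator.mul, numbers, 1)  # multiplication goes through reduce()
xored = reduce(operator.xor, numbers, 0)    # so does xor

print(total, product, xored)
```

The same reduce() idiom covers every binary operation, which is the generality being defended here; sum() covers only one of them.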
So, now you *do* want multiple obviously right ways to do the same
thing?
sum(sequence) is the obviously right way to sum the numbers that are
the items of sequence. If that maps to add.reduce(sequence), no problem;
nobody in their right mind would claim the latter as "the one obvious
way", exactly because it IS quite un-obvious.
It's quite obvious to me. As is a loop.
The point is that the primary meaning of "reduce" is "diminish", and
when you're summing (positive:-) numbers you are not diminishing
anything whatsoever
Of course you are: You are reducing a bunch of numbers down to one
number.
"summary" or "gist" in addition to addition. It also can be confusing
by appearing to be just a synonym for "add". Now people might have
trouble remembering what the difference between sum() and add() is.
Got any relevant experience teaching Python? I have plenty and I
have never met ANY case of the "trouble" you mention.
Yes, I taught a seminar on Python, and I didn't feel it necessary to
teach either sum() or reduce(). I taught loops, and I feel confident
that by the time a student is ready for sum(), they are ready for
reduce().
In Computer Science, however, "reduce" typically only has one meaning
when provided as a function in a language, and programmers might as
well learn that sooner than later.


I think you're wrong. "reduce dimensionality of a multi-dimensional
array by 1 by operating along one axis" is one such meaning, but there
are many others. For example, the second Google hit for "reduce
function" gives me:

http://www.gits.nl/dg/node65.html
That's a specialized meaning of "reduce" in a specific application
domain, not a function in a general-purpose programming language.
where 'reduce' applies to rewriting for multi-dot grammars, and
the 5th hit is
http://www.dcs.ed.ac.uk/home/stg/NOTES/node31.html
which uses a much more complicated generalization:
It still means the same thing that reduce() typically means. They've
just generalized it further. Some language might generalize sum()
further than you have in Python. That wouldn't mean that it still
didn't mean the same thing.
while http://csdl.computer.org/comp/trans/...4/i0364abs.htm
deals with "the derivation of general methods for the L/sub 2/
approximation of signals by polynomial splines" and defines REDUCE
as "prefilter and down-sampler" (which is exactly as I might expect
it to be defined in any language dealing mostly with signal
processing, of course).
Again a specialized domain.
Designing an over-general approach, and "fixing it in the docs" by
telling people not to use 90% of the generality they so obviously
get, is not a fully satisfactory solution. Add in the caveats about
not using reduce(str.__add__, manystrings), etc, and any reasonable
observer would agree that reduce had better be redesigned.
You have to educate people not to do stupid things with loop and sum
too. I can't see this as much of an argument.
Again, I commend APL's approach, also seen with more generality in
Numeric (in APL you're stuck with the existing operator on the left
of + -- in Numeric you can, in theory, write your own ufuncs), as
saner. While not quite as advisable, allowing callables such as
operator.add to take multiple arguments would afford a similarly
_correctly-limited generality_ effect. reduce + a zillion warnings
about not using most of its potential is just an unsatisfactory
combination.


You hardly need a zillion warnings. A couple examples will suffice.

|>oug
Jul 18 '05 #223
Jumping into this a bit late...

Douglas Alan wrote:
Alex Martelli <al***@aleax.it> writes:
That would have a larger big O time growth value -- most likely
O(n * log n) vs. O(n), for reasonable implementations. And while I

If the sequence is carefully randomized, yes. If the sequence has
any semblance of pre-existing order, the timsort is amazingly good
at exploiting it, so that in many real-world cases it DOES run as
O(N).
C'mon -- to make robust programs you have to assume the worst-case
scenario for your data, not the best case. I certainly don't want to
write a program that runs quickly most of the time and then for opaque
reasons slows to a crawl occasionally. I want it to either run
quickly all of the time or run really slowly all of the time (so that
I can then figure out what is wrong and fix it).


I think that's a bit naive. You really need to understand what the
usage pattern of your data is going to be like. If upstream data is
being validity checked by the UI, for example, then you can generally
assume that most of the data is valid and you code assuming that most of
the data will be good. If a pathological case happens that slows the
system down but that case is likely to happen only once a month, it may
be worth it. However, if 50% of your data is likely to be bad, then
that would be unacceptable.
Me too! That's why I'd like to make SURE that some benighted soul
cannot code:

onebigstring = reduce(str.__add__, lotsofstrings)


The idea of aiming a language at trying to prevent people from doing
stupid things is just inane, if you ask me. It's not just inane,
it's offensive to my concept of human ability and creativity. Let
people make their mistakes, and then let them learn from them. Make a
programming language to be a fluid and natural medium for expressing
their concepts, not a straight-jacket for citing canon in the
orthodox manner.

I agree completely. One reason I like Smalltalk and Python is that they
keep track of the mundane book-keeping, but then get out of my way and
don't try to limit me too much.
And Python suits me fine. But if it continues to be bloated with a
large number of special-purpose features, rather than a small number of
general and expressive features, there may come a time when Python
will no longer suit me.


This is something I'm watching as well. I use Python where Smalltalk is
inappropriate. One thing I like about Smalltalk is that it has a small
number of powerful general principles that it follows extensively. This
means there are not a lot of surprises and not a lot of 'language rules'
to keep in mind when writing my code or looking at others'. What first
attracted me to Python was a lot of the same thing: simple syntax and
some simple concepts used throughout. To me there is a difference
between adding to a language and extending a language along its own
principles. As Python grows and matures, if it solidifies and extends
along its natural principles, I will be happy. If it just adds new
stuff on, I won't be as happy. I think list comprehensions were the
first thing I saw that made me nervous as to how the language was going
to grow.

If you consider Python to be preferable, then there must be some
point on which you disagree with him. In my case, I would put
"simplicity vs generality" issues as the crux of my own
disagreements with Dr. Graham.


Bloating the language with lots of special-purpose features does not
match my idea of simplicity. To the extent that I have succeeded in
this world, it has always been by understanding how to simplify things
by moving to a more general model, meaning that I have to memorize and
understand less. Memorizing lots of detail is not something I am
particularly good at, and is one of the reasons why I dislike Perl so
much. Who can remember all that stuff in Perl? Certainly not I. I
suppose some people can, but this is why *I* prefer Python -- there is
much less to remember.


Ditto

Jul 18 '05 #224
Douglas Alan wrote:
Alex Martelli <al***@aleax.it> writes:

That would have a larger big O time growth value -- most likely
O(n * log n) vs. O(n), for reasonable implementations. And while I
If the sequence is carefully randomized, yes. If the sequence has
any semblance of pre-existing order, the timsort is amazingly good
at exploiting it, so that in many real-world cases it DOES run as
O(N).
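
A rough way to check the claim above is to count comparisons. The sketch below assumes CPython, whose sorted() uses the timsort being discussed and only ever calls `<`; the Counted wrapper and count_comparisons helper are hypothetical names invented for this illustration.

```python
import random

class Counted:
    """Wraps a value and counts every '<' comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

def count_comparisons(values):
    """Return how many comparisons sorted() performs on the given values."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons

n = 10000
ordered = list(range(n))
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

ordered_count = count_comparisons(ordered)
shuffled_count = count_comparisons(shuffled)
print(ordered_count, shuffled_count)
```

On already-ordered input the count should grow roughly in proportion to n, while on shuffled input it grows like n log n. That is the gap between the typical case and the worst case that the two posters disagree about.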

C'mon -- to make robust programs you have to assume the worst-case
scenario for your data, not the best case. I certainly don't want to
write a program that runs quickly most of the time and then for opaque
reasons slows to a crawl occasionally. I want it to either run
quickly all of the time or run really slowly all of the time (so that
I can then figure out what is wrong and fix it).


In theory, I'd agree with you Douglas. But IRL, I agree with Alex. If I
have to choose between two algorithms that do almost the same, but one
works on a special case (that's very common to my range) and the other
works in general, I'd go with the special case. There is no compelling
reason for getting into the trouble of a general approach if I can do it
correctly with a simpler, special case.

One example. I once wrote a small program for my graphic calculator to
analyze (rather) simple electrical networks. To make the long story
short, I had two approaches: implement Gaussian Elimination "purely" (no
modifications to take into account some nasty problems) or implement it
with scaled partial pivoting. Sure, the partial pivoting is _much_
better than no pivoting at all, but for the type of electrical networks
I was going to analyze, there was no need to. All the fuss about
pivoting is to prevent a small value from wreaking havoc in the calculations.
In this *specific* case (electric networks with no dependent sources),
it was impossible for this situation to ever happen.

Results? Reliable results in the specific range of working, which is
what I wanted. :D
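
To make the trade-off concrete, here is a minimal sketch of Gaussian elimination with and without scaled partial pivoting. It is not the calculator program described above (that code is not shown in the thread); the solve() function, its pivoting flag, and the two test systems are assumptions made for illustration, and the sketch assumes a non-singular matrix with no all-zero rows.

```python
def solve(a, b, pivoting=True):
    """Solve a*x = b by Gaussian elimination; a is a list of rows."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    # Scale factor of each row: its largest coefficient in absolute value.
    scale = [max(abs(v) for v in row[:n]) for row in m]

    for col in range(n):
        if pivoting:
            # Choose the row whose entry is largest relative to its scale.
            best = max(range(col, n), key=lambda r: abs(m[r][col]) / scale[r])
            m[col], m[best] = m[best], m[col]
            scale[col], scale[best] = scale[best], scale[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x

# A well-behaved system: both variants agree (x = 0.4, y = 0.2).
nice_a, nice_b = [[3, -1], [-1, 2]], [1, 0]
nice_plain = solve(nice_a, nice_b, pivoting=False)
nice_pivot = solve(nice_a, nice_b, pivoting=True)

# A tiny leading coefficient: the true answer is close to x = 1, y = 1.
tiny_a, tiny_b = [[1e-20, 1], [1, 1]], [1, 2]
tiny_plain = solve(tiny_a, tiny_b, pivoting=False)
tiny_pivot = solve(tiny_a, tiny_b, pivoting=True)
print(nice_plain, nice_pivot, tiny_plain, tiny_pivot)
```

If the inputs can never contain a pivot like the 1e-20 in the second system, the plain variant is enough, which is the point being made about the special case.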
wouldn't sweat a factor of 2 for a feature of a RAD or scripting
language, I would be more concerned about moving to a larger big O
value.
Me too! That's why I'd like to make SURE that some benighted soul
cannot code:

onebigstring = reduce(str.__add__, lotsofstrings)

The idea of aiming a language at trying to prevent people from doing
stupid things is just inane, if you ask me. It's not just inane,
it's offensive to my concept of human ability and creativity. Let
people make their mistakes, and then let them learn from them. Make a
programming language to be a fluid and natural medium for expressing
their concepts, not a straight-jacket for citing canon in the
orthodox manner.


It's not about restraining someone from doing something. It's about
making it possible to *read* the "f(.)+" code. Human ability and
creativity are not compromised when restrictions are made. In any case,
try to program an MCU (micro-controller unit). Python's restrictions are
nothing compared with what you have to deal with in an MCU.
Furthermore, by your argument, we have to get rid of loops, since
an obvious way of appending strings is:

result = ""
for s in strings: result += s
By your logic, fly swatters should be banned because shotguns are more
general. :S It's a matter of design decisions. Whatever the designer
thinks is better, so be it (in this case, GvR).
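
For reference, the string-building spellings argued over in this exchange can be put side by side. This sketch assumes Python 3 (reduce() imported from functools) and a made-up list of strings; the "".join() form is not mentioned in the quoted text and is included only as the commonly recommended alternative.

```python
from functools import reduce

strings = ["spam", "eggs", "ham"]  # illustrative data

# The reduce() spelling quoted from Alex Martelli's post.
via_reduce = reduce(str.__add__, strings)

# The explicit loop from the quoted post.
via_loop = ""
for s in strings:
    via_loop += s

# The idiom usually recommended for joining many strings.
via_join = "".join(strings)

print(via_reduce, via_loop, via_join)
```

All three produce the same string. The first two may copy the growing result on every step, so their cost can grow quadratically with the total length, while join() builds the result once.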

At least, in my CS introductory class, one of the things we learned was
that programming languages could be extremely easy to read, but very
hard to write and vice versa. These design decisions *must* be taken by
someone *and* they should stick to them. Decisions can be changed, but
that must be done with a great deal of caution.
Your proposed extension to max() and min() has all the same problems.
Not at all. Maybe you have totally misunderstood "my proposed
extension"?


You are correct, sorry -- I misunderstood your proposed extension.
But max() and min() still have all the same problems as reduce(), and
so does sum(), since the programmer can provide his own comparison
and addition operations in a user-defined class, and therefore, he can
make precisely the same mistakes and abuses with sum(), min(), and
max() that he can with reduce().


It might still be abused, but not as much as reduce(). But considering
the alternative (reduce(), namely), it's *much* better because it shifts
the problem to someone else. We are consenting adults, y'know.
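
The point that sum(), min(), and max() also accept user-defined behaviour can be sketched as follows. The Money class is a hypothetical example invented here, not something from the thread; it assumes Python 3 semantics for operator overloading.

```python
class Money:
    """Hypothetical value type with its own addition and ordering."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

    def __lt__(self, other):
        return self.cents < other.cents

    def __gt__(self, other):
        return self.cents > other.cents

prices = [Money(250), Money(999), Money(75)]

# sum() starts from 0 by default, so pass a Money start value instead.
total = sum(prices, Money(0))
cheapest = min(prices)
dearest = max(prices)
print(total.cents, cheapest.cents, dearest.cents)
```

Whatever __add__ or __lt__ does is what these builtins will do, so a careless or expensive definition is "abused" through sum() just as readily as through reduce().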
But reasonable programmers don't abuse this generality, and so there
So, you're claiming that ALL people who were defending 'reduce' by
posting use cases which DID "abuse this generality" are
unreasonable?


In this small regard, at least, yes. So, here reduce() has granted
them the opportunity to learn from their mistakes and become better
programmers.


Then reduce() shouldn't be as general as it is in the first place.
this urge to be stifled. Don't take my word for it -- ask Paul
Graham. I believe he was even invited to give the Keynote Address at
a recent PyCon.
However, if you agree with Paul Graham's theories on language
design, you should be consistent, and use Lisp.


I don't agree with *everything* that anyone says. But if there were a
version of Lisp that were as tuned for scripting as Python is, as
portable as Python is, came with as many batteries installed, and had
anywhere near as large a user-base, I probably *would* be using Lisp.
But there isn't, so I don't.

And Python suits me fine. But if it continues to be bloated with a
large number of special-purpose features, rather than a small number of
general and expressive features, there may come a time when Python
will no longer suit me.


reduce() ... expressive? LOL. I'll grant you it's (over)general, but
expressive? Hardly. It's not a bad thing, but, as Martelli noted, it is
overgeneralized. Not everyone understands the concept off the bat (as you
_love_ to claim) and not everyone finds it useful.
If you consider Python to be preferable, then there must be some
point on which you disagree with him. In my case, I would put
"simplicity vs generality" issues as the crux of my own
disagreements with Dr. Graham.


Bloating the language with lots of special-purpose features does not
match my idea of simplicity. To the extent that I have succeeded in
this world, it has always been by understanding how to simplify things
by moving to a more general model, meaning that I have to memorize and
understand less. Memorizing lots of detail is not something I am
particularly good at, and is one of the reasons why I dislike Perl so
much. Who can remember all that stuff in Perl? Certainly not I. I
suppose some people can, but this is why *I* prefer Python -- there is
much less to remember.


Then you should understand some of the design decisions reached (I said,
understand, not agree).

What I understand for simplicity is that I should not memorize anything
at all, if I read the code. That's one of the things I absolutely hate
about LISP/Scheme. Especially when it comes to debugging the darn code.
Apparently you would have it so that for every common task you might
want to do there is one "right" way to do it that you have to
remember, and the language is packed to the brim with special features
to support each of these common tasks. That's not simple! That's a
nightmare of memorizing special cases, and if that were the future of
Python, then that future would be little better than Perl.


As I have said, these are design decisions GvR reached or he should be
taking in the future.

Now, what's so hard about sum()? Can't you get what it does by reading
the function name? The hardest part of _any_ software project is not
writing it, but maintaining it. IIRC (and if the figures aren't correct,
please someone correct me), in a complete software project life cycle,
maintenance takes more than 70% of the total budget. So you will find that
most companies standardize many things that could be done
faster/better/"obviously" some other way, e.g., x ^= x in C/C++.
Just what is it that I don't grasp again? I think my position is
clear: I have no intention to abuse reduce(), so I don't worry myself
with ways in which I might be tempted to.
Yet you want reduce to keep accepting ANY callable that takes two
arguments as its first argument, differently from APL's / (which does
NOT accept arbitrary functions on its left);


That's because I believe that there should be little distinction
between features built into the language and the features that users
can add to it themselves. This is one of the primary benefits of
object-oriented languages -- they allow the user to add new data types
that are as facile to use as the built-in data types.


_Then_ let them *build* new classes to use sum(), min(), max(), etc.
This functionality is better suited for a class/object in an OO
approach anyway, *not* a function.
and you claimed that reduce could be removed if add, mul, etc, would
accept arbitrary numbers of arguments. This set of stances is not
self-consistent.


Either solution is fine with me. I just don't think that addition
should be placed on a pedestal above other operations. This means that
you have to remember that addition is different from all the other
operations, and then when you want to multiply a bunch of numbers
together, or xor them together, for example, you use a different
idiom, and if you haven't remembered that addition has been placed on
this pedestal, you become frustrated when you can't find the
equivalent of sum() for multiplication or xor in the manual.


Have you ever programmed in assembly? It's worth a look...

(In case someone's wondering, addition is the only operation available
in many MPU/MCUs. Multiplication is heavily expensive in those that
support it.)
So, now you *do* want multiple obviously right ways to do the same
thing?
sum(sequence) is the obviously right way to sum the numbers that are
the items of sequence. If that maps to add.reduce(sequence), no problem;
nobody in their right mind would claim the latter as "the one obvious
way", exactly because it IS quite un-obvious.


It's quite obvious to me. As is a loop.


Prosecution rests.
The point is that the primary meaning of "reduce" is "diminish", and
when you're summing (positive:-) numbers you are not diminishing
anything whatsoever


Of course you are: You are reducing a bunch of numbers down to one
number.


That makes sense if you are in a math-related area. But for a layperson,
that is nonsense.
"summary" or "gist" in addition to addition. It also can be confusing
by appearing to be just a synonym for "add". Now people might have
trouble remembering what the difference between sum() and add() is.
Got any relevant experience teaching Python? I have plenty and I
have never met ANY case of the "trouble" you mention.


Yes, I taught a seminar on Python, and I didn't feel it necessary to
teach either sum() or reduce(). I taught loops, and I feel confident
that by the time a student is ready for sum(), they are ready for
reduce().


<sarcasm>But why didn't you teach reduce()? If it were so simple, it
would have been a must in the seminar.</sarcasm>

Now on a more serious note, reduce() is not an easy concept to grasp.
That's why many people don't want it in the language. The middle ground
obviously is to reduce the functionality of reduce().
In Computer Science, however, "reduce" typically only has one meaning
when provided as a function in a language, and programmers might as
well learn that sooner than later.


I think you're wrong. "reduce dimensionality of a multi-dimensional
array by 1 by operating along one axis" is one such meaning, but there
are many others. For example, the second Google hit for "reduce
function" gives me:

http://www.gits.nl/dg/node65.html


That's a specialized meaning of "reduce" in a specific application
domain, not a function in a general-purpose programming language.
where 'reduce' applies to rewriting for multi-dot grammars, and
the 5th hit is

http://www.dcs.ed.ac.uk/home/stg/NOTES/node31.html

which uses a much more complicated generalization:


It still means the same thing that reduce() typically means. They've
just generalized it further. Some language might generalize sum()
further than you have in Python. That wouldn't mean that it still
didn't mean the same thing.
while http://csdl.computer.org/comp/trans/...4/i0364abs.htm
deals with "the derivation of general methods for the L/sub 2/
approximation of signals by polynomial splines" and defines REDUCE
as "prefilter and down-sampler" (which is exactly as I might expect
it to be defined in any language dealing mostly with signal
processing, of course).


Again a specialized domain.


?-|

You mention here "general-purpose programming". The other languages that
I have done something more than a small code snippet (C/C++, Java and
PHP) lack a reduce()-like function. And it hasn't hurt them to lack
it. If something like reduce() is to be "general", I ~think~ it should be
something found in one of the mainstream languages. For example, regex is
"general" in programming languages either because libraries for it have
been added to existing languages (C/C++), or because it has been
incorporated into rising languages (Perl, PHP).
Designing an over-general approach, and "fixing it in the docs" by
telling people not to use 90% of the generality they so obviously
get, is not a fully satisfactory solution. Add in the caveats about
not using reduce(str.__add__, manystrings), etc, and any reasonable
observer would agree that reduce had better be redesigned.


You have to educate people not to do stupid things with loop and sum
too. I can't see this as much of an argument.


After reading the whole message, how do you plan to do *that*? The only
way to effectively do this is by warning the user explicitly *and*
limiting the power the functionality has. As Martelli said, APL and
Numeric do have an equivalent to reduce(), but it's limited to a range
of functions. Doing so ensures that abuse can be contained.

And remember, we are talking about the real world.
Again, I commend APL's approach, also seen with more generality in
Numeric (in APL you're stuck with the existing operator on the left
of + -- in Numeric you can, in theory, write your own ufuncs), as
saner. While not quite as advisable, allowing callables such as
operator.add to take multiple arguments would afford a similarly
_correctly-limited generality_ effect. reduce + a zillion warnings
about not using most of its potential is just an unsatisfactory
combination.


You hardly need a zillion warnings. A couple examples will suffice.


I'd rather have the warnings. It's much better than me saying "How
funny, this shouldn't do this..." later. Why? Because you can't predict
what people will actually do. Pretending that most people will act like
you is insane.

Two last things:

1) Do you have any extensive experience with C/C++? (By extensive, I
mean a small-medium to medium project) These languages taught me the
value of -Wall. There are way too many bugs lurking in the warnings to
just ignore them.

2) Do you have any experience in the design process?

--
Andres Rosado

-----BEGIN TF FAN CODE BLOCK-----
G+++ G1 G2+ BW++++ MW++ BM+ Rid+ Arm-- FR+ FW-
#3 D+ ADA N++ W OQP MUSH- BC- CN++ OM P75
-----END TF FAN CODE BLOCK-----

"Greed and self-interest, eh? Excellent! I discern a protege!"
-- Starscream to Blackarachnia, "Possession"

Jul 18 '05 #225
BW Glitch <bw******@hotpop.com> writes:
Douglas Alan wrote:
C'mon -- to make robust programs you have to assume the worst-case
scenario for your data, not the best case. I certainly don't want
to write a program that runs quickly most of the time and then for
opaque reasons slows to a crawl occasionally. I want it to either
run quickly all of the time or run really slowly all of the time
(so that I can then figure out what is wrong and fix it).
In theory, I'd agree with you Douglas. But IRL, I agree with
Alex. If I have to choose between two algorithms that do almost the
same, but one works on a special case (that's very common to my
range) and the other works in general, I'd go with the special
case. There is no compelling reason for getting into the trouble of
a general approach if I can do it correctly with a simpler, special
case.
When people assert that

reduce(add, seq)

is so much harder to use, read, or understand than

sum(seq)

I find myself incredulous. People are making such claims either
because they are sniffing the fumes of their own righteous argument,
or because they are living on a different planet from me. On my
planet, reduce() is trivial to understand and it often comes in handy.
I find it worrisome that a number of vocal people seem to be living on
another planet (or could use a bit of fresh air), since if they end up
having any significant influence on the future of Python, then, from
where I am standing, Python will be aimed at aliens. While this may
be fine and good for aliens, I really wish to use a language designed
for natives of my world.

Reasonable people on my world typically seem to realize that sum() is
just not useful enough that it belongs being a built-in in a
general-purpose programming language that aims for simplicity. This
is why sum() occurs rather infrequently as a built-in in general
purpose programming languages. sum(), however, should be in the
dictionary as a quintessential example of the word "bloat". If you
agree that sum() should have been added to the language as a built-in,
then you want Python to be a bloated language, whether you think you
do, or not. It is arguable whether reduce() is useful enough that it
belongs as a built-in, but it has many more uses than sum(), and
therefore, there's a case for reduce() being a built-in. reduce() may
carry the weight of its inclusion, but sum() certainly does not.
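
For comparison, the two spellings under discussion behave as follows. This is a small sketch assuming Python 3, where reduce() must be imported from functools; the empty_reduce_failed flag is a name introduced here for illustration.

```python
from functools import reduce
from operator import add

seq = [1, 2, 3, 4]
via_reduce = reduce(add, seq)
via_sum = sum(seq)

# The two differ on an empty sequence unless reduce() gets a start value.
empty_sum = sum([])
empty_reduce_with_start = reduce(add, [], 0)
try:
    reduce(add, [])
except TypeError:
    empty_reduce_failed = True
else:
    empty_reduce_failed = False

print(via_reduce, via_sum, empty_sum, empty_reduce_with_start, empty_reduce_failed)
```

On non-empty numeric input the results are identical, so the disagreement in this thread is about naming and generality rather than behaviour.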
One example. I once wrote a small program for my graphic calculator to
analyze (rather) simple electrical networks. To make the long story
short, I had two approaches: implement Gaussian Elimination "purely" (no
modifications to take into account some nasty problems) or implement it
with scaled partial pivoting. Sure, the partial pivoting is _much_
better than no pivoting at all, but for the type of electrical networks
I was going to analyze, there was no need to. All the fuss about
pivoting is to prevent a small value from wreaking havoc in the calculations.
In this *specific* case (electric networks with no dependent sources),
it was impossible for this situation to ever happen.
Results? Reliable results in the specific range of working, which is
what I wanted. :D
You weren't designing a general purpose programming language -- you
were designing a tool to meet your own idiosyncratic needs, in which
case, you could have it tap dance and read Esperanto, but only on
second Tuesdays, if you wanted it to, and no one need second guess
you. But that's not the situation that we're talking about.
[Alex Martelli:] Me too! That's why I'd like to make SURE that
some benighted soul cannot code:
onebigstring = reduce(str.__add__, lotsofstrings)
The idea of aiming a language at trying to prevent people from
doing stupid things is just inane, if you ask me. It's not just
inane, it's offensive to my concept of human ability and
creativity. Let people make their mistakes, and then let them
learn from them. Make a programming language to be a fluid and
natural medium for expressing their concepts, not a straight-jacket
for citing canon in the orthodox manner.
It's not about restraining someone from doing something. It's about
making it possible to *read* the "f(.)+" code.
A language should make it *easy* to write readable code; it should not
strive to make it *impossible* to write unreadable code. There's no
way a language could succeed at that second goal anyway, and any
serious attempts to accomplish that impossible goal would make a
language less expressive, more bloated, and would probably even end up
making the typical program less readable as a consequence of the
language being less expressive and more complex.
Human ability and creativity are not compromised when restrictions
are made.
That would depend on the restrictions that you make. Apparently you
would have me not use reduce(), which would compromise my ability to
code in the clear and readable way that I enjoy.
In any case, try to program an MCU (micro-controller unit). Python's
restrictions are nothing compared with what you have to deal with in an
MCU.
I'm not sure what point you have in mind. I have programmed MCUs,
but I certainly didn't do so because I felt that it was a particularly
good way to express the software I wished to compose.
Furthermore, by your argument, we have to get rid of loops, since
an obvious way of appending strings is:
result = ""
for s in strings: result += s
By your logic, fly swatters should be banned because shotguns are
more general.
That's not my logic at all. In the core of the language there should
be neither shotguns nor fly-swatters, but rather elegant, orthogonal
tools (like the ability to manufacture interchangeable parts according
to specs) that allow one to easily build either shotguns or
fly-swatters. Perhaps you will notice, that neither "shotgun" nor
"fly-swatter" is a non-compound (i.e., built-in) word in the English
language.
:S It's a matter of design decisions. Whatever the
designer thinks is better, so be it (in this case, GvR).
Of course it is a matter of design decisions, but that doesn't imply
that all of the design decisions are being made correctly.
At least, in my CS introductory class, one of the things we learned
was that programming languages could be extremely easy to read, but
very hard to write and vice versa
Or they can be made both easy to read *and* easy to write.
You are correct, sorry -- I misunderstood your proposed extension.
But max() and min() still have all the same problems as reduce(), and
so does sum(), since the programmer can provide his own comparison
and addition operations in a user-defined class, and therefore, he can
make precisely the same mistakes and abuses with sum(), min(), and
max() that he can with reduce().
It might still be abused, but not as much as reduce(). But
considering the alternative (reduce(), namely), it's *much* better
because it shifts the problem to someone else. We are consenting
adults, y'know.
Yes, we're consenting adults, and we can all learn to use reduce()
properly, just as we can all learn to use class signatures properly.
So, you're claiming that ALL people who were defending 'reduce' by
posting use cases which DID "abuse this generality" are
unreasonable?
In this small regard, at least, yes. So, here reduce() has granted
them the opportunity to learn from their mistakes and become better
programmers.
Then reduce() shouldn't be as general as it is in the first place.
By that reasoning, you would have to remove all the features from the
language that people often use incorrectly. And that would be, ummm,
*all* of them.
And Python suits me fine. But if it continues to be bloated with a
large number of special-purpose features, rather than a small number of
general and expressive features, there may come a time when Python
will no longer suit me.
reduce() ... expressive? LOL. I'll grant you it's (over)general, but
expressive? Hardly.
You clearly have something different in mind by the word "expressive",
but I have no idea what. I can and do readily express things that I
wish to do with reduce().
It's not a bad thing, but, as Martelli noted, it is
overgeneralized. Not everyone understands the concept off the bat (as
you _love_ to claim) and not everyone finds it useful.
If they don't find it useful, they don't have to use it. Many people
clearly like it, which is why it finds itself in many programming
languages.
What I understand by simplicity is that I should not have to memorize
anything at all in order to read the code.
There's no way that you could make a programming language where you
don't have to memorize anything at all. That's ludicrous. To use any
programming language you have to spend time and effort to develop some
amount of expertise in it -- it's just a question of how much bang you
get for your learning and memorization buck.
That's one of the things I absolutely hate
about LISP/Scheme.
You have to remember a hell of lot more to understand Python code than
you do to understand Scheme code. (This is okay, because Python does
more than Scheme.) You just haven't put in the same kind of effort
into Scheme that you put into Python.

For instance, no one could argue with a straight face that

seq[1:]

is easier to learn, remember, or read than

(tail seq)
Now, what's so hard about sum()?
There's nothing hard about sum() -- it's just unnecessary bloat that
doesn't do enough to deserve being put into the language core. If we
put sum() in the language core, why not quadratic_formula() and
newtons_method(), and play_chess()? I'd use all of those more
frequently than I would use sum(). The language core should only have
stuff that gives you a lot of bang for the buck. sum() doesn't.
That's because I believe that there should be little distinction
between features built into the language and the features that users
can add to it themselves. This is one of the primary benefits of
object-oriented languages -- they allow the user to add new data types
that are as facile to use as the built-in data types.
_Then_ let them *build* new classes to use sum(), min(), max(),
etc. This functionality is better suited for a class/object in an
OO approach anyway, *not* a function.
No, making new classes is a heavy-weight kind of thing, and every
class you define in a program also should pay for its own weight.
Requiring the programmer to define new classes to do very simple things
is a bad idea.
and you claimed that reduce could be removed if add, mul, etc, would
accept arbitrary numbers of arguments. This set of stances is not
self-consistent.
Either solution is fine with me. I just don't think that addition
should be placed on a pedestal above other operations. This means that
you have to remember that addition is different from all the other
operations, and then when you want to multiply a bunch of numbers
together, or xor them together, for example, you use a different
idiom, and if you haven't remembered that addition has been placed on
this pedestal, you become frustrated when you can't find the
equivalent of sum() for multiplication or xor in the manual.
Have you ever programmed in assembly? It's worth a look...
Yes, I have. And I've written MCU code for microcontrollers that I
designed and built myself out of adders and other TTL chips and lots
of little wires. I've even disassembled programs and patched the
binary to fix bugs for which I didn't have the source code.
(In case someone's wondering, addition is the only operation available
in many MPU/MCUs. Multiplication is heavily expensive in those that
support it.)
If your point is that high-level languages should be like extremely
low level languages, then I don't think that you will find many people
to agree with that.
The point is that the primary meaning of "reduce" is "diminish", and
when you're summing (positive:-) numbers you are not diminishing
anything whatsoever

Of course you are: You are reducing a bunch of numbers down to one
number.
That makes sense if you are in a math-related area. But for a layperson,
that is nonsense.
It wasn't nonsense to me when I learned this in tenth grade -- it made
perfect sense. Was I some sort of math expert? Not that I recall,
unless you consider understanding algebra and geometry to be out of
the realm of the layperson.
Yes, I taught a seminar on Python, and I didn't feel it necessary
to teach either sum() or reduce(). I taught loops, and I feel
confident that by the time a student is ready for sum(), they are
ready for reduce().
<sarcasm>But why didn't you teach reduce()? If it were so simple, it
was a must in the seminar.</sarcasm>
I did teach lambda in the seminar and no one seemed to have any
difficulty with it. I really couldn't fit in the entire language in
one three-hour seminar. There are lots of other things in the
language more important than reduce(), but *all* of them are more
important than sum().
Now in a more serious note, reduce() is not an easy concept to
grasp. That's why many people don't want it in the language. The
middle land obviously is to reduce the functionality of reduce().
That's silly. Reducing the functionality of reduce() would make it
*harder* to explain, learn, and remember, because it would have to be
described by a bunch of special cases. As it is, reduce() is very
orthogonal and regular, which translates into conceptual simplicity.
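
To illustrate how regular reduce() is, here is a rough pure-Python sketch of its behaviour. The name `my_reduce` and the `_MISSING` sentinel are invented for this example; this is not the actual implementation of the built-in, only an approximation of what it does.

```python
import operator

_MISSING = object()  # sentinel, so that None can be a legitimate initial value


def my_reduce(function, iterable, initial=_MISSING):
    """Fold iterable from the left with a two-argument function."""
    it = iter(iterable)
    if initial is _MISSING:
        # No initial value: the first element seeds the accumulator.
        try:
            acc = next(it)
        except StopIteration:
            raise TypeError("my_reduce() of empty sequence with no initial value")
    else:
        acc = initial
    for item in it:
        acc = function(acc, item)
    return acc


total = my_reduce(operator.add, [1, 2, 3, 4])
empty_product = my_reduce(operator.mul, [], 1)
```

There is one rule (feed the running result and the next item to the function) and one special case (what to do when no initial value is given), with no restrictions on which function is passed.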
You mention here "general-purpose programming". The other languages
that I have done something more than a small code snippet (C/C++, Java
and PHP) lack a reduce()-like function.
Lots of languages don't provide reduce() and lots of languages do. Few
provide sum(). Higher-order functions such as reduce() are
problematic in statically typed languages such as C, C++, or Java,
which may go a long way towards explaining why none of them include
it. None of C, C++, or Java provides sum() either, though PHP provides
array_sum(). But PHP has a huge number of built-in functions, and I
don't think that Python wishes to go in that direction.
You have to educate people not to do stupid things with loop and sum
too. I can't see this as much of an argument.
After reading the whole message, how do you plan to do *that*?
You put a line in the manual saying "As a rule of thumb, reduce()
should only be passed functions that do not have side-effects."
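
The question that started this thread is a handy illustration of that rule of thumb. dict.update() works by side effect and returns None, so the lambda hands None to the next step. A sketch of a side-effect-free alternative follows; the helper name `merged` is made up for this example, and the import is needed only on Python versions where reduce() lives in functools rather than among the built-ins.

```python
from functools import reduce  # a built-in in Python 2; in functools in Python 3


def merged(acc, d):
    """Return a new dict combining acc and d, leaving both inputs untouched."""
    result = dict(acc)
    result.update(d)
    return result


d1, d2, d3 = {'a': 1}, {'b': 2}, {'c': 3}
dicts = [d1, d2, d3]

# The original attempt, run on copies so that d1 is not mutated here.
# update() returns None, so the second step calls None.update(...).
try:
    reduce(lambda x, y: x.update(y), [dict(d) for d in dicts])
    failed = False
except AttributeError:
    failed = True

# Passing a function that returns its result works as expected.
d4 = reduce(merged, dicts, {})
```

Here `d4` contains all three keys, and `d1`, `d2`, and `d3` are left unchanged.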
The only way to effectively do this is by warning the user
explicitly *and* limiting the power the functionality has.
Is this the only way to educate people not to abuse loops?

I didn't think so.
As Martelli said, APL and Numeric do have an equivalent to
reduce(), but it's limited to a range of functions. Doing so ensures
that abuse can be contained.
Doing so would make the language harder to document, remember, and
implement.
You hardly need a zillion warnings. A couple examples will suffice.

I'd rather have the warnings. It's much better than me saying "How
funny, this shouldn't do this..." later. Why? Because you can't
predict what people will actually do. Pretending that most people
will act like you is insane.
If the day comes where Python starts telling me that I used reduce()
incorrectly (for code that I have that currently works fine), then
that is the last day that I would ever use Python.
Two last things:
1) Do you have any extensive experience with C/C++? (By extensive, I
mean a small-medium to medium project)
Yes, quite extensive. On large projects, in fact.
These languages taught me the value of -Wall. There are way too many
bugs lurking in the warnings to just ignore them.
I don't use C when I can avoid it because I much prefer C++. C++'s
strong static type-checking is very helpful in eliminating many bugs
before the code will even compile. I find -Wall to be useless in the
C++ compiler I typically use because it will complain about all sorts
of perfectly okay things. But that's okay, because usually once I get
a program I write in C++ to compile, it typically has very few bugs.
2) Do you have any experience in the design process?


Yes, I design and implement software for a living.

|>oug
Jul 18 '05 #226
On Fri, 05 Dec 2003 02:48:40 -0500, Douglas Alan <ne****@mit.edu>
wrote:

When people assert that

reduce(add, seq)

is so much harder to use, read, or understand than

sum(seq)

I find myself incredulous. People are making such claims either
because they are sniffing the fumes of their own righteous argument,
or because they are living on a different planet from me. On my
planet, reduce() is trivial to understand and it often comes in handy.
I find it worrisome that a number of vocal people seem to be living on
another planet (or could use a bit of fresh air), since if they end up
having any significant influence on the future of Python, then, from
where I am standing, Python will be aimed at aliens. While this may
be fine and good for aliens, I really wish to use a language designed
for natives of my world.
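
For readers comparing the two spellings quoted above, a small sketch. The sample list `seq` is arbitrary, and the import of reduce is only needed on Python versions where it is not a built-in.

```python
from functools import reduce  # a built-in in Python 2; in functools in Python 3
from operator import add, mul, xor

seq = [3, 5, 7]

total = reduce(add, seq)    # same result as sum(seq)
product = reduce(mul, seq)  # the same idiom, with multiplication
parity = reduce(xor, seq)   # and with bitwise xor
```

The reduce() spelling carries over unchanged from addition to the other operators, which is the regularity being argued for; sum(seq) is shorter for the one case it covers.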


The dynamics here are indeed sad.

When an MIT guy says stuff like this, it is discountable because he is
an MIT guy. What is trivial to him...

When a guy like myself, with a degree in English and an MBA says
essentially the same thing - it is more than discounted. It is
presumptuous, almost, to be participating.

The silliness of the conversations here, about what "I" of course can
understand but cannot expect others to grasp easily, has indeed been
a near downfall, in my eyes. At times, as you say, a broad and
depressing insult seems to be emanating from those discussions.

I've never noticed much insight in those discussions, and none have
served the practical decision making process in connection with the
future of Python well, at all.

I would probably include the "sum" decision in the mix.

Art

Jul 18 '05 #227
