Bytes | Developer Community
reduce() anomaly?

This seems like it ought to work, according to the
description of reduce(), but it doesn't. Is this
a bug, or am I missing something?

Python 2.3.2 (#1, Oct 20 2003, 01:04:35)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d1 = {'a':1}
>>> d2 = {'b':2}
>>> d3 = {'c':3}
>>> l = [d1, d2, d3]
>>> d4 = reduce(lambda x, y: x.update(y), l)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'
>>> d4 = reduce(lambda x, y: x.update(y), l, {})
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'

- Steve.
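[It's not a bug: dict.update returns None, so after the first step reduce passes None along as x, and None has no update method. The lambda has to return the accumulator itself after mutating it. A minimal sketch of a working variant (using functools.reduce, where reduce lives in Python 3):

```python
from functools import reduce  # reduce moved to functools in Python 3

d1, d2, d3 = {'a': 1}, {'b': 2}, {'c': 3}
l = [d1, d2, d3]

# dict.update returns None, so the lambda must return the
# accumulator explicitly after the in-place update:
d4 = reduce(lambda x, y: (x.update(y), x)[1], l, {})
assert d4 == {'a': 1, 'b': 2, 'c': 3}
```

With the {} seed the input dicts are left unmodified, since all updates land on the fresh accumulator.]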
Jul 18 '05
Bengt Richter wrote:
...
Nope -- itertools is not about CONSUMERS of iterators, which this one
would be. All itertools entries RETURN iterators.
Ok, but what about returning an iterator -- e.g., funumerate(f, seq) --
that supplies f(x),x pairs like enumerate(seq) supplies i,x?
That doesn't seem to conflict with Raymond Hettinger's vision (he's the
itertools guru), but you might check with him directly.
[I'd suggest extending enumerate, but I already want to pass optional
range parameters there, so one could control the numeric values returned,
e.g., enumerate(seq,<params>) corresponding to zip(xrange(<params>),seq)].
I know Raymond wants to be able to control the starting point if Guido
will allow that extension (RH is also enumerate's author) but I don't
know if he's interested in the stride and stop values, too.
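[The funumerate being discussed is a hypothetical name from this thread, not a standard function, but it is a two-line generator to sketch:

```python
def funumerate(f, seq):
    """Yield (f(x), x) pairs, analogous to enumerate yielding (i, x)."""
    for x in seq:
        yield f(x), x

# e.g. the longest string together with its length, via tuple comparison:
maxlen, maxitem = max(funumerate(len, ['a', 'ccc', 'bb']))
```

As noted later in the thread, when two items share the same f(x) this falls back to comparing the items themselves, which a key= argument to max would avoid.]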
[BTW, should xrange() default to xrange(0,sys.maxint,1)?]
I doubt xrange has enough of a future to be worth extending it. I'd
rather have an irange returning an iterator (and given that itertools.count
already does basically the job you mention, treating irange without args
as an error may be more useful). No special reason to single out sys.maxint
when the int/long distinction is rapidly withering, btw.
Given that the sort method of lists now has an optional key= argument, I
Sorry, I try to remember to always say something like "now (in the 2.4
pre-alpha on CVS)" but sometimes I do forget to repeat this every time I
mention 2.4 novelties.
This is a new one on me:
>>> seq.sort(key=lambda x:x)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sort() takes no keyword arguments


In 2.3, yes. Get a 2.4 CVS snapshot and you'll see it work.

Do you mean the comparison function? Or is there something else now too?
I'm beginning to infer that key= is actually a keyword arg for a
_function_ to get a "key" value from a composite object (in which case
Right.
ISTM "getkeyvalue" or "valuefunc" would be a better name). But IMO "key"
suggests it will be used on elements x like x[key], not passing a
definition key=whatever and then using key(x) to get values.
The concept and terminology of a "sort key" or "sorting key" is very
popular, so the concision of the 'key' attribute name was chosen in
preference to your longer suggestions. E.g., to sort a list of strings
in order of increasing string length (and otherwise stably),

thelist.sort(key=len)

was deemed preferable to

thelist.sort(getkeyvalue=len)

Considering that the key= parameter is meant to take the place of most
uses of DSU (and thus likely become extremely popular), I concur with
the current choice for the name. However, it's not etched in stone yet:
2.4 is at the pre-alpha stage. You can check the python-dev archives
for past discussions of the issue, then join python-dev to offer your
contributions, if you think they're helpful; this is a course of action
that is always open to anybody, of course.

think the obvious approach would be to add the same optional argument to
min and max with exactly the same semantics. I.e., just as today:

x = max(somelist)
somelist.sort()
assert x == somelist[-1]

we'd also have

x = max(somelist, key=whatever)
somelist.sort(key=whatever)
assert x == somelist[-1]
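[For the record, this proposal was eventually adopted: max and min accept key= from Python 2.5 onward, and the invariant holds exactly as stated:

```python
somelist = ['bb', 'a', 'ccc']
whatever = len

x = max(somelist, key=whatever)  # max by key, O(N)
somelist.sort(key=whatever)      # stable sort by the same key
assert x == somelist[-1]         # the key-max is the last element after a key-sort
```
]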

I think I like it, other than the name. Maybe s/key/valuefunc/ ;-)


If max and min do acquire such an optional argument, it will of course
have the same name and semantics as it does for sort.

That would be max(seq, key=len) in my proposal.


That's a nice option for max (and min, and ??), but ISTM that it would


max and min are it (unless heapq or bisect grow new _classes_ where such
comparison-flexibility would also be natural to have; in the current,
function-centered state of heapq, such an optional argument would not
fit well, and the situation for bisect is not crystal-clear either way).
also be nice to have a factory for efficient iterators of this kind.
It would probably be pretty efficient then to write

maxlen, maxitem = max(funumerate(len,seq))
Yes, roughly as efficient as it is today (2.4, again) to write the
less concise
max(izip(imap(len, seq), seq))
[for a RE-iterable seq] using more general itertools entries. [If
seq is not necessarily reiterable, you'd need to add a tee to this;
again, see the itertools in the current 2.4 snapshot].

However, this does not deal with the issue of seq items needing
to be comparable (and at reasonable cost, too) when they have the
same value for len(item). max and min with an optional argument
would know to ONLY compare the "sort key", just like sort does,
NEVER the items themselves. So, to fully emulate the proposed
max and min, you need to throw in an enumerate as well.
or

def longest(seq):
    return max(funumerate(len,seq))[-1]

and it would be available as infrastructure for other efficient loops in
addition to being tied in to specific sequence processors like max and
min.


The only issue here is whether izip(imap(len, seq), seq) is frequent
enough to warrant a new itertools entry. As it cannot take the place
of the proposed optional argument to min and max, it doesn't really
affect them directly. The general interest of
izip(imap(len, seq), enumerate(seq))
is, I think, too low to warrant a special entry just to express it.
Alex

Jul 18 '05 #151
Andrew Dalke wrote:
Alex:
I have already suggested, in a post on this thread's direct ancestor 9 days
ago, the 100%-equivalent substitution in pure, simple, faster Python:


I've only been skimming this thread. Now upon reread I see you also
looked at the standard library. No fair -- I'm supposed to be the only
one who does that! :)


On python-dev one had better make ready by such a preliminary check
for the inevitable objections.

Nope -- itertools is not about CONSUMERS of iterators, which this one

would
be. All itertools entries RETURN iterators.


Yup. There's no good, existing, natural place for such a function, that
I can tell.


I agree. Which is why I've been claiming that we need a module of
"iterator consumers" (accumulators?) in 2.4 to get full advantage
from the new genexp's and itertools enhancements.

Given that the sort method of lists now has an optional key= argument, I
think the obvious approach would be to add the same optional argument to
min and max with exactly the same semantics. I.e., just as today:


That follows Moshe's comment that a "Decorate Max Undecorate",
in parallel to Decorate Sort Undecorate. Nice.


Hadn't seen Moshe comment on it, but, yes, it IS the same idea.

we'd also have

x = max(somelist, key=whatever)
somelist.sort(key=whatever)
assert x == somelist[-1]


Is 'key' also an iterable? ...


No, a callable (just like in 2.4's sort).

That would be max(seq, key=len) in my proposal.


Ahh, it's the function used to get the keys. What about 'getkey'
as a name?


See my comments to Bengt on this thread. Summarizing: 2.4 is in
pre-alpha, therefore suggestions to change it will be quite
appropriate on python-dev. The name and semantics of such an
optional argument must of course be the same as whatever ends up
being accepted for lists' sort method (currently, it's key=).

I looked through the back thread but didn't see your function
definition. Is it backwards compatible to the existing max
function?... You know, I never realized how strange max was
in the first place, so that
max(1, 2)
and
max( (1, 2) )
are both valid. It must assume that if there's only one argument
then the latter form is appropriate. Then your form must say
Yes:
>>> max(23)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence

that if there's only one non-keyword argument then it's of
the form

max(seq, **args)

But
max(a, b, key=len)
seems reasonable as well. Then again, so does
max(seq, len)


max(seq, len) must of course remain compatible with today. Keyword
arguments are handled separately (in built-in functions).
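[Both calling conventions can be checked directly: with several positional arguments max compares them, with a single one it iterates over it, and the keyword argument is handled separately either way (in Python 2.5+, where max grew key=):

```python
assert max(1, 2) == 2              # multiple arguments: compared directly
assert max((1, 2)) == 2            # single argument: treated as an iterable
assert max('a', 'bbb', 'cc', key=len) == 'bbb'  # key= mixes with either form

try:
    max(23)                        # a single non-iterable argument fails
except TypeError:
    pass
```
]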

a perfectly natural extension, it seems to me. I've found such total and
extreme opposition to this perfectly natural extension in private
correspondence about it with another Python committer that I've so far
delayed proposing it on python-dev -- for reasons that escape me it would
appear to be highly controversial rather than perfectly obvious.


My guess is that reaction is based on the need to have both forms
max(a, b)
and
max(seq)
and allow a key function to be passed in, and make it feel
natural without running into problems when the two might be
mixed up.


Interestingly enough, this objection wasn't raised. If it were,
then it would easily be squashed by analogy with somelist.sort:
somelist.sort(foo) must mean to use foo as the comparator func,
for backwards compatibility, so that one MUST give the argument
name, somelist.sort(key=foo), to use a key-extracting func instead.

even though
the tasks handled by the standard library are heavily slanted
to string and text processing, networking &c, and (of course)
"pure infrastructure", rather than typical application tasks.


Yeah, I worry about that bias error when I do my analysis.
I really should also scan some of the other (proprietary) code
bases I have access to, but then it leads to claims of being
irreproducible. Besides, despite being a scientific programmer,
I mostly do "string and text processing, networking &c."


A corpus of freely available code from the Vaults of Parnassus
might make for a useful resource for such tasks.
Alex

Jul 18 '05 #152
Ville Vainio wrote:
...
[stuff that should remain builtins]
iter, len, pow [for the crucial 3-arguments case], range (or some
preferable variant that returns an iterator), and zip seem pretty


+enumerate


Yep, it sure should be there.

are quite handy for interactive use, such as locals, globals,
dir, vars, ..., are not all that frequently used in programs --
so they might live in a module that the interactive mode gets
automatically, rather than being built-ins.


locals and globals seem to have a natural place in builtins IMHO.


Why? They only allow a limited form of introspection. Why are
they more essentially "built-in" than other stuff in inspect?

All of this would be perfect for the mailing list on Python 3.0
if the latter existed. Posting it to c.l.py makes it unlikely
Guido will ever consider the discussion's results, anyway. The


How about the wiki at python.org?


Sure, one could write things up there, but I don't think it's a
medium as suitable to debate as a newsgroup or mailing list.
Alex

Jul 18 '05 #153
Alex Martelli <al***@aleax.it> writes:

[ locals() and globals() ]
Why? They only allow a limited form of introspection. Why are
they more essentially "built-in" than other stuff in inspect?


Somehow their existence in builtins has a connotation that we are
referring to the "current" namespace, instead of, say,
inspect.locals() (which might bear the connotation that we are talking
about locals of the module inspect).

I've mostly used locals() and globals() as "%(myvar)s" % locals(), and
never for inspect-ish tasks.
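[That idiom, for anyone who hasn't met it: locals() hands the %-formatting operator a mapping of every local name, so format strings can refer to variables by name:

```python
def greet():
    myvar = 'world'
    # locals() here is {'myvar': 'world'}, so %(myvar)s resolves by name
    return "hello %(myvar)s" % locals()

assert greet() == 'hello world'
```
]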
How about the wiki at python.org?


Sure, one could write things up there, but I don't think it's a
medium as suitable to debate as a newsgroup or mailing list.


Yes, it isn't. How about a "py3k-sig"?

--
Ville Vainio http://www.students.tut.fi/~vainio24
Jul 18 '05 #154
Douglas Alan wrote:
Alex Martelli <al***@aleax.it> writes:
But to be consistent with your other arguments, no doubt you'd argue
for a .sort() followed by [-1] as "more general" than max...
That would have a larger big O time growth value -- most likely
O(n * log n) vs. O(n), for reasonable implemenations. And while I


If the sequence is carefully randomized, yes. If the sequence has
any semblance of pre-existing order, the timsort is amazingly good
at exploiting it, so that in many real-world cases it DOES run as
O(N).
wouldn't sweat a factor of 2 for a feature of a RAD or scripting
language, I would be more concerned about moving to a larger big O
value.
Me too! That's why I'd like to make SURE that some benighted soul
cannot code:

onebigstring = reduce(str.__add__, lotsofstrings)

and get O(N squared) performance, versus the O(N) performance of
the correct Python idiom, ''.join(lotsofstrings) . At least sum
does give an error message, AND a reminder that ''.join should be
used, when you try sum(lotsofstrings, '') -- reduce just slows
your program down by a LOT silently and underhandedly...
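[Both halves of this are easy to demonstrate: sum still refuses strings with a pointer to ''.join, while the reduce spelling silently rebuilds an ever-longer string at each step (roughly O(N squared) in CPython, since operator.add doesn't get the in-place concatenation optimization):

```python
from functools import reduce
import operator

lotsofstrings = ['a'] * 1000

# the trap: each step allocates a new, longer string
slow = reduce(operator.add, lotsofstrings)

# the correct O(N) idiom
fast = ''.join(lotsofstrings)
assert slow == fast

# sum() explicitly rejects strings and recommends ''.join instead
try:
    sum(lotsofstrings, '')
except TypeError:
    pass
```
]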

for all items in the sequence of numbers". But, reduce as it exists
in Python today cannot be clearly defined EXCEPT by such an
iteration with some generic "callable which accepts two arguments"
specified in lieu of the operator.add. Given this total generality,
even though hardly ever does it make sense to USE it, reduce becomes
quite complex.


Your proposed extension to max() and min() has all the same problems.


Not at all. Maybe you have totally misunderstood "my proposed
extension"? There is NO "callable which accepts two arguments"
involved -- and therefore no complications at all. Indeed, what
I propose is simply to ensure that under all circumstances

x = max(iterable, key=k)

binds x to exactly the same value which would be be bound by

x = list.sorted(iterable, key=k)[-1]

in Python 2.4 (max can of course guarantee O(N) behavior, while
list.sorted cannot -- this optimization, as well as the slight
simplification, would be my motivations for having key= in max).
But reasonable programmers don't abuse this generality, and so there
So, you're claiming that ALL people who were defending 'reduce' by
posting use cases which DID "abuse this generality" are unreasonable?
this urge to be stifled. Don't take my word for it -- ask Paul
Graham. I believe he was even invited to give the Keynote Address at
a recent PyCon.
However, if you agree with Paul Graham's theories on language design,
you should be consistent, and use Lisp. If you consider Python to be
preferable, then there must be some point on which you disagree with
him. In my case, I would put "simplicity vs generality" issues as
the crux of my own disagreements with Dr. Graham.

So, Python is not designed as PG would design it (else it would be
Lisp). It's designed with far greater care for simplicity, and for
practicality, and with a jaundiced eye against excess generality.
Indeed, 'sum' itself lost substantial generality between my original
conception of it and the way Guido eventually accepted it into the
built-ins -- and Guido's on record as regretting that he did not
remove even _more_ generality from it.

You've even argued in this thread, incorrectly, that reduce could be
eliminated if only all binary operators were able to accept arbitrary
numbers of arguments. This, plus your insistence that 'reduce' is
"just like" APL's / (which does NOT accept arbitrary functions on its
left -- just specific operators such as +), indicate a _misconception_
of reduce on your part. I'm sure you don't hold it consciously, but
these miscommunications indicate that even you, reduce's paladin, do
NOT properly grasp reduce "intuitively".


Just what is it that I don't grasp again? I think my position is
clear: I have no intention to abuse reduce(), so I don't worry myself
with ways in which I might be tempted to.


Yet you want reduce to keep accepting ANY callable that takes two
arguments as its first argument, differently from APL's / (which does
NOT accept arbitrary functions on its left); and you claimed that reduce
could be removed if add, mul, etc, would accept arbitrary numbers of
arguments. This set of stances is not self-consistent.

Having (e.g.) add accept a sequence argument (like max does), or, for
explicitness and potentially performance, grow an add.reduce attribute
just like in Numeric, would give no particular problems. I'd still
want (just like Numeric does) to have sum as a name for add.reduce,
because it's by far the most common case


So, now you *do* want multiple obviously right ways to do the same
thing?


sum(sequence) is the obviously right way to sum the numbers that are
the items of sequence. If that maps to add.reduce(sequence), no problem;
nobody in their right mind would claim the latter as "the one obvious
way", exactly because it IS quite un-obvious.

and avoids getting into the issue of what the (expletive delete)
does "reducing" have to do with anything that "add.reduce" does
(it's really a curve ball compared with the many meanings of
"reduce" in English, after all).


The English world "reduce" certainly has multiple meanings, but so
does "sum". I can be a noun or a verb, for instance. It can mean


Any noun can be verbed, but that's a problem, or feature, of English
as a natural language, and unrelated to programming languages.

The point is that the primary meaning of "reduce" is "diminish",
and when you're summing (positive:-) numbers you are not diminishing
anything whatsoever ... unless you think in terms of multidimensional
arrays and diminishing dimensionality, but while that's quite OK in
APL or Numeric, which DO have multidimensional arrays, it's silly in
Python proper, which doesn't. The primary meaning of "sum" is "sum",
so I have never met anybody having the slightest problem understanding
or using it (including both people with English as their mother
tongue, and others, all the way to people with near-zero command of
English: it helps, here, that "summare" is a _Latin_ word -- so is
"reducere", but with the same primary meaning as in English:-).
"summary" or "gist" in addition to addition. It also can be confusing
by appearing to be just a synonym for "add". Now people might have
trouble remembering what the difference between sum() and add() is.
Got any relevant experience teaching Python? I have plenty and I have
never met ANY case of the "trouble" you mention.

In Computer Science, however, "reduce" typically only has one meaning
when provided as a function in a language, and programmers might as
well learn that sooner than later.


I think you're wrong. "reduce dimensionality of a multi-dimensional
array by 1 by operating along one axis" is one such meaning, but there
are many others. For example, the second Google hit for "reduce
function" gives me:

http://www.gits.nl/dg/node65.html

where 'reduce' applies to rewriting for multi-dot grammars, and
the 5th hit is

http://www.dcs.ed.ac.uk/home/stg/NOTES/node31.html

which uses a much more complicated generalization:

reduce(g, e, m, n, f)=g(e,g(f(m),g(f(m+1),...g(f(n-1),g(f(n)))...)))

not to mention Python's own __reduce__, etc. And the 'reduce'
given at
http://w3future.com/html/stories/hop.xml
is what we might code as sum(map(templateFunction, sequence))
while
http://csdl.computer.org/comp/trans/...4/i0364abs.htm
deals with "the derivation of general methods for the L/sub 2/
approximation of signals by polynomial splines" and defines
REDUCE as "prefilter and down-sampler" (which is exactly as I
might expect it to be defined in any language dealing mostly
with signal processing, of course).

So, it's quite sensible for people to be confused about the
meaning of 'reduce' within a programming language.
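[That generalized reduce(g, e, m, n, f) can be transcribed as a right fold -- reading the innermost g(f(n)) in the quoted formula as just f(n), which seems to be the intent:

```python
def general_reduce(g, e, m, n, f):
    """Compute g(e, g(f(m), g(f(m+1), ... g(f(n-1), f(n)) ...))),
    i.e. a right fold of f(m)..f(n) under g, combined with e last."""
    acc = f(n)
    for i in range(n - 1, m - 1, -1):  # fold from the inside out
        acc = g(f(i), acc)
    return g(e, acc)

# with g = addition and f = identity this is just e + m + ... + n:
assert general_reduce(lambda a, b: a + b, 0, 1, 3, lambda x: x) == 6
```

Which only underlines the point: this is yet another, distinct thing that "reduce" can mean.]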

But, most importantly, such a design would avoid the veritable traps
to which the current, too-general 'reduce' subjects the poor learner
who's quite reasonably going to believe that all that generality
MUST be good for something if it's been designed into a built-in.
We've seen quite a few such inappropriate uses on this thread, after
all.


That's very easy to fix:

FAQ
---
Q. Should I ever pass a function with side effects into reduce() or
map()?

A. No.

(Unless the side-effect is just printing out debugging information,
or saving away statistics, or something like that.)


Designing an over-general approach, and "fixing it in the docs" by
telling people not to use 90% of the generality they so obviously
get, is not a fully satisfactory solution. Add in the caveats about
not using reduce(str.__add__, manystrings), etc, and any reasonable
observer would agree that reduce had better be redesigned.

Again, I commend APL's approach, also seen with more generality in Numeric
(in APL you're stuck with the existing operator on the left of + -- in
Numeric you can, in theory, write your own ufuncs), as saner. While not
quite as advisable, allowing callables such as operator.add to take
multiple arguments would afford a similarly _correctly-limited generality_
effect. reduce + a zillion warnings about not using most of its potential
is just an unsatisfactory combination.
Alex

Jul 18 '05 #155
Terry Reedy wrote:
"Alex Martelli" <al***@aleax.it> wrote in message
news:%N********************@news2.tin.it...
to be consistent with your other arguments, no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...
By my definition of "more general", a function that returns all order
statistics of a list is trivially more general than one that returns
any fewer, including just one. It conveys a superset of the


Of course. So...?
infomation conveyed by less general functions and answers a superset
of the questions they answer. If you add a control parameter
specifying which order statistics to return, then less general
functions have a restriced domain.

So I have no idea your intent in appearing to challenge this and how
you think it contributes to your overall argument. Is your idea of
'more general' so different?


Hmmm, nobody else seemed to have any problem understanding my quip,
including the poster at which it was directed. Let me spell it out
for you: a few people are attacking 'sum' and defending 'reduce'
because the latter is "more general". Why, so is sorting a sequence
"more general" than finding its maximum; for "consistency" with their
other arguments (which by this quip I'm alleging are no good) they
should then be clamoring for the removal of 'max' to leave only 'sort'.
(I did expect an O()-based objection and punctually got it, ready for
my repartee about reduce(str.__add__, lotsastrings) being O(N squared),
etc, etc).

So, no, I don't think my idea of "more general" is different from
yours: e.g., a function that, given a sequence, returns its length
AND the number of times 'None' occurs as an item, is more general
than one which just returns the length. That does not make it in
any intrinsic way "necessarily preferable" -- and THAT is my point.
Alex

Jul 18 '05 #156
Terry Reedy wrote:
"Alex Martelli" <al***@aleax.it> wrote in message
news:8Y*******************@news1.tin.it...
Dave Brueck wrote:
...
>> results = [ func(x) for x in sequence ]
>> ... instead of ...
>> results = sequence.map(func) ??
> Because I find the first much more readable
I entirely agree with both points.
For this pair, I like the second better. Different aesthetics.


I guess. I just can't imagine EVERY iterable automagically
growing a 'map' method without feeling shudders of terror at
the total ugliness and gratuitous "blackmagicness" of the idea.

They're even clearer when the contrast is between, e.g.:
results = [ x+23 for x in sequence ]
and:
results = sequence.map(lambda x: x+23)
where using the HOF approach forces you
to understand (and read) lambda too.


Here I might take the first. 'lambda' is something I feel 'stuck'
with.


I cherish being able to use the same construct, list comprehension,
whether I already have a callable ready or not.

Would the hypothetical
results = sequence.map(func x: x+23)
be any better?
Just replacing the keyword 'lambda' with 'func'? If you were
designing a green-field language, and couldn't find any other way
to express callable literals -- so it only came down to a 2-way
choice between lambda and func as keywords with the same semantics,
I guess I would suggest func as the lesser evil.
How about a very hypothetical (post ``==repr deprecation)
results = sequence..map(`x+23`)


How would this notation imply that x is an argument rather than,
say, a global?
Alex

Jul 18 '05 #157
Terry Reedy wrote:

"Alex Martelli" <al***@aleax.it> wrote in message
news:IT********************@news2.tin.it...
I'd rather put map/filter/reduce in a 'legacy' module -- warts and
all, e.g. map's horrible habit of None-padding shorter sequences,
EEK -- and have a designed-from-scratch functional module without
the warts.
What a liberating thought. I sometimes forget that the only thing I
am really stuck with when programming in Python is the core syntax (in
the Ref Manual) and that I am free to wrap or replace anything else
(in the Lib Manual). For a project I hope to start soon, I have been
worried about how to work around or explain away the deficiencies of
Python's of reduce(). Instead, I should just write the version I want
for my purposes (especially since speed does not matter). So your
voluminous writing on this has benefited at least one person.


Glad to hear this.

Probably, yes. What functions, as opposed to types, do you
think should absolutely be in the builtins rather than in a separate
module, and (unless it's obvious) why?


I have thought some about this also. Type objects could be put/left
in types, but as constructors, they are too useful and basic not to
have immediately available (unless builtins was reduced to nearly
nothing).


Yes, I think the types should stay right in the builtins.

Of course __import__ had better stay or we won't
be able to get anything imported, for example:-).


Does it have to be in builtins for import statements to work?


No, it would be technically possible to redefine "import"'s semantics
to take it from anywhere.
Direct use of __import__ seems relatively rare.


It depends on what you're comparing against, I guess, but yes,
it's not a staple of most programming styles. It's there more
to be overridden than to be directly called.

The attribute-related functions hasattr, getattr,
setattr, and delattr, seem very fundamental
(but I'd want the same status for the item-functions
currently over in operator),
as well as
(at least some of them -- delattr isn't -- but I do think that

trying to
discriminate too finely would make things too hard to learn)
very frequently used in code.


I had to re-lineate this to parse apart the three thoughts. A static
frequency-of-use analysis for some large code collection would be
interesting. I might write something eventually.


I'm pretty sure that checking for existence and getting the value
are more frequent operations than setting a new value, and that, in
turn, more frequent than deletion. That's "just the way it is" --
it's much the same as in, e.g., SQL (SELECT overwhelming more used,
INSERT and UPDATE less often, DELETE quite rarely).

iter

I group this with type constructor objects, even though not one
itself


It's a factory function, yes. Not technically a type, but the
connection in terms of use is indeed close.
len

unless made a method, keep in


A method of every sequence and container on Earth? Eeek.
pow [for the crucial 3-arguments case]

very specialized and rarely used? move to math as pow3
range (or some preferable variant that returns an iterator)

how about iter(3), etc, currently an arg type error?
think of 3 as the sentinel value.
is not typing '0,' worth the wartiness of optional first arg?
and zip seem pretty fundamental

agree except for pow


Yes, I can see pow could be controversial here.
Alex

Jul 18 '05 #158
"Andrew Dalke" <ad****@mindspring.com> writes:
Surely he was talking about implementing "list-alikes"...?
Yes, I think you're right about that, and I misinterpreted
his statement.


Yes, I was talking about "list-alikes".
Then again, is the change that "all Python list-alikes must
implement stable sort" or that "Python native lists shall
now and forever implement stable sort"?
The change was that native lists will implement stable sort. My gripe
is that if native lists are required to sort stably and list-alikes
can sort unstably, then list.sort and listalike.sort act differently
in a way that can lead to subtle bugs.
That's a statement only that list.sort shall be stable and
not that all .sort() methods must be stable.


It would be icky if some .sort() methods are required to be stable but
others are not.

Note that the most obvious way to implement sort() in C is to call the
standard C library qsort function, which is unstable.
Jul 18 '05 #159
In article <7x************@ruckus.brouhaha.com>,
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
That's a statement only that list.sort shall be stable and
not that all .sort() methods must be stable.


It would be icky if some .sort() methods are required to be stable but
others are not.

Note that the most obvious way to implement sort() in C is to call the
standard C library qsort function, which is unstable.


You're complaining that stability makes implementing a list-alike's sort
trickier. However it also can make using sort simpler. Which do you
think happens more often?

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #160
In article <du*************@mozart.cc.tut.fi>,
Ville Vainio <vi********************@spamtut.fi> wrote:
Douglas Alan <ne****@mit.edu> writes:
Well, bah! There are precious few constructs in this world that are
clearer and more readable than

reduce(add, seq)


I asked my non-programmer girlfriend what she thinks your construct
does - she didn't know. She immediately understood what sum(seq) does.


You guys are hysterical. "reduce" is a performance trap waiting
to pounce! (But while profiting from this bogus argument, let's
not support any solution for that problem.) It's hard to
understand, just ask my girlfriend! (Great. By the way, what
does she think of list comprehensions, generators, etc.?)

Go ahead and get rid of reduce, that sounds like a good idea to me.
The reason though is just that it's not very useful in the context
of a language like Python, and it seems to confuse people who draw
the conclusion that Python must be some kind of functional programming
language. This will be a wake-up call on that score.

Donn Cave, do**@u.washington.edu
Jul 18 '05 #161
David Eppstein <ep******@ics.uci.edu> writes:
You're complaining that stability makes implementing a list-alike's sort
trickier. However it also can make using sort simpler. Which do you
think happens more often?


I generally haven't found stability to be important. When I've cared
about doing something other than sorting (possibly unstably) on some
obvious key, I've generally needed some kind of DSU. Just sorting
stably wouldn't be enough. If I'm using DSU anyway, then getting
stability is trivial if I happen to need it.

Anyway, requiring stability makes implementing list.sort trickier in
addition to making listalike.sort trickier. That's no big deal for
CPython or (apparently) Jython, since the work is already done, but
typical sorting libraries don't necessarily provide stability. If
stability were so important so much of the time, those libraries would
provide it.
Jul 18 '05 #162
In article <7x************@ruckus.brouhaha.com>,
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
I generally haven't found stability to be important. When I've cared
about doing something other than sorting (possibly unstably) on some
obvious key, I've generally needed some kind of DSU. Just sorting
stably wouldn't be enough. If I'm using DSU anyway, then getting
stability is trivial if I happen to need it.


If you're doing the DSU by hand, getting stability is not so hard.
But it's not obvious how to do it with the new key= sort argument for
simplifying DSU. So there was a long discussion on python-dev about how
maybe sort needed yet another keyword argument on top of key= for
specifying that the DSU should include the item positions and be stable;
but this seemed redundant and overcomplicated given that both current
Python sorts are already stable. So Guido ended the discussion by
declaring that sorts would remain stable, hence no extra keyword
argument is necessary.

Since DSU is now built in to the sort mechanism anyway, if you're
rolling your own sort to match that mechanism you shouldn't find it
difficult to include the positions on top of the other DSU you already
have to do.
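[Editor's note: a sketch making this concrete — the key= spelling next to a hand-rolled DSU that folds positions in; both give the same order because the built-in sort is stable, and the second works even where no stability is promised.]

```python
data = ["bb", "a", "cc", "b"]

# key= on the (stable) built-in sort: ties keep their input order.
stable = sorted(data, key=len)

# Rolling the position into the decoration forces the same order
# even if the underlying sort made no stability promise.
forced = [x for _k, _i, x in sorted((len(x), i, x) for i, x in enumerate(data))]

print(stable == forced)  # True
```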

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #163
Donn Cave <do**@u.washington.edu> writes:
Well, bah! There are precious few constructs in this world that are
clearer and more readable than

reduce(add, seq)
I asked my non-programmer girlfriend what she thinks your construct
does - she didn't know. She immediately understood what sum(seq) does.

not support any solution for that problem.) It's hard to
understand, just ask my girlfriend! (Great. By the way, what
does she think of list comprehensions, generators, etc.?)
I was merely arguing that 'reduce' is not more readable or intuitive
than 'sum', which was the argument of OP.
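[Editor's note: for concreteness, the two spellings side by side; a minimal sketch (reduce later moved from the builtins into functools).]

```python
from functools import reduce  # a builtin in the Python of this thread
from operator import add

seq = [1, 2, 3, 4]

total_reduce = reduce(add, seq)  # fold add over seq, left to right
total_sum = sum(seq)             # same result, arguably plainer

print(total_reduce, total_sum)  # 10 10
```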
Go ahead and get rid of reduce, that sounds like a good idea to me.
I don't think reduce should be altogether removed, it just shouldn't
be in stdlib. And neither should sum, btw.
The reason though is just that it's not very useful in the context
of a language like Python, and it seems to confuse people who draw
the conclusion that Python must be some kind of functional programming
language. This will be a wake-up call on that score.


I wouldn't mind Python getting more influence from functional realm,
as Python seems to me to be *the* hybrid language that can pull the FP
thing while still remaining practical and intuitive (and delightfully
non-academic).

--
Ville Vainio http://www.students.tut.fi/~vainio24
Jul 18 '05 #164
Ville Vainio <vi********************@spamtut.fi> writes:
I wouldn't mind Python getting more influence from functional realm,
as Python seems to me to be *the* hybrid language that can pull the FP
thing while still remaining practical and intuitive (and delightfully
non-academic).


Python sometimes seems to go out of its way to thwart the use of
functional style. Look at list.sort returning None, for example.
Jul 18 '05 #165
In article <7x************@ruckus.brouhaha.com>,
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
Ville Vainio <vi********************@spamtut.fi> writes:
I wouldn't mind Python getting more influence from functional realm,
as Python seems to me to be *the* hybrid language that can pull the FP
thing while still remaining practical and intuitive (and delightfully
non-academic).


Python sometimes seems to go out of its way to thwart the use of
functional style. Look at list.sort returning None, for example.


The issue here is not that it returns None, but that it changes its
input. To be truly functional, it should return a new list and leave
the original list unchanged. Returning None is just a helpful reminder
that it's not functional. Of course, the functional version would often
be less efficient...
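[Editor's note: the functional version mentioned here can be sketched as a wrapper that copies before sorting; Python 2.4 later grew a builtin sorted() doing essentially this.]

```python
def functional_sort(seq):
    # Return a new sorted list and leave the input untouched.
    result = list(seq)
    result.sort()
    return result

original = [3, 1, 2]
print(functional_sort(original))  # [1, 2, 3]
print(original)                   # [3, 1, 2] -- unchanged
```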

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #166
David Eppstein <ep******@ics.uci.edu> writes:
Python sometimes seems to go out of its way to thrwart the use of
functional style. Look at list.sort returning None, for example.


The issue here is not that it returns None, but that it changes its
input. To be truly functional, it should return a new list and leave
the original list unchanged. Returning None is just a helpful reminder
that it's not functional. Of course, the functional version would often
be less efficient...


Hmmm, good point. Returning None is still inconvenient of course.
Jul 18 '05 #167
On 2003-11-11, Robin Becker <ro***@jessikat.fsnet.co.uk> wrote:
This whole thread is reminiscent of vi vs emacs or an os war or similar.
It's a pity that people with a preferred style should be so dogmatic
that they want to remove language features to prevent others using them.
The difference there is clear. vi vs. emacs, OS A vs. OS B are two
completely different entities. We're talking about the same one here. That
one has a basic philosophy.
The whole 'only one way to do it' concept is almost certainly wrong.
Erm, no.
There should be maximal freedom to express algorithms. As others have
stated min, max,... sum et al are useful specialisations, but because
they cover 99% of the space doesn't mean that reduce is redundant.
-Eliminate reducespeak and control the future-ly yrs-


You have quite a few languages you can do that in. 5+ years in Perl and
I'm done with TIMTOWTDI, thank-you-very-much. I'm glad that in Python I don't
have to learn several different ways to do the same basic thing. I lament
every time I have to step into another language which has several ways to do
the same thing and if at any time Python fits the problem space that language
occupies perfectly (as in the case of Perl) then I advocate the hell out of
it.

I'm glad I no longer have to deal with 4 ways of doing a simple if
statement. I'm glad that there are only two loop constructs; one for
iterating over a sequence and one that runs until a condition is met. It
means that at the core level I can read the code and immediately see what is
going on instead of having to memorize a dozen or so specilized ways of doing
things.

Oddly enough it is in Python that I have had the most fun programming. It
is in Python that I find myself not only the most expressive but the most
elegant in my programming. In Python my code is the clearest and most
concise. I don't for one instant feel constrained by Python. I feel
liberated by it.

Because as much as it helps when reading the code to only have to learn a
minimal set of controls the same applies to writing code as well. When I
approach a problem I don't have to agonize over "well, should I do a
do...until(), a for(;;), a while(), or something else?" It breaks down to
this. Is it a sequence? For. Is it a condition to be met? While. There,
done, move along.

--
Steve C. Lamb | I'm your priest, I'm your shrink, I'm your
PGP Key: 8B6E99C5 | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------
Jul 18 '05 #168

"Alex Martelli" <al***@aleax.it> wrote in message
news:Jj********************@news2.tin.it...
The point is that the primary meaning of "reduce" is "diminish",
and when you're summing (positive:-) numbers you are not diminishing
anything whatsoever
Yes you are: you are reducing the number of numbers. Data reduction
is a standard term (in US at least) for literally reducing lots of
numbers to a fewer number of numbers, like count, sum, mean,
sum_of_squares, variance, and/or maybe a few others. As the volume of
data generated by observational and experimental studies explodes,
useful reduction becomes even more important.

.. unless you think in terms of multidimensional arrays and diminishing dimensionality,


Reducing a dimension 1 vector to a dimension 0 number is reduction in
both senses. But even reduction of a homogeneous array to tuple of
statistics is reduction in number if not dimension.
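[Editor's note: a sketch of this statistical sense of reduction — folding a whole vector down to a few summary numbers in one pass (reduce lives in functools in modern Python).]

```python
from functools import reduce  # a builtin in the Python of this thread

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Reduce the vector to (count, sum, sum of squares) in a single fold.
count, total, sumsq = reduce(
    lambda acc, x: (acc[0] + 1, acc[1] + x, acc[2] + x * x),
    data,
    (0, 0.0, 0.0),
)

mean = total / count
variance = sumsq / count - mean * mean  # population variance
print(count, mean, variance)  # 8 5.0 4.0
```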

Perhaps my career in statistics and data reduction made reduce() more
immediately obvious to me than some other people.

Terry J. Reedy
Jul 18 '05 #169
On Mon, Nov 17, 2003 at 02:55:11PM +0000, Alex Martelli wrote:
Terry Reedy wrote:
len

unless made a method, keep in


A method of every sequence and container on Earth? Eeek.


Like __len__, you mean?

-Andrew.
Jul 18 '05 #170

"Alex Martelli" <al***@aleax.it> wrote in message
news:p1********************@news1.tin.it...
Terry Reedy wrote:

Hmmm, nobody else seemed to have any problem understanding my quip,
Among those who dared respond ...
including the poster at which it was directed. Let me spell it out for you:

Since, as I said, your target was obscure to me, thank you...
a few people are attacking 'sum' and defending 'reduce'
because the latter is "more general".
Oh.... so that was your target.

Since reduce with initializer is completely general, equivalent to an
initialized for loop (which is why I might instead call it 'induce' or
'repeat'), and hence as or more general than max, min, range,
enumerate, map, filter, list.count, and man other functions and
methods, and since none of these other 'less general than reduce'
functions have been similarly attacked that I have noticed, the
special-case, ad-hominem-like attacks on 'sum' had mostly passed
beneath my notice as devoid of memorable content.

A consistent preference for either or both of 'more general' and 'only
one way to do it' might lead us toward a very lean Python with exactly
one general repetition-with-variation function -- either 'for'
statements or something equivalent to reduce with required
initializer. But I have not noticed anyone proposing that.
So, no, I don't think my idea of "more general" is different from
yours: e.g., a function that, given a sequence, returns its length
AND the number of times 'None' occurs as an item, is more general
than one which just returns the length. That does not make it in
any intrinsic way "necessarily preferable" -- and THAT is my point.


Agreed. Generic versus specific involves tradeoff and balance.

In my garage-workshop tool set, I have perhaps 10 different general
(adjustable) wrenches, pliers, and grips, 5 specialize (fixed-size)
box-end wenches, and perhaps 30 sockets (both metric and 'English').
I have used most of them except for the large sockets meant for auto
and truck work I don't do. So I am glad to have all of them.

Terry J. Reedy
Jul 18 '05 #171

"Alex Martelli" <al***@aleax.it> wrote in message
news:P7********************@news2.tin.it...
Terry Reedy wrote:
Would the hypothetical
results = sequence.map(func x: x+23)
be any better?


Just replacing the keyword 'lambda' with 'func'? If you were
designing a green-field language, and couldn't find any other way
to express callable literals -- so it only came down to a 2-way
choice between lambda and func as keywords with the same semantics,
I guess I would suggest func as the lesser evil.


If such a change were plausibly within the boundaries changes allowed
for the as-yet hypothetical 3.0 (or 3000), and there were not a more
radical change that I preferred, I might seriously propose this.
Upgrading would be simpler than most other code-break changes.
How about a very hypothetical (post ``==repr deprecation)
results = sequence..map(`x+23`)


How would this notation imply that x is an argument rather than,
say, a global?


The above was a minimal 'concept' proposal to test the aesthetics of
something structurally different from current 'lambda's. I think I
would make all identifiers params by default, since I believe this to
be more common, and 'tag' non-locals, perhaps with one of the
reserved, as yet unused symbols. Example: lambda x: x + y == `x + @y`
or `x+y@`. Since expressions cannot assign, no global declaration is
needed.

Terry
Jul 18 '05 #172
Quoth Ville Vainio <vi********************@spamtut.fi>:
....
| I was merely arguing that 'reduce' is not more readable or intuitive
| than 'sum', which was the argument of OP.

My point is that readability and intuition depend on acquired
experience. Your non-programmer girlfriend lacks enough experience
to be able to understand all kinds of stuff in Python, so she's not
a useful gauge, because at that level you can't use Python anyway.
It isn't that simple, and I don't think there really is any good
evidence for what is readable. We could talk about why someone might
think reduce is about readability, but it can only be speculation.

| I wouldn't mind Python getting more influence from functional realm,
| as Python seems to me to be *the* hybrid language that can pull the FP
| thing while still remaining practical and intuitive (and delightfully
| non-academic).

Python may be a hybrid, but certainly none of its parents were FPLs.
Trying to make it one gives both Python and FP a bad name. If you
want a language that really supports both functional and procedural
styles, I think you need Lisp. Look at Dylan, I haven't tried it but
it may be quite a bit more comfortable for Python and C programmers.

Donn Cave, do**@drizzle.com
Jul 18 '05 #173
Maybe it's worth having str(x,8) and str(x,16) instead of oct(x) and hex(x),
and making str() a true inverse function to int()?
G-:
--
a='a=;print a[:2]+chr(39)+a+chr(39)+a[2:]';print a[:2]+chr(39)+a+chr(39)+a[2:]
Jul 18 '05 #174

"Steve Lamb" <gr**@despair.dmiyu.org> wrote in message news:sl*****************@dmiyu.org...
| <..>
| Because as much as it helps when reading the code to only have to learn a
| minimal set of controls the same applies to writing code as well. When I
| approach a problem I don't have to agonize over "well, should I do a
| do...until(), a for(;;), a while(), or something else?" It breaks down to
| this.

I can't agree here. I NEVER wondered which of constructions should I
use in any particular case. I just solved the problem and wrote its solution
in that language and the language HELPED me to express myself clearly.

| Is it a sequence? For. Is it a condition to be met? While. There,
| done, move along.

Aha. Half of all the conditions in real Python programs are 1 or True. :-)

| --
| Steve C. Lamb | I'm your priest, I'm your shrink, I'm your
| PGP Key: 8B6E99C5 | main connection to the switchboard of souls.
| -------------------------------+---------------------------------------------

--
Georgy Pruss
E^mail: 'ZDAwMTEyMHQwMzMwQGhvdG1haWwuY29t\n'.decode('base64')
Jul 18 '05 #175
Paul Rubin:
The change was that native lists will implement stable sort. My gripe
is that if native lists are required to sort stably and list-alikes
can sort unstably, then list.sort and listalike.sort act differently
in a way that can lead to subtle bugs. ...
That's a statement only that list.sort shall be stable and
not that all .sort() methods must be stable.


It would be icky if some .sort() methods are required to be stable but
others are not.


I've been thinking about this. I'm of two minds about it, both
for and against.

The 'for' side, which dominates, says that the only extra work is
for those people who implement a version of Python. There will
only be few of these, and the work needed to implement at least
a rough cut at a stable sort is pretty minimal. Any user of a list-alike
sort who is concerned about its stability or wants the code to work
for pre-2.3 will use the DSU idiom. People who don't know about
differences in how sort algorithms handle ties will have their sorts
work "as expected".

The 'against' side says, as you do, that having list.sort required
to be stable now places an extra barrier to anyone else who
implements a list-alike sort, because that sort should now also
be stable.

What mollifies the latter view is that some user-defined sorts
won't have ties, so there will never be a problem. And some
user-defined sorts will want a stable sort -- anything which
sorts rows based on a column in a grid display must be stable.
So there are only going to be a relatively few cases where
this will be a problem.

Besides, what I really want is a set of algorithms implemented
independent of the container so that I can have a generic
'sort' function work on a user-defined container. At that
point it would be trivial for a user-defined list-alike to
implement its .sort() method as a call to the stable-sort
algorithm, which further reduces the number of times when
the expectation of a stable sort causes a problem.

Hmmm.. To do this nicely would seem to require a
'swap' method on lists as otherwise there's a lot of
movements when swapping two items in a list.
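[Editor's note: a sketch of the container-independent idea — a sort written against only indexing, length, and a pluggable swap, so any list-alike can reuse it. The names generic_swap and generic_insertion_sort are illustrative, not a proposed API; insertion sort by adjacent swaps is chosen because it is stable, tying back to the thread.]

```python
def generic_swap(container, i, j):
    # Default swap via indexing; a list-alike could supply a cheaper one.
    container[i], container[j] = container[j], container[i]

def generic_insertion_sort(container, swap=generic_swap):
    # Insertion sort by adjacent swaps: stable, and it only needs
    # len(), __getitem__ and __setitem__ from the container.
    for i in range(1, len(container)):
        j = i
        while j > 0 and container[j - 1] > container[j]:
            swap(container, j, j - 1)
            j -= 1

items = [5, 2, 4, 1, 3]
generic_insertion_sort(items)
print(items)  # [1, 2, 3, 4, 5]
```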
Note that the most obvious way to implement sort() in C is to call the
standard C library qsort function, which is unstable.


And again I note that I've seen commercial software which
made the expectation that C's qsort(3C) was ... well, not
stable, but it assumed it would give the same results on
different machines. That error was fixed by implementing
their own sort algorithm. (It wasn't caught for years because
there are very few ties in their data and even fewer people
moved data from one platform to another.)

My preference is for a stable sort. It works the right way
for most cases, and no sort algorithm is going to do the
right thing for all cases. (Hence the explosion of sort
variants you rightly pointed out.)

Andrew
da***@dalkescientific.com
Jul 18 '05 #176
Terry Reedy:
Perhaps my career in statistics and data reduction made reduce() more
immediately obvious to me than some other people.


I know when I first saw reduce, in Python BTW, I did not
intuit that meaning. I was in physics grad school and
with a bachelor's in math, with an emphasis in analysis, so
my math background was pretty strong. (Though nowadays
I look through my books and wonder that I actually knew
all that once upon a time... *sigh*)

Andrew
da***@dalkescientific.com
Jul 18 '05 #177
"Donn Cave" <do**@drizzle.com> writes:
Python may be a hybrid, but certainly none of its parents were FPLs.
Trying to make it one gives both Python and FP a bad name. If you
I don't think there is much "trying to make Python a FPL" involved w/
making lots of tried-and-true FPL constructs available in Python. It's
just taking stuff that works and enabling us to use it w/o encumbering
the language (because it's in the modules). It's all good as long as
we don't go the way of doing everything w/ recursion.
want a language that really supports both functional and procedural
styles, I think you need Lisp. Look at Dylan, I haven't tried it but
it may be quite a bit more comfortable for Python and C programmers.


Lisp is too verbose for my tastes (I don't want to write 'let' or
'setq'), doesn't have much in the way of libs and generally doesn't
feel as 'right' as Python (I do use Emacs Lisp occasionally,
though.. and will try out some CL one of these days). Dylan, OTOH,
doesn't seem to be all that active a project, at least the last time I
checked.

--
Ville Vainio http://www.students.tut.fi/~vainio24
Jul 18 '05 #178
Georgy Pruss:
Maybe it's worth having str(x,8) and str(x,16) instead of oct(x) and hex(x), and making str() a true inverse function to int()?


What then are the results of
str(30+44j, 16)
str(3.1415926, 8)
str([9, 8, 7], 8)
str("A", 16)
str({"A": 20}, 16)
?

Andrew
da***@dalkescientific.com
Jul 18 '05 #179

"Terry Reedy" <tj*****@udel.edu> wrote in message news:sI********************@comcast.com...
|
| "Alex Martelli" <al***@aleax.it> wrote in message
| news:P7********************@news2.tin.it...
| <...>
|
| > > How about a very hypothetical (post ``==repr deprecation)
| > > results = sequence..map(`x+23`)
| >
| > How would this notation imply that x is an argument rather than,
| > say, a global?
|
| The above was a minimal 'concept' proposal to test the aesthetics of
| something structurally different from current 'lambda's. I think I
| would make all identifiers params by default, since I believe this to
| be more common, and 'tag' non-locals, perhaps with one of the
| reserved, as yet unused symbols. Example: lambda x: x + y == `x + @y`
| or `x+y@`. Since expressions cannot assign, no global declaration is
| needed.
|
| Terry

Or maybe
lambda x: x + y == `x: x + y`

G-:
Jul 18 '05 #180
Georgy Pruss:
I NEVER wondered which of constructions should I
use in any particular case. I just solved the problem and wrote its solution in that language and the language HELPED me to express myself clearly.
While I, when writing Perl code, also NEVER had a problem. I
ignored just about all of its variations
if ($cond) {$a++}
$a++ if $cond;
$a++ unless !($cond);
unless (!($cond)) {$a++};

I almost only used the first of these, and when other variations
in structure came up I again had a strong preference for the one
closest to C in feel. I did this not because it was the most expressive
but because I wanted to write code that others could follow,
and indeed one of the praises I got was "wow! It's Perl code I
can actually understand!"
Half of all the conditions in real Python programs are 1 or True. :-)


Cute as a joke, but in real-life it omits while loops, wherein
the conditionals usually end up being true. ;)

x = 1
while x < 1000:
    print x
    x = x + random.randint(1, 10)

Andrew
da***@dalkescientific.com
Jul 18 '05 #181
"Andrew Dalke" <ad****@mindspring.com> writes:
Georgy Pruss:
Maybe it's worth having str(x,8) and str(x,16) instead of oct(x) and hex(x)
and making str() a true inverse function to int()?


What then are the results of
str(30+44j, 16)
str(3.1415926, 8)
str([9, 8, 7], 8)
str("A", 16)
str({"A": 20}, 16)
?


TypeError?

--
Ville Vainio http://www.students.tut.fi/~vainio24
Jul 18 '05 #182

"Andrew Dalke" <ad****@mindspring.com> wrote in message news:lJ*****************@newsread2.news.pas.earthl ink.net...
| Georgy Pruss:
| > Maybe it's worth having str(x,8) and str(x,16) instead of oct(x) and hex(x)
| > and making str() a true inverse function to int()?
|
| What then are the results of
| str(30+44j, 16)
| str(3.1415926, 8)
| str([9, 8, 7], 8)
| str("A", 16)
| str({"A": 20}, 16)
| ?
|
| Andrew
| da***@dalkescientific.com

I guess, the same as for
hex(30+44j)
oct(3.1415926)
oct([9, 8, 7])
hex("A")
hex({"A": 20})

G-:
Jul 18 '05 #183
On 2003-11-18, Andrew Dalke <ad****@mindspring.com> wrote:
While I, when writing Perl code, also NEVER had a problem. I
ignored just about all of its variations
if ($cond) {$a++}
$a++ if $cond;
$a++ unless !($cond);
unless (!($cond)) {$a++};
Exactly the case I was thinking of when I wrote my message. 4 ways to
express an if. After getting over the neatness factor I wrote if in exactly
one way. if ($cond) { block }. I found that by doing so my code was not only
far more readable to other people in general it was readable by *ME* just a
few weeks later.

I think the same thing every time we get someone asking for do...until or
do...while. We have while. If I want exactly 1 run before the condition is
tested I can do that. It seems far clearer to me when I see...

x = 1
while x < 2:
    x = foo()
print "done"

...what is going on than having to deal with someone putting:

do:
    x = foo()
while x < 2:
print "done"

...or having to deal with people who will read my code and tell me "Ya
know, you should use a do...while here." "Why? It's just another form of
while." "Well, yeah, but it guarentees the loop performs once." "So?
Initilizing your variables does that, too."
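[Editor's note: the run-once-then-test loop in this exchange can always be spelled with the constructs Python already has; two common sketches, using random.randint as a stand-in for any value-producing call.]

```python
import random

random.seed(0)  # deterministic for the example

# Style 1: initialize so the loop body runs at least once.
x = 1
while x < 2:
    x = random.randint(0, 3)

# Style 2: loop forever, break when the exit condition holds.
while True:
    y = random.randint(0, 3)
    if y >= 2:
        break

# Either way, the loop only terminates once the condition is met.
print(x >= 2 and y >= 2)  # True
```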
I did this not because it was the most expressive but because I wanted to
write code that others could follow, and indeed one of the praises I got was
"wow! It's Perl code I can actually understand!"


Same here. The latest compliment was that upon seeing an example of
Python and PHP code that did the same thing that my code was "concise and
clear". It is because I code with consistent structure, not with expressive
structure.

--
Steve C. Lamb | I'm your priest, I'm your shrink, I'm your
PGP Key: 8B6E99C5 | main connection to the switchboard of souls.
-------------------------------+---------------------------------------------
Jul 18 '05 #184
Georgy Pruss:
I guess, the same as for
hex(30+44j)
oct(3.1415926)

...

Which means you want
str(obj) -> result as usual; takes any object, returns a string, all
numbers
are represented in base 10
str(obj, [base=10]) -> converts integer objects (only!) to the given base,
defaults to base 10.

That feels wrong to me. Base conversion is enough of a special
case that it doesn't warrant being part of the standard str constructor.
As an 'int.to_base' method or class method, perhaps, but not in str.

Andrew
da***@dalkescientific.com
Jul 18 '05 #185
In article <du*************@amadeus.cc.tut.fi>,
Ville Vainio <vi********************@spamtut.fi> wrote:
Lisp is too verbose for my tastes (I don't want to write 'let' ot
'setq'), doesn't have much in the way of libs and generally doesn't
feel as 'right' as Python (I do use Emacs Lisp occasionally,
though.. and will try out some CL one of these days). Dylan, OTOH,
doesn't seem to be all that active a project, at least the last time I
checked.


The current version of Gwydion Dylan http://www.gwydiondylan.org/
is a whole 2 months old, with a fairly good selection of supported
platforms.

Donn Cave, do**@u.washington.edu
Jul 18 '05 #186
Andrew Dalke wrote:
str(obj, [base=10]) -> converts integer objects (only!) to the given base,
defaults to base 10.


Well, str could be defined as follows:

def str(obj, **kwargs):
    return obj.__str__(**kwargs)

That doesn't feel too wrong to me.

yours,
Gerrit.

--
39. He may, however, assign a field, garden, or house which he has
bought, and holds as property, to his wife or daughter or give it for
debt.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger Syndroom - een persoonlijke benadering:
http://people.nl.linux.org/~gerrit/
Kom in verzet tegen dit kabinet:
http://www.sp.nl/

Jul 18 '05 #187

"Andrew Dalke" <ad****@mindspring.com> wrote in message news:J4*****************@newsread2.news.pas.earthl ink.net...
| Georgy Pruss:
| > I guess, the same as for
| > hex(30+44j)
| > oct(3.1415926)
| ...
|
| Which means you want
| str(obj) -> result as usual; takes any object, returns a string, all
| numbers
| are represented in base 10
| str(obj, [base=10]) -> converts integer objects (only!) to the given base,
| defaults to base 10.
|
| That feels wrong to me. Base conversion is enough of a special
| case that it doesn't warrant being part of the standard str constructor.
| As an 'int.to_base' method or class method, perhaps, but not in str.
|
| Andrew
| da***@dalkescientific.com

To me, it's very wrong that you can read any radix numbers, but can't
print them. If str(int(s)) == s and int(str(n)) == n (with some limits), I don't
see why str(n,radix) can't be symmetrical to int(s,radix).
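[Editor's note: the inverse asked for here is easy to write as a standalone function; a sketch (str_base is a hypothetical name, not a stdlib call).]

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def str_base(n, base=10):
    # Inverse of int(s, base) for integers: peel off digits with divmod,
    # least significant first, then reverse.
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if n < 0:
        return "-" + str_base(-n, base)
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))

print(str_base(255, 16))          # ff
print(int(str_base(1234, 8), 8))  # 1234 -- the round trip holds
```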

BTW there's no symmetry for str() and list()/tuple()/dict() etc.

--
Georgy Pruss
E^mail: 'ZDAwMTEyMHQwMzMwQGhvdG1haWwuY29t\n'.decode('base64')
Jul 18 '05 #188
Georgy Pruss:
To me, it's very wrong that you can read any radix numbers, but can't
print them. If str(int(s)) == s and int(str(n)) == n (with some limits), I don't see why str(n,radix) can't be symmetrical to int(s,radix).
But your objection would also be handled with a special-case function
which only took an int/long and a base and returned the string
representation for that base, no? This could be a method or a class
method of int, or a new function in math. What makes it special enough
to warrant being part of the string constructor?
BTW there's no symmetry for str() and list()/tuple()/dict() etc.


There is a symmetry for str(float(s)) == s and str(complex(s)) == s.
Why shouldn't those take a base?

Well, almost symmetry. str(float("1.1")) != "1.1"

Andrew
da***@dalkescientific.com
Jul 18 '05 #189

"Andrew Dalke" <ad****@mindspring.com> wrote in message news:eA*****************@newsread2.news.pas.earthl ink.net...
| Georgy Pruss:
| > To me, it's very wrong that you can read any radix numbers, but can't
| > print them. If str(int(s)) == s and int(str(n)) == n (with some limits), I
| don't
| > see why str(n,radix) can't be symmetrical to int(s,radix).
|
| But your objection would also be handled with a special-case function
| which only took an int/long and a base and returned the string
| representation for that base, no? This could be a method or a class
| method of int, or a new function in math.

I have such function already. :)

| What makes it special enough
| to warrant being part of the string constructor?

Just wanted to get rid of some useless builtin functions and make the
language prettier.

I wouldn't mind if Python had some analog of scanf() as an opposite
operator to '%'.

BTW are there Python equivalents to C's strspn, strcspn, strpbrk,
which return a leading sub-string, entirely [not] consisting of characters
of some char.set; and something like strtoul which parses the string and
returns the number and the position where the scan ended?

| > BTW there's no symmetry for str() and list()/tuple()/dict() etc.
|
| There is a symmetry for str(float(s)) == s and str(complex(s)) == s.
| Why shouldn't those take a base?
|
| Well, almost symmetry. str(float("1.1")) != "1.1"

Fortunately, str() here is a bit wiser than repr():

>>> str(float("1.1")) == "1.1"
True

Regarding the complex numbers, I guess the inconsistency with them
is to some extent the consequence of the fact that Python has no
complex literals (contrary to e.g. J) and the str(cmplx) returns some
expression in parenthesis instead of one complex number. So this
case is the same as with lists and other compound objects.

Georgy

| Andrew
| da***@dalkescientific.com
|
Jul 18 '05 #190
Georgy Pruss:
Just wanted to get rid of some useless builtin functions and make the
language prettier.
But I don't think that functions with special cases (and the special
case of 'takes a base but only if the first arg is an integer) is pretty.
As I've mentioned, I've not problems if hex/oct are moved to some
module in some future version of Python.
BTW are there Python equivalents to C's strspn, strcspn, strpbrk,
which return a leading sub-string, entirely [not] consisting of characters
of some char.set; and something like strtoul which parses the string and
returns the number and the position where the scan ended?
Not directly. In Python those are most often done with regexps.
Eg,

import re
def strspn(s, t):
    # kinda slow way to construct the pattern, but it does correctly
    # handle the '-' and ']' cases.  Usually one would write the regexp
    # directly and not try to get close to the C API.
    pat = re.compile("(" + "|".join(map(re.escape, t)) + ")*")
    m = pat.match(s)
    if not m:
        return 0
    return m.end()
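[Editor's note: the strtoul half of the question admits a similar regexp sketch — return both the parsed value and the position where scanning stopped. Decimal only for brevity; strtol_like is an illustrative name, not a stdlib call.]

```python
import re

_INT_PAT = re.compile(r"[+-]?\d+")

def strtol_like(s, pos=0):
    # Parse an optionally signed decimal integer starting at pos;
    # return (value, end_pos), or (None, pos) if no digits are found.
    m = _INT_PAT.match(s, pos)
    if m is None:
        return None, pos
    return int(m.group()), m.end()

print(strtol_like("42abc"))  # (42, 2)
print(strtol_like("abc"))    # (None, 0)
```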

Fortunately, str() here is a bit wiser than repr().
str(float("1.1")) == "1.1"
True

Wiser? Or more approximate? IEEE 754 math cannot explicitly
store the value "1.1".

>>> str(1.1000000000000001)
'1.1'

Regarding the complex numbers, I guess the inconsistency with them
is to some extent the consequence of the fact that Python has no
complex literals (contrary to e.g. J) and the str(cmplx) returns some
expression in parenthesis instead of one complex number. So this
case is the same as with lists and other compound objects.


In some sense you are correct. Python doesn't have a complex
literal -- it only has an imaginary literal, so '1+2j' is implemented as

0 LOAD_CONST 1 (1)
3 LOAD_CONST 2 (2j)
6 BINARY_ADD

So I'll change my statement and point out that

str(complex("3j")) == "3j"

(note the lack of parens) so the same symmetry you argue for
str/int should work for str/complex (though complex assumes
floats for the real and imaginary values, so it's more akin to the
str/float relationship.)

Andrew
da***@dalkescientific.com
Jul 18 '05 #191

"Andrew Dalke" <ad****@mindspring.com> wrote in message news:Hd*****************@newsread2.news.pas.earthl ink.net...
| Georgy Pruss:
| > BTW are there Python equivalents to C's strspn, strcspn, strpbrk,
| > which return a leading sub-string, entirely [not] consisting of characters
| > of some char.set; and something like strtoul which parses the string and
| > returns the number and the position where the scan ended?
|
| Not directly. In Python those are most often done with regexps.
| Eg,
|
| import re
| def strspn(s, t):
| # kinda slow way to construct the pattern, but it does correctly
| # handle the '-' and ']' cases. Usually one would write the regexp
| # directly and not try to get close to the C API.
| pat = re.compile("(" + "|".join(map(re.escape, t)) + ")*")
| m = pat.match(s)
| if not m:
| return 0
| return m.end()

Yes, thanks. Especially with regexes it's just a matter of a minute or two.
I miss those C functions though.

G-:

| <...>
| Andrew
| da***@dalkescientific.com
Jul 18 '05 #192
"Andrew Dalke" <ad****@mindspring.com> writes:
The 'against' side says, as you do, that having list.sort required
to be stable now places an extra barrier to anyone else who
implements a list-alike sort, because that sort should now also
be stable.
It's also a barrier to anyone who implements list sort, not just
listalike sort. Of course that doesn't affect CPython or Jython
users, who already have a built-in stable list sort. But what about
PyPy or maybe someone trying to reimplement Python from scratch in C?
What mollifies the latter view is that some user-defined sorts
won't have ties, so there will never be a problem.


I'm not a sorting whiz but my feeling is that if an application needs
sort to be stable, that's already a sign that something isn't really
right about it. Sorting is supposed to mean turning an unordered
permutation into an ordered one. So if list.sort gives you a useable
answer (i.e. one that satisfies your application's needs), then
randomizing the list and then sorting it should also give you a
useable answer. The way to do that is make your comparison function
look at every part of the record that you care about, not just select
some single field and count on stability to take care of the rest. If
at all possible, design the comparison function so that there are
never any ties. To do otherwise has in my experience been a source of
bugs that don't become evident til much later.
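Paul's advice translates naturally to key functions in modern Python (a sketch; the key tuple covers every field, so no tie-breaking is left to the sort's stability):

```python
records = [("beta", 2), ("alpha", 1), ("beta", 3), ("gamma", 2)]
# Sort on (number, name): the key tuple compares every field we care
# about, so the result is fully determined regardless of stability.
records.sort(key=lambda r: (r[1], r[0]))
print(records)   # [('alpha', 1), ('beta', 2), ('gamma', 2), ('beta', 3)]
```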
Jul 18 '05 #193
"Andrew Dalke" <ad****@mindspring.com> writes:
Do you want unix-style unique where repeats are merged into
one, so that 1 2 2 1 -> 1 2 1 or do you want it to return


That list is not already sorted and so uniqing it should throw an
exception. Uniq should only have to work on sorted lists.
Jul 18 '05 #194
Paul Rubin:
It's also a barrier to anyone who implements list sort, not just
listalike sort. Of course that doesn't affect CPython or Jython
users, who already have a built-in stable list sort. But what about
PyPy or maybe someone trying to reimplement Python from scratch in C?
My recent statement on this hasn't changed -- this will affect
very few people, and the code to do a simple (perhaps not optimally
efficient for Python) sort is available in many books and libraries, so
should have very little influence on this decision.
I'm not a sorting whiz but my feeling is that if an application needs
sort to be stable, that's already a sign that something isn't really
right about it.
The standard example for stable sorts is in a spreadsheet-like
grid display where the ordering of the rows can be changed based
on the values in a given column (eg, by clicking on the column
header). Rows with column entries with the same value should
keep the same relative order, because that's what people expect.

Why? Because that lets the user define the sort order -- first
sort on tertiary key, then sort on secondary key, then sort on
primary key.
The way to do that is make your comparison function
look at every part of the record that you care about, not just select
some single field and count on stability to take care of the rest.
There are two answers to that. All non-stable sorts can be
turned into a stable sort by tacking on a final comparison based
on the original position. My spreadsheet example doesn't require
having a stable sort because I create my own. But then there's
the inelegance of creating a new list to store the original position,
and the extra dispatch needed to check that new field.
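The decorate-with-position trick can be sketched like this (`stable_sorted` is a hypothetical name, not a standard function):

```python
def stable_sorted(seq, key):
    # Decorate with the original index: ties on key(x) are broken by
    # the position i, so element values are never compared directly
    # and any underlying sort behaves as if it were stable.
    return [x for _, _, x in sorted((key(x), i, x) for i, x in enumerate(seq))]

print(stable_sorted([("b", 2), ("a", 2), ("c", 1)], key=lambda r: r[1]))
# [('c', 1), ('b', 2), ('a', 2)] -- ties keep their original order
```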

And if there isn't a stable sort, nor one induced by creating
an extra list, then what is the interface for creating the correct
comparison function? It could keep track of all the column
clicks, to know which ones are used as primary, secondary,
etc. sorts keys... but at some point it ends up with a tie
which cannot be resolved (in which case there's no problem)
or resolved by assuming the datatype of a given column.

For example, suppose I click on the 2nd column in

alpha 1
beta 2
beta 3
gamma 2
delta 2

There are ties with the three values of 2. The first column
can break the tie, but only by assuming that the first column
can be sorted alphabetically, to create

alpha 1
beta 2
delta 2
gamma 2
beta 3

I argue that the correct order should preserve the original
sort order, to create

alpha 1
beta 2
gamma 2
delta 2
beta 3

because the user's order is actually the alphabetical order of the
original greek alphabet, implicit in the original order of the data
file. (The user could click on the column header to resort for
that field, which would likely be done alphabetically, but at that
point that's the same as telling the computer to treat those fields
as english words, so sorting alphabetically is fine.)
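That behavior falls out directly from a stable sort; a sketch of the spreadsheet example in modern Python:

```python
rows = [("alpha", 1), ("beta", 2), ("beta", 3), ("gamma", 2), ("delta", 2)]
# Clicking the second column: a stable sort keeps rows that tie on
# that column in their original relative order.
rows.sort(key=lambda r: r[1])
for name, num in rows:
    print(name, num)
# alpha 1, beta 2, gamma 2, delta 2, beta 3
```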
If at all possible, design the comparison function so that there are
never any ties. To do otherwise has in my experience been a source of
bugs that don't become evident til much later.


I agree with that. But what to do when there are ties? I believe
people expect a stable sort (even without knowing what 'stable'
means) so it's helpful that Python's native sort be stable.

This is obviously a hard decision to make -- Python's sort has
been around for 10 years before this new requirement -- but I
don't think it's a poor one, and I've only seen theoretical
arguments to the contrary.

Andrew
da***@dalkescientific.com
Jul 18 '05 #195
Paul Rubin:
Uniq should only have to work on sorted lists.


*shrug* Okay, that's different than the unix 'uniq'
command. I've needed the unix one more than I've
needed yours. And I've used what can now be
written as

unique_keys = dict.fromkeys(lst).keys()

much more often. Well, it requires that the objects
be hashable and define == while yours requires
that they define == and < (or define < and use it
twice). Note that your uniq wouldn't work on
complex numbers because those don't define <.

Tack on a 'unique_keys.sort()' and those two lines
give pretty much what you want, except the check
that the original list is sorted.
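Both flavors in modern Python (a sketch; dict preserves insertion order from Python 3.7 on, and itertools.groupby gives the Unix-style merge of adjacent repeats):

```python
from itertools import groupby

lst = [3, 1, 2, 2, 1]

# Hashable-item unique, order-preserving (the method is spelled
# fromkeys, without an underscore):
print(list(dict.fromkeys(lst)))                # [3, 1, 2]

# Unix-style uniq: only adjacent duplicates are merged
print([k for k, _ in groupby([1, 2, 2, 1])])   # [1, 2, 1]
```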

Andrew
da***@dalkescientific.com
Jul 18 '05 #196
In article <mx*****************@newsread2.news.pas.earthlink.net>,
Andrew Dalke <ad****@mindspring.com> wrote:
Paul Rubin:

Uniq should only have to work on sorted lists.


*shrug* Okay, that's different than the unix 'uniq' command. I've
needed the unix one more than I've needed yours.


Huh?!?! uniq has always to my knowledge only worked on sorted input.
Reading the man page on two different systems confirms my knowledge.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote
programs, then the first woodpecker that came along would destroy civilization.
Jul 18 '05 #197
"Andrew Dalke" <ad****@mindspring.com> writes:
needed yours. And I've used what can now be
written as

unique_keys = dict.fromkeys(lst).keys()


This fromkeys operation must be something new, and consing up a
dictionary like that is a woeful amount of overhead. But I guess
it would do the job.
Jul 18 '05 #198
On 19 Nov 2003 19:45:59 -0500,
Aahz <aa**@pythoncraft.com> wrote:
Huh?!?! uniq has always to my knowledge only worked on sorted input.
Reading the man page on two different systems confirms my knowledge.


Yes, but uniq doesn't report an error if its input isn't sorted.

--amk
Jul 18 '05 #199
"A.M. Kuchling" <am*@amk.ca> writes:
Huh?!?! uniq has always to my knowledge only worked on sorted input.
Reading the man page on two different systems confirms my knowledge.


Yes, but uniq doesn't report an error if its input isn't sorted.


Maybe it should. If its behavior on unsorted input isn't specified,
you shouldn't assume it will act in any particular way.
Jul 18 '05 #200
