473,782 Members | 2,487 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

reduce() anomaly?

This seems like it ought to work, according to the
description of reduce(), but it doesn't. Is this
a bug, or am I missing something?

Python 2.3.2 (#1, Oct 20 2003, 01:04:35)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright" , "credits" or "license" for more information.
d1 = {'a':1}
d2 = {'b':2}
d3 = {'c':3}
l = [d1, d2, d3]
d4 = reduce(lambda x, y: x.update(y), l) Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update' d4 = reduce(lambda x, y: x.update(y), l, {})

Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'update'

- Steve.
Jul 18 '05
226 12699
In article <KL************ *****@newsread2 .news.pas.earth link.net>,
"Andrew Dalke" <ad****@mindspr ing.com> wrote:
That would be max(seq, key=len) in my proposal.


That's a nice option for max (and min, and ??), but ISTM that it would
also be nice to have a factory for efficient iterators of this kind.
It would probably be pretty efficient then to write

maxlen, maxitem = max(funumerate( len,seq))


With generator expressions this is

maxlen, maxitem = max( ((len(x), x) for x in seq) )

It still has the (slight) problem that it assumes x is comparable
when there are multiple items with the same length. Nearly
all of these quicky versions make that assumption. The
quicky will-work-for-any-item version would look more like

maxlen, pos, maxitem = max( ((len(x), i, x) for i, x in enumerate(seq)) )


The new sort(key=...) option works even when the underlying objects are
incomparable. I'd expect the same of max(key=...)

So (supposing such a change ever happens) you could instead write
maxitem = max(seq, key=len)
maxlen = len(maxitem)

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #141

"Paul Rubin" <http://ph****@NOSPAM.i nvalid> wrote in message
news:7x******** ****@ruckus.bro uhaha.com...
"Andrew Dalke" <ad****@mindspr ing.com> writes:
list.sort also became guaranteed to be stable (this include
'reverse').
I don't see the reason for that. It seems like a needless restriction on implementers.


Which is why Guido long resisted making such a guarantee, in spite of
many requests. (The previous list.sort was not stable.) However,
timsort is stable, tested for about a year, and so fast for lists both
random and with various types of order, and so close to optimal in
terms of number of comparisons, that Guido was willing to 'annoint' it
as list.sort for the indefinite future. If there were a generally
useful non-stable sort discovered proposed in the future, is could be
added under a different name.

Terry J. Reedy
Jul 18 '05 #142
"Terry Reedy" <tj*****@udel.e du> writes:
Which is why Guido long resisted making such a guarantee, in spite of
many requests. (The previous list.sort was not stable.) However,
timsort is stable, tested for about a year, and so fast for lists both
random and with various types of order, and so close to optimal in
terms of number of comparisons, that Guido was willing to 'annoint' it
as list.sort for the indefinite future. If there were a generally
useful non-stable sort discovered proposed in the future, is could be
added under a different name.


I think that decision is a little too CPython-centric. Some other
Python implementation (maybe even Jython) might like to implement the
list.sort method by calling some other sort routine (like the one
already in the library for that language) rather than by porting
timsort, even if timsort is better. Also, someone might want to
implement an indexable class that isn't a list (maybe it's a disk file
or something) and want to give it a sort method that cannot work by
sucking the keys into memory and calling timsort (maybe there are too
many keys to fit in memory, so external methods must be used). It may
turn out to work best to use some nonstable sorting method, but then
the behavior will be inconsistent with list.sort and that makes more
cruft that the application programmer has to remember.

Most sorting applications don't care about stability. I think if a
new sorting method is going to get added so there's separate methods
for guaranteed-stable and possibly-nonstable sorting, it's best to let
the new method be the stable one (maybe list.ssort) and leave the
existing one alone.

Of course then you want variants that return self instead of None, so
you have list.sort, list.ssort, list.nsort, and list.nssort. It gets
out of hand.

Maybe the best way to do this kind of thing is with keyword arguments
rather than new methods.
Jul 18 '05 #143
Paul Rubin:
"Andrew Dalke" <ad****@mindspr ing.com> writes: [actually, I was quoting from the python-dev summary of Brett C's.

Paul: Changing the behavior of list.sort would be really bad, but I
certainly favor adding a new method (maybe list.nsort). There should
also be list.nuniq which takes a sorted list and strips duplicate
elements.


Do you want unix-style unique where repeats are merged into
one, so that 1 2 2 1 -> 1 2 1 or do you want it to return
items which only occur once, and you don't care about the
order (as in dict.from_keys([1, 2, 2, 1]).keys() which can
return [1, 2] or [2, 1]) or do you want the order of the keys
in the final list to maintain the same order as the initial list,
so that [2, 1, 1, 2] -> [2, 1] always?

Andrew
da***@dalkescie ntific.com
Jul 18 '05 #144
Paul Rubin:
I think that decision is a little too CPython-centric. Some other
Python implementation (maybe even Jython) might like to implement the
list.sort method by calling some other sort routine (like the one
already in the library for that language)
That's not a problem with Jython, since Java has a stable sort, see
http://java.sun.com/j2se/1.4.2/docs/...html#sort(java
..util.List)

There was a OCaml implemenation of a Python variant, called
Vyper. OCaml has a stable sort
http://caml.inria.fr/devtools/doc_oc...sic/Array.html

C++'s STL include a stable_sort
http://www.codeguru.com/cpp/stlguide/stable_sort.shtml

C# doesn't appear to have a stable sort.

WWPyPyD? (What Would PyPy do?) I don't know.
Also, someone might want to
implement an indexable class that isn't a list (maybe it's a disk file
or something) and want to give it a sort method that cannot work by
sucking the keys into memory and calling timsort (maybe there are too
many keys to fit in memory, so external methods must be used).
There are a couple of misconceptions here:

- sort is a method on lists. It only works on lists. Unlike STL,
which distinguishes between a container and an algorithm, there
is no way to apply sort directly to anything which isn't a list. Your
case (using sort on "a disk file or something") cannot occur.

- All the keys must fit in memory. This is a consequence of being
a list. (The keys may depend on other resource to do the
comparison.) C++ is different in this regard as well.

- The sort only rearranges references so there's no way to use
sort to directly sort the values on disk as you could in C++.
It may
turn out to work best to use some nonstable sorting method, but then
the behavior will be inconsistent with list.sort and that makes more
cruft that the application programmer has to remember.
It may have, but now with the restriction on what the sort must do
there's been a redefinition of what 'best' means for Python. No
implementation of Python is now allowed to do so, so the application
programmer won't have to remember it. All it does it make it
slightly harder for a C# programmer to write a Python implementation.
And I assume there's plenty of 3rd party libraries which can help out.
Most sorting applications don't care about stability.
Really? I prefer my sorts to be stable since it best agrees
with what a person would do, assuming infinite patience.
It's nice because it's well defined and platform independent --
which is important because I've seen bugs crop up in some
programs which assumed a sort (3C) under IRIX will give
the same order as under Solaris -- C's sort is not stable.
I think if a
new sorting method is going to get added so there's separate methods
for guaranteed-stable and possibly-nonstable sorting, it's best to let
the new method be the stable one (maybe list.ssort) and leave the
existing one alone.


Humbug. You want an extra method (and possibly a couple of
independent algorithms which can be used on your disk-based
but list-like data structures) which for the most widely used version
of Python (CPython) will not do anything different and for which
the second most widely used version (Jython) is a trivial change,
all so that *someday* someone who implements a Python in a
language which doesn't have a fast stable search doesn't have
to write one (copying out of any standard text book), find a
3rd party version, or translate one?

Why not worry about something more complex, like regular
expressions. CPython and Jython almost certainly have
differences in edge cases, and what will the version of Python
written in Prolog use?

Andrew
da***@dalkescie ntific.com
Jul 18 '05 #145
In article <61************ *****@newsread2 .news.pas.earth link.net>,
Andrew Dalke wrote:
[snip]
- sort is a method on lists. It only works on lists. Unlike STL,
which distinguishes between a container and an algorithm, there
is no way to apply sort directly to anything which isn't a list. Your
case (using sort on "a disk file or something") cannot occur.


Surely he was talking about implementing "list-alikes"...? I.e.
sequences polymorphically equivalent to lists, or at least wrt.
sequence-ness and sorting...? The stability requirement does make this
sort of thing a tad more difficult. (Not that I find it very
problematic; I'm just trying to interpret the meaning of the post.)

--
Magnus Lie Hetland "In this house we obey the laws of
http://hetland.org thermodynamics! " Homer Simpson
Jul 18 '05 #146
All in all, I agree with you. You have pointed out precisely
what has been making Python more attractive than, say, Perl.

In <lc************ @gaffa.mit.edu> , you, Douglas Alan, wrote:
The argument that some programmers might be too lazy to want to learn
powerful, simple, and elegant features that can be taught in seconds,
is no good reason to remove such features from Python and bloat Python
by replacing them with a plethora of less powerful, less elegant
features.
In <lc************ @gaffa.mit.edu> , you also wrote: I like Python because it is a very simple, generic, easy-to-learn
programming language, where I didn't have to learn much of anything
new. It was just like all the other myriad of programming
languages I have programmed in, only less cluttered than most, and
"light-weight" in its implementation and accessibility.
So do I.

As to sum(), when learning string addition (concatenation) ,
one may wonder why sum() does not handle it:
sum(['a', 'b', 'c']) Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'

while general reduce() does it just as _expected_:
reduce(str.__ad d__, ['a', 'b', 'c'])

'abc'

It may be sum() that is more difficult to learn...

For this particular problem, it is better to use
''.join(['a', 'b', 'c']), you know.
However, it is important for Python to have an easy and generic
way to do things. If we have to read the manual through to solve
anything, what is the point to use Python instead of Perl (or Ruby,
to some extent)?
I having nothing against learning new stuff, by the way, but when I
want to learn new stuff, I'm not particularly interested in the
idiosyncrasies of Python -- I'd spend my effort learning something a
bit more exotic, like OCaml, or Haskell, that might stretch my brain a
bit. Learning a bunch of idiosyncratic detail is just tedious. This
is one of the most important reasons that Perl sucks so badly.
Again in <lc************ @gaffa.mit.edu> And as I have pointed out, it goes against the principle of simplicity
and expressiveness to remove an easy to use and easy to learn simple
and powerful feature with a slew of specific, tailored features. If
reduce() can be relegated to a library or for the user to implement
for himself, then so can sum(). If the language is to only have one,
it should be reduce().


I agree with you.

However, It may be better to give reduce() some nice notation.
Someone (sorry, but I do not remember) suggested such one:
[s: s+c for c in ['a', 'b', 'c']]
or
[s='': s+c for c in ['a', 'b', 'c']]
though it may not be the nicest.

-- SUZUKI Hisao

Jul 18 '05 #147
Magnus Lie Hetland:
Surely he was talking about implementing "list-alikes"...?


Yes, I think you're right about that, and I misinterpreted
his statement.

That being the case, an alternative is to have that 'sort'
implements the Python required semantics, and that
'fsort' or somesuch ('f' for 'fast'?) implement the
appropriate data-structure specific one.

Then again, is the change that "all Python list-alikes must
implement stable sort" or that "Python native lists shall
now and forever implement stable sort"?

The official statement from the BDFL is
http://mail.python.org/pipermail/pyt...er/038773.html
] OK, I pronounce on this: Python's list.sort() shall be stable.

That's a statement only that list.sort shall be stable and
not that all .sort() methods must be stable.

BTW, there was a *HUGH* amount of discussion about
sort and stability and keys and DSU on the python-dev list.
See the archive
http://mail.python.org/pipermail/pyt...ead.html#38772
and look for "decorate-sort-undecorate" as well as subjects
with the word 'sort' in them.

Andrew
da***@dalkescie ntific.com
Jul 18 '05 #148
SUZUKI Hisao <su*******@oki. com> writes:
However, It may be better to give reduce() some nice notation.


I think the idea is to hide the thing somewhere far away from
builtins, not to *increase* its prominence in the language by
introducing new syntax.

--
Ville Vainio http://www.students.tut.fi/~vainio24
Jul 18 '05 #149
SUZUKI Hisao wrote:
...
As to sum(), when learning string addition (concatenation) ,
one may wonder why sum() does not handle it:
I originally had sum try to detect the "summing strings" case and
delegate under the covers to ''.join -- Guido vetoed that as too
"clever" (which has a BAD connotation in Python) and had me forbid
the "summing strings" case instead, for simplicity.
>>> sum(['a', 'b', 'c']) Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'


whereupon, one hopes, the user checks sum's doc:
print sum.__doc__ sum(sequence, start=0) -> value

Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start'. When the sequence is empty, returns start.
and if one tries anyway:
sum(['a','b','c'],'')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]

the error message should direct the user to the proper way to sum
strings. Having more than one "obvious way to do it" was avoided.

while general reduce() does it just as _expected_:
>>> reduce(str.__ad d__, ['a', 'b', 'c'])
'abc'


Well, if what you "expect" is the following performance...:

[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(999))'
'reduce(str.__a dd__, x)'
1000 loops, best of 3: 1.82e+03 usec per loop
[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(999))' "''.join(x) "
10000 loops, best of 3: 68 usec per loop

i.e., a slowdown by about 2700% for a 999-items sequence,

[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(1999))'
'reduce(str.__a dd__, x)'
100 loops, best of 3: 5e+03 usec per loop
[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(1999))' "''.join(x) "
10000 loops, best of 3: 143 usec per loop

growing to 3500% for a 1999-items sequence, and so on without bounds,
then no doubt ``reduce does it just as expected'' by YOU.

Most people, however, EXPECT sensible performance, not slow-downs by
factors of tens or hundreds of times, when they use constructs that are
considered "normal and supported" in the language and its built-ins.

This makes reduce a terrible performance trap just waiting to catch the
unwary. It SEEMS to work all right, but in fact it's doing nothing of
the kind, nor can it -- it's defined to iteratively run N repetitions
of whatever function you pass as the first argument, therefore it can
never have O(N) performance when used to add up a sequence of strings,
but always, necessarily O(N squared). It's _conceivable_ (although it
currently appears unlikely that Guido will ever countenance it) that
'sum' can be specialcased (to use faster approaches to summation when it
is dealing with sequences, not numbers) to give the O(N) performance
most people (sensibly) DO expect; that just depends on loosening its
current specs, while maintaining the concept of "sum of a sequence".
No such avenue is open for 'reduce': it will always be a terrible
performance trap just waiting to pounce on the unwary.

It may be sum() that is more difficult to learn...
I have enough experience teaching both built-in functions, by now,
that I can rule this hypothesis out entirely.

For this particular problem, it is better to use
''.join(['a', 'b', 'c']), you know.
Yes, and sum's error messages tells you so, so, if you DON'T know,
you learn immediately.
However, it is important for Python to have an easy and generic
way to do things. If we have to read the manual through to solve
anything, what is the point to use Python instead of Perl (or Ruby,
to some extent)?
Why do you need to read the manual after getting an error message
that tells you to """ use ''.join(seq) instead """? As for the
importance of "easy and generic", I would agree -- I'd FAR rather
have 'sum' be able to handle _well_ sums of any kinds -- but Guido
doesn't, so far. If you have arguments that might convince him, make
then. But 'reduce' just isn't a candidate, as the "easy" part simply
cannot apply.

However, It may be better to give reduce() some nice notation.


"Syntax sugar causes cancer of the semicolon". The nicer the
notation, the more superficially attractive you make it to use
a construct with unacceptable performance implications, the
craziest and most terrible the performance trap you're building
for the poor unwary users. I think it's an appalling direction
to want to move the language in.
Alex

Jul 18 '05 #150

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

181
8914
by: Tom Anderson | last post by:
Comrades, During our current discussion of the fate of functional constructs in python, someone brought up Guido's bull on the matter: http://www.artima.com/weblogs/viewpost.jsp?thread=98196 He says he's going to dispose of map, filter, reduce and lambda. He's going to give us product, any and all, though, which is nice of him.
16
2187
by: clintonG | last post by:
At design-time the application just decides to go boom claiming it can't find a dll. This occurs sporadically. Doing a simple edit in the HTML for example and then viewing the application has caused the application to go boom. Sometimes the page will compile and run using F5 and others not. So I do the web search tango looking around the blogs and the tuts and determine I should go into Temporary ASP.NET Files and delete the directory...
1
2805
by: mai | last post by:
Hi everyone, i'm trying to exhibit FIFO anomaly(page replacement algorithm),, I searched over 2000 random strings but i couldnt find any anomaly,, am i I doing it right?,, Please help,,,The following is the code,, #include <iostream> #include <string> #include <stdio.h> #include <stdlib.h> #include <ctime // For time()
7
3505
by: cnb | last post by:
This must be because of implementation right? Shouldn't reduce be faster since it iterates once over the list? doesnt sum first construct the list then sum it? ----------------------- reduce with named function: 37.9864357062 reduce with nested, named function: 39.4710288598 reduce with lambda: 39.2463927678 sum comprehension: 25.9530121845
0
9479
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10311
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9942
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8967
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7492
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6733
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5378
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5509
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4043
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.