Pre-PEP: Dictionary accumulator methods

Raymond Hettinger

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

The rationale is to replace the awkward and slow existing idioms for dictionary
based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

d.count(key)
d.appendlist(key, value)

In their multi-value forms, they would now be coded as:

d.count(key, qty)
d.appendlist(key, *values)

The error messages returned by the new methods are the same as those returned by
the existing idioms.

The get() method would continue to exist because it is useful for applications
other than accumulation.

The setdefault() method would continue to exist but would likely not make it
into Py3.0.
PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault() 's method name is not self-evident.

The performance issues with the existing constructs are:

* They translate into many opcodes which slows them considerably.
* The get() idiom requires two dictionary lookups of the same key.
* The setdefault() idiom instantiates a new, empty list prior to every call.
* That new list is often not needed and is immediately discarded.
* The setdefault() idiom requires an attribute lookup for extend/append.
* The setdefault() idiom makes two function calls.

The latter issues are evident from a disassembly:

dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec')) 1 0 LOAD_NAME 0 (d)
3 LOAD_ATTR 1 (get)
6 LOAD_NAME 2 (key)
9 LOAD_CONST 0 (0)
12 CALL_FUNCTION 2
15 LOAD_NAME 3 (qty)
18 BINARY_ADD
19 LOAD_NAME 0 (d)
22 LOAD_NAME 2 (key)
25 STORE_SUBSCR
26 LOAD_CONST 1 (None)
29 RETURN_VALUE
dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))

1 0 LOAD_NAME 0 (d)
3 LOAD_ATTR 1 (setdefault)
6 LOAD_NAME 2 (key)
9 BUILD_LIST 0
12 CALL_FUNCTION 2
15 LOAD_ATTR 3 (extend)
18 LOAD_NAME 4 (values)
21 CALL_FUNCTION 1
24 POP_TOP
25 LOAD_CONST 0 (None)
28 RETURN_VALUE

In contrast, the proposed methods use only a single attribute lookup and
function call, they use only one dictionary lookup, they use very few opcodes,
and they directly access the accumulation functions, PyNumber_Add() or
PyList_Append(). IOW, the performance improvement matches the readability
improvement.
ISSUES
------

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

The appendlist() method is not as versatile as setdefault() which can be used
with other object types (perhaps for creating dictionaries of dictionaries).
However, most uses I've seen are with lists. For other uses, plain Python code
suffices in terms of speed, clarity, and avoiding unnecessary instantiation of
empty containers:

if key not in d:
d.key = {subkey:value}
else:
d[key][subkey] = value

Raymond Hettinger

Jul 18 '05

Subscribe Post Reply

125

7055

Colin J. Williams

Paul Rubin wrote:

"El Pitonero" <pi******@gmail.com> writes:
What about no name at all for the scalar case:

a['hello'] += 1
a['bye'] -= 2

I like this despite the minor surprise that it works even when
a['hello'] is uninitialized.

+1
and if the value is a list:
a['hello']= [1, 2, 3]
a['hello']+= [4] # adding the brackets is a lot simpler than
typing append or extend.

Any user is free to add his/her own subclass to handle defaults.

Colin W.

Jul 18 '05 #101

Roose

> Another option with no storage overhead which goes part way to reducing

the awkwardness would be to provide a decorator class accessible through
dict. The decorator class would take a value or function to be used as
the default, but apart from __getitem__ would simply forward all other
methods through to the underlying dictionary.

I'm not sure I like the decorator -- I would never use that flexibility to
have more than one default. I can't come up with any reason to ever use
that.

I think it works best as a simple subclass:

class DefaultDict(dict):

def __init__(self, default, *args, **kwargs):
dict.__init__(self, *args, **kwargs)
self.default = default

def __getitem__(self, key):
return self.setdefault(key, copy.copy(self.default))

d = DefaultDict(0)
for x in [1, 3, 1, 2, 2, 3, 3, 3, 3]:
d[x] += 1
pprint(d)

d = DefaultDict([])
for i, x in enumerate([1, 3, 1, 2, 2, 3, 3, 3, 3]):
d[x].append(i)
pprint(d)

Output:

{1: 2, 2: 2, 3: 5}
{1: [0, 2], 2: [3, 4], 3: [1, 5, 6, 7, 8]}

Jul 18 '05 #102

Jeremy Bowers

On Sat, 19 Mar 2005 20:07:40 -0800, Kay Schluehr wrote:

It is bad OO design, George. I want to be a bit more become more
specific on this and provide an example:

Having thought about this for a bit, I agree it is a powerful
counter-argument and in many other languages I'd consider this a total win.

But this is Python, and frankly, I've overridden dict more than once and
violated the Liskov substitution principle without thinking twice. (Once,
oh yes, but not twice.) Of course, all the code was under my control then.

I think the tipping point for me depends on how the interfaces in Python
are going to be implemented, which I haven't dug into. If the dict class
gets an interface definition, can I subclass from dict and "cancel" (or
some other term) the interface I inherited?

If so, then this might still be OK, although if interfaces aren't going to
confuse newbies enough, wait 'till we try to explain that their code is
blowing up because they forgot to "cancel" their interface, and they
can't *really* pass their subclass in to something expecting a dict
interface. If you *can't* cancel or downgrade the interface, then I'd say
this argument is still good; dict should be kept minimal and this should
go in a subclass.

Jul 18 '05 #103

Matteo Dell'Amico

Kay Schluehr wrote:

I think that's because you have to instantiate a different object for
each different key. Otherwise, you would instantiate just one list as
a default value for *all* default values. Or the default value will be copied, which is not very hard either or
type(self._default)() will be called. This is all equivalent and it
does not matter ( except for performance reasons ) which way to go as
long only one is selected.

I don't like it very much... it seems too implicit to be pythonic. Also,
it won't work with non-copyable objects, and type(42)() = 0, and getting
0 when the default is 42 looks very strange. I prefer the explicit "give
me a callable" approach.
If the dict has a fixed semantics by applying defaultValue() and it
returns defaults instead of exceptions whenever a key is missing i.e.
behavioural invariance the client of the dict has nothing to worry
about, hasn't he?
For idioms like d[foo].append('blah') to work properly, you'd have to
set the default value every time you access a variable. It can be really
strange to fill up memory only by apparently accessing values.
I suspect the proposal really makes sense only if the dict-values are
of the same type. Filling it with strings, custom objects and other
stuff and receiving 0 or [] or '' if a key is missing would be a
surprise - at least for me. Instantiating dict the way I proposed
indicates type-guards! This is the reason why I want to delay this
issue and discuss it in a broader context. But I'm also undecided.
Guidos Python-3000 musings are in danger to become vaporware. "Now is
better then never"... Therefore +0.

Having duck-typing, we can have things that have common interface but no
common type. For instance, iterables. I can imagine a list of iterables
of different types, and a default value of maybe [] or set([]).

--
Ciao,
Matteo

Jul 18 '05 #104

Alexander Schmolck

Beni Cherniavsky <cb**@users.sf.net> writes:

The relatively recent "improvement" of the dict constructor signature
(``dict(foo=bar,...)``) obviously makes it impossible to just extend the
constructor to ``dict(default=...)`` (or anything else for that matter) which
would seem much less ad hoc. But why not use a classmethod (e.g.
``d=dict.withdefault(0)``) then?
You mean giving a dictionary a default value at creation time, right?

Yes. But creating a defaultdict type with aliased content to the original
dict would also be fine by me.
Such a dictionary could be used very easily, as in <gasp>Perl::

foreach $word ( @words ) {
$d{$word}++; # default of 0 assumed, simple code!
}

</gasp>. You would like to write::

d = dict.withdefault(0) # or something
for word in words:
d[word] += 1 # again, simple code!

I agree that it's a good idea but I'm not sure the default should be specified
at creation time. The problem with that is that if you pass such a dictionary
into an unsuspecting function, it will not behave like a normal dictionary.
Have you got a specific example in mind?

Code that needlessly relies on exceptions for "normal operation" is rather
perverse IMO and I find it hard to think of other examples.
Also, this will go awry if the default is a mutable object, like ``[]`` - you
must create a new one at every access (or introduce a rule that the object is
copied every time, which I dislike).
I think copying should on by default for objects that are mutable (and
explicitly selectable via ``.withdefault(bar,copy=False)``).

Python of course doesn't have an interface to query whether something is
mutable or not (maybe something that'll eventually be fixed), but hashable
might be a first approximation. If that's too dodgy, most commonly the value
will be a builtin type anyway, so copy by default with "efficient
implementation" (i.e. doing nothing) for ints, tuples etc. ought to work fine
in practice.
And there are cases where in different points in the code operating on the
same dictionary you need different default values.
The main problem here is that the obvious .setdefault is already taken to
misnome something else. Which I guess strengthens the point for some kind of
proxy object.
So perhaps specifying the default at every point of use by creating a proxy is
cleaner::

d = {}
for word in words:
d.withdefault(0)[word] += 1
Of course, you can always create the proxy once and still pass it into an
unsuspecting function when that is actually what you mean.
Yup (I'd presumably prefer that second option for the above code).

How should a dictionary with a default value behave (wheter inherently or a
proxy)?

- ``d.__getattr__(key)`` never raises KeyError for missing keys - instead it
returns the default value and stores the value as `d.setdefult()` does.
This is needed for make code like::

d.withdefault([])[key].append(foo)

to work - there is no call of `d.__setattr__()`, so `d.__getattr__()` must
have stored it.
I'm confused -- are you refering to __getitem__/__setitem__? Even then I don't
get what you mean: __getitem__ obviously works differently, but that would be
the whole point.

- `d.__setattr__()` and `d.__delattr__()` behave normally.
s/attr/item/ and agreed.

- Should ``key in d`` return True for all keys?
No. See below.
It is desiarable to have *some* way to know whether a key is really
present. But if it returns False for missing keys, code that checks ``key
in d`` will behave differently from normally equivallent code that uses
try..except. If we use the proxy interface, we can always check on the
original dictionary object, which removes the problem.

- ``d.has_key(key)`` must do whatever we decide ``key in d`` does.

- What should ``d.get(key, [default])`` and ``d.setdefault(key, default)``
do? There is a conflict between the default of `d` and the explicitly given
default. I think consistency is better and these should pretend that `key`
is always present. But either way, there is a subtle problem here.
..setdefault ought to trump defaultdict's default. I feel that code that
operated without raising an KeyError on normal dicts should also operate the
same way on defaultdicts were possible. I'd also suspect that if you're
effectively desiring to override .setdefault's default you're up to something
dodgy.
- Of course `iter(d)`, `d.items()` and the like should only see the keys
that are really present (the alternative inventing an infinite amount of
items out of the blue is clearly bogus).

If the idea that the default should be specified in every operation (creating
a proxy) is accepted, there is a simpler and more fool-proof solution: the
ptoxy will not support anything except `__getitem__()` and `__setitem__()` at
all. Use the original dictionary for everything else. This prevents subtle
ambiguities.
Yes, that sounds like a fine solution to me -- if something goes wrong one is
at least likely to get an error immediately.

However the name .withdefault is then possibly a bit misleading -- but
..proxywithdefault is maybe a bit too long...

BTW, this scheme could also be extended to other collection types (lists and
sets, e.g.). e.g. ``l = []; l.proxywithdefault(0)[2] = 1;l `` => ``[0,0,1]``.

Whilst I think such behavior is asking for trouble if it's enabled by default
(as in Perl and Ruby, IIRC) and also lacks flexibility (as you can't specify
the fill value), occasionally it would be quite handy and I see little harm in
providing it when it's explicitly asked for.

Or, for the first and most common case, just a bag type?

Too specialized IMHO. You want a dictionary with any default anyway. If you
have that, what will be the benefit of a bag type?

I more thought of the bag type as an alternative to having a dictionary with
default value (the counting case occurs most frequently and conceptually it is
well modelled by a bag).

And I don't feelt that a bag type is too specialized (plausibly too
specialized for a builtin -- but not for something provided by a module). Just
because there is natural tendency to shoehorn everything into the
bread-and-butter types of some language (dicts and lists in python's case),
doesn't mean one can't overdo it, because eventually one will end up with a
semantic mess.

Anyway my current preferences would be a proxy with default value and only
__getitem__ and __setitem__ methods -- as you suggested above, but possibly
also for other collection types than just dict.

'as

Jul 18 '05 #105

Francis Girard

Hi,

I really do not like it. So -1 for me. Your two methods are very specialized
whereas the dict type is very generic. Usually, when I see something like
this in code, I can smell it's a patch to overcome some shortcomings on a
previous design, thereby making the economy of re-designing. Simply said,
that's bad programming.

After that patch to provide a solution for only two of the more common
use-cases, you are nonetheless stucked with the old solution for all the
other use-cases (what if the value type is another dictionary or some
user-made class ?).

Here's an alternate solution which I think answers all of the problems you
mentionned while being generic.

=== BEGIN SNAP

def update_or_another_great_name(self, key, createFunc, updtFunc):
try:
self[key] <<<= updtFunc(self[key])
## This is "slow" with Python "=" since the key has to be searched
## twice But the new built-in method just has to update the value the
## first time the key is found. Therefore speed should be ok.
return True
except KeyError:
self[key] = createFunc()
return false

## Now your two specialized methods can be easily written as :

## A built-in should be provided for this (if not already proposed) :
def identical(val):
return val

def count(self, key, qty=1):
self.update_or_another_great_name(key, identical,
partial(operator.add, qty))
## partial is coming from : http://www.python.org/peps/pep-0309.html
## Using only built-in function (assuming "identical") as arguments makesit
## ok for speed (I guess).

def appendlist(self, key, *values):
self.update_or_another_great_name(key,
partial(list, values),
partial(ListType.extend, X = values))
## The first "partial" usage here is an abuse just to make sure that the
## list is not actually constructed before needed. It should work.
## The second usage is more uncertain as we need to bind the arguments from
## the right. Therefore I have to use the name of the parameter and I am not
## sure if there's one. As this list is very prolific, someone might havean
## idea on how to improve this.

=== END SNAP

By using only built-in constructs, this should be fast enough. Otherwise,
optimizing these built-ins is a much more clean and sane way of thinking then
messing the API with ad-hoc propositions.

Reviewing the problems you mention :

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
The method update_or_another_great_name is easy to understand, I think. Butit
might not always be easy to use it efficiently with built-ins. But this is
always the case. "Recipees" can be added to show how to efficiently use the
method.
* Their wording tends to hide the real meaning (accumulation).
Solved.
* The meaning of setdefault() 's method name is not self-evident.
Solved.

The performance issues with the existing constructs are:

* They translate into many opcodes which slows them considerably.
I really don't know what will be the outcome of the solution I propose. I
certainly do not know anything about how my Python code translates into
opcodes.
* The get() idiom requires two dictionary lookups of the same key.
Solved
* The setdefault() idiom instantiates a new, empty list prior to every
Solved
call. * That new list is often not needed and is immediately discarded.
Solved
* The setdefault() idiom requires an attribute lookup for extend/append.
Solved
* The setdefault() idiom makes two function calls.
Solved

And perhaps, what you say here is also true for your two special use-cases :
For other
uses, plain Python code suffices in terms of speed, clarity, and avoiding
unnecessary instantiation of empty containers:

if key not in d:
d.key = {subkey:value}
else:
d[key][subkey] = value

Much better than adding special cases on a generic class. Special cases always
demultiply and if we open the door ....

Regards,

Francis Girard
Le samedi 19 Mars 2005 02:24, Raymond Hettinger a Ã©critÂ*: I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

The rationale is to replace the awkward and slow existing idioms for
dictionary based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

d.count(key)
d.appendlist(key, value)

In their multi-value forms, they would now be coded as:

d.count(key, qty)
d.appendlist(key, *values)

The error messages returned by the new methods are the same as those
returned by the existing idioms.

The get() method would continue to exist because it is useful for
applications other than accumulation.

The setdefault() method would continue to exist but would likely not make
it into Py3.0.
PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault() 's method name is not self-evident.

The performance issues with the existing constructs are:

* They translate into many opcodes which slows them considerably.
* The get() idiom requires two dictionary lookups of the same key.
* The setdefault() idiom instantiates a new, empty list prior to every
call. * That new list is often not needed and is immediately discarded.
* The setdefault() idiom requires an attribute lookup for extend/append.
* The setdefault() idiom makes two function calls.

The latter issues are evident from a disassembly:
dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec'))
1 0 LOAD_NAME 0 (d)
3 LOAD_ATTR 1 (get)
6 LOAD_NAME 2 (key)
9 LOAD_CONST 0 (0)
12 CALL_FUNCTION 2
15 LOAD_NAME 3 (qty)
18 BINARY_ADD
19 LOAD_NAME 0 (d)
22 LOAD_NAME 2 (key)
25 STORE_SUBSCR
26 LOAD_CONST 1 (None)
29 RETURN_VALUE
dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))

1 0 LOAD_NAME 0 (d)
3 LOAD_ATTR 1 (setdefault)
6 LOAD_NAME 2 (key)
9 BUILD_LIST 0
12 CALL_FUNCTION 2
15 LOAD_ATTR 3 (extend)
18 LOAD_NAME 4 (values)
21 CALL_FUNCTION 1
24 POP_TOP
25 LOAD_CONST 0 (None)
28 RETURN_VALUE

In contrast, the proposed methods use only a single attribute lookup and
function call, they use only one dictionary lookup, they use very few
opcodes, and they directly access the accumulation functions,
PyNumber_Add() or PyList_Append(). IOW, the performance improvement
matches the readability improvement.
ISSUES
------

The proposed names could possibly be improved (perhaps tally() is more
active and clear than count()).

The appendlist() method is not as versatile as setdefault() which can be
used with other object types (perhaps for creating dictionaries of
dictionaries). However, most uses I've seen are with lists. For other
uses, plain Python code suffices in terms of speed, clarity, and avoiding
unnecessary instantiation of empty containers:

if key not in d:
d.key = {subkey:value}
else:
d[key][subkey] = value

Raymond Hettinger

Jul 18 '05 #106

Mike Rovner

Reinhold Birkenfeld wrote:

I don't quite understand that. Which dict item are you extending? Don't
you need something like

dl[key].append("word")

Rigth. It was just a typo on my part. Thanks for fixing.

Mike

Jul 18 '05 #107

Aahz

In article <11**********************@f14g2000cwb.googlegroups .com>,
Michele Simionato <mi***************@gmail.com> wrote:

I am surprised nobody suggested we put those two methods into a
separate module (say dictutils or even UserDict) as functions:

from dictutils import tally, listappend

tally(mydict, key)
listappend(mydict, key, value)

That seems like a reasonable compromise.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR

Jul 18 '05 #108

David Eppstein

In article <d1**********@panix2.panix.com>, aa**@pythoncraft.com (Aahz)
wrote:

I am surprised nobody suggested we put those two methods into a
separate module (say dictutils or even UserDict) as functions:

from dictutils import tally, listappend

tally(mydict, key)
listappend(mydict, key, value)

That seems like a reasonable compromise.

The more messages I see on this thread, the more I think adding a
different new method for each commonly used kind of update is the wrong
solution.

We already have methods that work pretty well and, I think, read better
than the new methods:
mydict[key] += 1
mydict[key].append(value)
The problem is merely that they don't work when key is missing, so we
need to resort to setdefault circumlocutions instead. A better solution
seems to be the one I've seen suggested here several times, of changing
the dict's behavior so that the setdefault is automatic whenever trying
to access a missing key. If this would be in a separate module or
separate subclass of dict, so much the better.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/

Jul 18 '05 #109

Ron

On Sun, 20 Mar 2005 15:14:22 -0800, David Eppstein
<ep******@ics.uci.edu> wrote:

In article <d1**********@panix2.panix.com>, aa**@pythoncraft.com (Aahz)
wrote:
>I am surprised nobody suggested we put those two methods into a
>separate module (say dictutils or even UserDict) as functions:
>
>from dictutils import tally, listappend
>
>tally(mydict, key)
>listappend(mydict, key, value)

That seems like a reasonable compromise.

The more messages I see on this thread, the more I think adding a
different new method for each commonly used kind of update is the wrong
solution.

We already have methods that work pretty well and, I think, read better
than the new methods:
mydict[key] += 1
mydict[key].append(value)
The problem is merely that they don't work when key is missing, so we
need to resort to setdefault circumlocutions instead. A better solution
seems to be the one I've seen suggested here several times, of changing
the dict's behavior so that the setdefault is automatic whenever trying
to access a missing key. If this would be in a separate module or
separate subclass of dict, so much the better.

I think that the setdefault behavior needs to be done on an per
application basis because whose to say what default is best?.

With a preset default mode, it then becomes possible to inadvertently
create default values that will cause problems without knowing it. So
then we have to remember to change the setdefault value to None or
null to avoid problems. Ouch!

Also pythons normal behavior for retrieving objects that are not
defined is to give an error. So having dictionaries that auto
defaults to a mode that doesn't behave that way is inconsistent with
the rest of the language.

Yet, I'm all for the creation of specialized containers in a standard
module! :) Then we can have string dicts, and int dicts, and card
dicts, account dicts, etc, as well as specialized lists. Call them
'smart containers'. But they should not be built into the base class.

Ron

Jul 18 '05 #110

haraldarminmassa

Raymond,

I am +1 for both suggestions, tally and appendlist.

Extended:

Also, in all of my code base, I've not run across a single opportunity to use
something like unionset(). This is surprising because I'm the set() author and
frequently use set based algorithms. Your example was a good one and I can
also image a graph represented as a dictionary of sets. Still, I don't mind
writing out the plain Python for this one if it only comes up once in a blue
moon.

I am more than sure you are right about this. But, please keep in mind
that you and we all have come very, very accustomed to using lists for
everything and the kitchen sink in Python.

Lists where there from the beginning of Python, and even before the
birth of Python; very powerfull, well implemented and theoretically
well founded datastructures - I heared there is a whole language based
on list processing. *pun intended*

sets on the other hand --- I know, in mathematics they have a deep,
long history. But in programming? Yeah, in SQL and ABAP/4 basically
you are doing set operations on every join. But its rather uncommon to
call it set.

With 2.3 Python grew a set module. And, in ONLY ONE revision it got
promoted to a buildin type - a honour only those who read c.l.p.d.
regualary can value correctly.

And sets are SO NATURALLY for a lot of problems ... I never thought of
replacing my "list in dict" constructs with sets before, BUT ....
there are 90% of problem domains where order is not important, AND
fast membership testing is a unique sales point.

So please for best impressions: let us have a look at our code, where
we use the
dict.setdefault(key,[]).append() idiom, where it could be replaced to
a better effectivity with dict.setdefault(key,set()).add()

If it is less than 60%, forget it. If it is more....

Harald

Jul 18 '05 #111

AndrewN

Raymond Hettinger wrote:

I would like to get everyone's thoughts on two new dictionary methods:
def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

-0.9

Not impressed, they feel too specific for being builtin dictionary
methods and give the impression of just trying to save a few lines here
and there. I don't feel the names convey the functionality of the
methods either.

I know there's the speed argument but I'd rather not have these on the
dict at all.

+0.1

I sort of feel a slight need for this. But where would you stop? What
if people decrement lots? what if next there's a need for division? How
would you determine how you add the item to the key if it already
exists? In a general way:

mydict.set(key, value=None, default=None, how=operator.setitem)

This feels slightly better as it's not tied down to what sort of item
you're setting. But:

for word in words:
mydict.set(word, 1, 0, operator.add)

I dunno, feels a bit verbose maybe.
The setdefault() method would continue to exist but would likely not make it into Py3.0.
I agree that setdefault is wart though.

And for dict.default = value:

(Quoth RON):

"""With a preset default mode, it then becomes possible to
inadvertently
create default values that will cause problems without knowing it. So
then we have to remember to change the setdefault value to None or
null to avoid problems. Ouch!"""

Agreed, -1 there then.
PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault() 's method name is not self-evident.

I feel this only really applies for setdefault (which I wouldn't be
sorry to see the back of). And your examples:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

Would better be written in a long-handed fashion anyway as per the
implementations were suggested:

try:
d[key] += qty
except KeyError:
d[key] = 0

Yeah, yeah, I know, speed. But not like this. Sorry.

Jul 18 '05 #112

Michele Simionato

FWIW, here is my take on the defaultdict approach:

def defaultdict(defaultfactory, dictclass=dict):
class defdict(dictclass):
def __getitem__(self, key):
try:
return super(defdict, self).__getitem__(key)
except KeyError:
return self.setdefault(key, defaultfactory())
return defdict

d = defaultdict(int)()
d["x"] += 1
d["x"] += 1
d["y"] += 1
print d

d = defaultdict(list)()
d["x"].append(1)
d["x"].append(2)
d["y"].append(1)
print d

Michele Simionato

Jul 18 '05 #113

George Sakkis

"Michele Simionato" <mi***************@gmail.com> wrote:

FWIW, here is my take on the defaultdict approach:

def defaultdict(defaultfactory, dictclass=dict):
class defdict(dictclass):
def __getitem__(self, key):
try:
return super(defdict, self).__getitem__(key)
except KeyError:
return self.setdefault(key, defaultfactory())
return defdict

d = defaultdict(int)()
d["x"] += 1
d["x"] += 1
d["y"] += 1
print d

d = defaultdict(list)()
d["x"].append(1)
d["x"].append(2)
d["y"].append(1)
print d

Michele Simionato

Best solution so far. If it wasn't for the really bad decision to add the dict(**kwargs)
constructor, I'd love to see something like
d = dict(valType=int)
d["x"] += 1

George

Jul 18 '05 #114

Evan Simpson

Raymond Hettinger wrote:

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):

def appendlist(self, key, *values):

-1.0

When I need these, I just use subtype recipes. They seem way too
special-purpose for the base dict type.

class Counter(dict):
def __iadd__(self, other):
if other in self:
self[other] += 1
else:
self[other] = 1
return self

c = Counter()
for item in items:
c += item

class Collector(dict):
def add(self, key, value):
if key in self:
self[key].append(value)
else:
self[key] = [value]

c = Collector()
for k,v in items:
c.add(k, v)

Cheers,

Evan @ 4-am

Jul 18 '05 #115

Greg Ewing

Michele Simionato wrote:

def defaultdict(defaultfactory, dictclass=dict):
class defdict(dictclass):
def __getitem__(self, key):
try:
return super(defdict, self).__getitem__(key)
except KeyError:
return self.setdefault(key, defaultfactory())
return defdict

That looks really nice!

I'd prefer a more elegant name than 'defaultdict', though.
How about 'table'?

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg

Jul 18 '05 #116

Roose

I agree -- I find myself NEEDING sets more and more. I use them with this
idiom quite often. Once they become more widely available (i.e. Python 2.3
is installed everywhere), I will use them almost as much as lists I bet.

So any solution IMO needs to at least encompass sets. But preferably
something like the Dict with Default approach which encompasses all
possibilities.

Roose

I am more than sure you are right about this. But, please keep in mind
that you and we all have come very, very accustomed to using lists for
everything and the kitchen sink in Python.

Lists where there from the beginning of Python, and even before the
birth of Python; very powerfull, well implemented and theoretically
well founded datastructures - I heared there is a whole language based
on list processing. *pun intended*

sets on the other hand --- I know, in mathematics they have a deep,
long history. But in programming? Yeah, in SQL and ABAP/4 basically
you are doing set operations on every join. But its rather uncommon to
call it set.

With 2.3 Python grew a set module. And, in ONLY ONE revision it got
promoted to a buildin type - a honour only those who read c.l.p.d.
regualary can value correctly.

And sets are SO NATURALLY for a lot of problems ... I never thought of
replacing my "list in dict" constructs with sets before, BUT ....
there are 90% of problem domains where order is not important, AND
fast membership testing is a unique sales point.

So please for best impressions: let us have a look at our code, where
we use the
dict.setdefault(key,[]).append() idiom, where it could be replaced to
a better effectivity with dict.setdefault(key,set()).add()

If it is less than 60%, forget it. If it is more....

Harald

Jul 18 '05 #117

Steve Holden

Greg Ewing wrote:

Michele Simionato wrote:
def defaultdict(defaultfactory, dictclass=dict):
class defdict(dictclass):
def __getitem__(self, key):
try:
return super(defdict, self).__getitem__(key)
except KeyError:
return self.setdefault(key, defaultfactory())
return defdict

That looks really nice!

I'd prefer a more elegant name than 'defaultdict', though.
How about 'table'?

By obvious analogy with Icon (where the dictionary-like object was
created with the option of a default value) this gets my +1.

regards
Steve
--
Meet the Python developers and your c.l.py favorites March 23-25
Come to PyCon DC 2005 http://www.pycon.org/
Steve Holden http://www.holdenweb.com/

Jul 18 '05 #118

bearophileHUGS

R.H.:

The setdefault() method would continue to exist but
would likely not make it into Py3.0.

I agee to remove the setdefault.

I like the new count method, but I don't like the appendlist method,
because I think it's too much specilized.

I too use sets a lot; recently I've suggested to add a couple of set
methods to dicts (working on the keys): intersection() and
difference().

Bearophile

Jul 18 '05 #119

Christos TZOTZIOY Georgiou

On Sat, 19 Mar 2005 01:24:57 GMT, rumours say that "Raymond Hettinger"
<vz******@verizon.net> might have written:

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

Both are useful and often needed, so I am +1 on adding such
functionality. However, I am -0 on adding methods to dict.

I believe BJörn Lindqvist suggested a subtype of dict instead, which
feels more right. I believe this is a type of 'bag' collection, and it
could go to the collections module.

The default argument 99% of the time is the same for all calls to
setdefault of a specific instance. So I would suggest that the default
argument should be an attribute of the bag instance, given at instance
creation. And since unbound methods are going to stay, we can use the
accumulator method as a default argument (ie int.__add__ or list.append)

Based on the above, I would suggest something like the following
implementation, waiting criticism on names, algorithm or applicability:)

..class bag(dict):
.. def __init__(self, accumulator=int.__add__):
.. self.accumulator = accumulator
..
.. # refinement needed for the following
.. self.accu_class = accumulator.__objclass__
..
.. # if there was an exception, probably the accumulator
.. # provided was not appropriate
..
.. def accumulate(self, key, value):
.. try:
.. old_value = self[key]
.. except KeyError:
.. self[key] = old_value = self.accu_class()
.. new_value = self.accumulator(old_value, item)
..
.. # and this needs refinement
.. if new_value is not None: # method of immutable object
.. self[key] = new_value

This works ok for int.__add__ and list.append.

PS I wrote these more than 36 hours ago, and before having read the
so-far downloaded messages of the thread. I kept on reading and
obviously others thought the same too (default argument at
initialisation).

What the heck, Bengt at least could like the class method idea :)
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...

Jul 18 '05 #120

Steven Bethard

Michele Simionato wrote:

I am surprised nobody suggested we put those two methods into a
separate module (say dictutils or even UserDict) as functions:

from dictutils import tally, listappend

tally(mydict, key)
listappend(mydict, key, value)

Sorry to join the discussion so late (I've been away from my email for a
week) but this was exactly my reaction too. In fact, I have a
'dicttools' module with similar methods in it:

# like "tally" but without ability to set increment
def counts(iterable, key=None):
result = {}
for item in iterable:
# apply key function if necessary
if key is None:
k = item
else:
k = key(item)
# increment key's count
try:
result[k] += 1
except KeyError:
result[k] = 1
return result

# like "listappend" but with the option to use key and value funcs
def groupby(iterable, key=None, value=None):
result = {}
for item in iterable:
# apply key function if necessary
if key is None:
k = item
else:
k = key(item)
# apply value function if necessary
if value is None:
v = item
else:
v = value(item)
# append value to key's list
try:
result[k].append(v)
except KeyError:
result[k] = [v]
return result

These two functions have covered all my use cases for "tally" and
"listappend" -- I always want to perform the increments or list appends
over a sequence of values, so having functions that operate on sequences
covers all my needs.

STeVe

Jul 18 '05 #121

Steven Bethard

Michele Simionato wrote:

FWIW, here is my take on the defaultdict approach:

def defaultdict(defaultfactory, dictclass=dict):
class defdict(dictclass):
def __getitem__(self, key):
try:
return super(defdict, self).__getitem__(key)
except KeyError:
return self.setdefault(key, defaultfactory())
return defdict

d = defaultdict(int)()
d["x"] += 1
d["x"] += 1
d["y"] += 1
print d

d = defaultdict(list)()
d["x"].append(1)
d["x"].append(2)
d["y"].append(1)
print d

Michele Simionato

Very pretty! =)

It does mean, however, that if the defaultfactory function takes any
arguments, you have to wrap the function to make this work. I'd
probably prefer something like:

py> def defaultdict(*args, **kwargs):
.... defaultfactory, args = args[0], args[1:]
.... class defdict(dict):
.... def __getitem__(self, key):
.... try:
.... return super(defdict, self).__getitem__(key)
.... except KeyError:
.... return self.setdefault(key, defaultfactory(
.... *args, **kwargs))
.... return defdict
....
py> d = defaultdict(int)()
py> d['x'] += 1
py> d['x'] += 1
py> d['y'] += 1
py> d
{'y': 1, 'x': 2}
py> d = defaultdict(list, [0])()
py> d['x'].append(1)
py> d['x'].append(2)
py> d['y'].append(1)
py> d
{'y': [0, 1], 'x': [0, 1, 2]}

That said, I still think a dictools module is a better solution to this
problem.

STeVe

Jul 18 '05 #122

Jack Diederich

On Sun, Mar 27, 2005 at 02:20:33PM -0700, Steven Bethard wrote:

Michele Simionato wrote:
I am surprised nobody suggested we put those two methods into a
separate module (say dictutils or even UserDict) as functions:

from dictutils import tally, listappend

tally(mydict, key)
listappend(mydict, key, value)

Sorry to join the discussion so late (I've been away from my email for a
week) but this was exactly my reaction too. In fact, I have a
'dicttools' module with similar methods in it:

<snipped>

I like this approach, it will give us a chance to test & tweak the signature
before hanging it off dict proper. It feels similar to the strings module
to str transition, sets module to set builtin, and itertools module to iter
transition.

itertools to iter transition, huh? I slipped that one in, I mentioned it
to Raymond at PyCon and he didn't flinch. It would be nice not to have to
sprinkle 'import itertools as it' in code. iter could also become a type
wrapper instead of a function, so an iter instance could be a wrapper that
figures out whether to call .next or __getitem__ depending on it's argument.
for item in iter(mylist).imap:
print item
or
for item in iter.imap(mylist):
print item

I haven't digested that too much, just a thought.

-jackdied

Jul 18 '05 #123

Michael Spencer

Jack Diederich wrote:

On Sun, Mar 27, 2005 at 02:20:33PM -0700, Steven Bethard wrote:
Michele Simionato wrote:
I am surprised nobody suggested we put those two methods into a
separate module (say dictutils or even UserDict) as functions:
from dictutils import tally, listappend

tally(mydict, key)
listappend(mydict, key, value)

Sorry to join the discussion so late (I've been away from my email for a
week) but this was exactly my reaction too. In fact, I have a
'dicttools' module with similar methods in it:

<snipped>

I like this approach, it will give us a chance to test & tweak the signature
before hanging it off dict proper. It feels similar to the strings module
to str transition, sets module to set builtin, and itertools module to iter
transition.

itertools to iter transition, huh? I slipped that one in, I mentioned it
to Raymond at PyCon and he didn't flinch. It would be nice not to have to
sprinkle 'import itertools as it' in code.

Not that that is such a pain, but accessing itertools functions from an
"outlying" module seems somewhat incompatible with putting iterative approaches
center stage.

iter could also become a type wrapper instead of a function, so an iter instance could be a wrapper that
figures out whether to call .next or __getitem__ depending on it's argument.
for item in iter(mylist).imap:
print item
Also opening the door for iter to be subclassed. For example, could listiter
become a specialization of iter - one that uses __getitem__ and which could
allow reverse iteration?
or
for item in iter.imap(mylist):
print item

I haven't digested that too much, just a thought.

-jackdied

A very good thought IMO.

Michael

Jul 18 '05 #124

Greg Ewing

Steven Bethard wrote:

py> def defaultdict(*args, **kwargs):
... defaultfactory, args = args[0], args[1:]

which can be written more succinctly as

def defaultdict(defaultfactory, *args, **kwargs):
...

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg

Jul 18 '05 #125

Steven Bethard

Greg Ewing wrote:

Steven Bethard wrote:
py> def defaultdict(*args, **kwargs):
... defaultfactory, args = args[0], args[1:]

which can be written more succinctly as

def defaultdict(defaultfactory, *args, **kwargs):
...

Not if you want to allow the defaultfactory to be called with a keyword
argument 'defaultfactory'. Compare my code:

py> def defaultdict(*args, **kwargs):
.... defaultfactory, args = args[0], args[1:]
.... print defaultfactory, args, kwargs
....
py> defaultdict(dict, defaultfactory=True)
<type 'dict'> () {'defaultfactory': True}

with the code you suggested:

py> def defaultdict(defaultfactory, *args, **kwargs):
.... print defaultfactory, args, kwargs
....
py> defaultdict(dict, defaultfactory=True)
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
TypeError: defaultdict() got multiple values for keyword argument
'defaultfactory'

Uncommon, sure, but I'd rather not rule it out if there's no need to.

STeVe

Jul 18 '05 #126

Similar topics