469,304 Members | 2,192 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,304 developers. It's quick & easy.

Pre-PEP: Dictionary accumulator methods

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

The rationale is to replace the awkward and slow existing idioms for dictionary
based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

d.count(key)
d.appendlist(key, value)

In their multi-value forms, they would now be coded as:

d.count(key, qty)
d.appendlist(key, *values)

The error messages returned by the new methods are the same as those returned by
the existing idioms.

The get() method would continue to exist because it is useful for applications
other than accumulation.

The setdefault() method would continue to exist but would likely not make it
into Py3.0.
PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault() 's method name is not self-evident.

The performance issues with the existing constructs are:

* They translate into many opcodes which slows them considerably.
* The get() idiom requires two dictionary lookups of the same key.
* The setdefault() idiom instantiates a new, empty list prior to every call.
* That new list is often not needed and is immediately discarded.
* The setdefault() idiom requires an attribute lookup for extend/append.
* The setdefault() idiom makes two function calls.

The latter issues are evident from a disassembly:
dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec')) 1 0 LOAD_NAME 0 (d)
3 LOAD_ATTR 1 (get)
6 LOAD_NAME 2 (key)
9 LOAD_CONST 0 (0)
12 CALL_FUNCTION 2
15 LOAD_NAME 3 (qty)
18 BINARY_ADD
19 LOAD_NAME 0 (d)
22 LOAD_NAME 2 (key)
25 STORE_SUBSCR
26 LOAD_CONST 1 (None)
29 RETURN_VALUE
dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))

1 0 LOAD_NAME 0 (d)
3 LOAD_ATTR 1 (setdefault)
6 LOAD_NAME 2 (key)
9 BUILD_LIST 0
12 CALL_FUNCTION 2
15 LOAD_ATTR 3 (extend)
18 LOAD_NAME 4 (values)
21 CALL_FUNCTION 1
24 POP_TOP
25 LOAD_CONST 0 (None)
28 RETURN_VALUE

In contrast, the proposed methods use only a single attribute lookup and
function call, they use only one dictionary lookup, they use very few opcodes,
and they directly access the accumulation functions, PyNumber_Add() or
PyList_Append(). IOW, the performance improvement matches the readability
improvement.
ISSUES
------

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

The appendlist() method is not as versatile as setdefault() which can be used
with other object types (perhaps for creating dictionaries of dictionaries).
However, most uses I've seen are with lists. For other uses, plain Python code
suffices in terms of speed, clarity, and avoiding unnecessary instantiation of
empty containers:

if key not in d:
d.key = {subkey:value}
else:
d[key][subkey] = value

Raymond Hettinger
Jul 18 '05
125 6453
[El Pitonero]
Is it even necessary to use a method name?

import copy
class safedict(dict):
def __init__(self, default=None):
self.default = default
def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return copy.copy(self.default)
x = safedict(0)
x[3] += 1
y = safedict([])
y[5] += range(3)
print x, y
print x[123], y[234]


safedict() and variants have been previously proposed with the name defaultdict
or some such.

For the most part, adding methods is much less disruptive than introducing a new
type.

As written out above, the += syntax works fine but does not work with append().

As written, the copy.copy() approach is dog slow but can be optimized for lists
and ints while retaining its type flexibility.

BTW, there is no need to make the same post three times.
Raymond


Jul 18 '05 #51
Kent Johnson said unto the world upon 2005-03-19 07:19:
Brian van den Broek wrote:
Raymond Hettinger said unto the world upon 2005-03-18 20:24:
I would like to get everyone's thoughts on two new dictionary methods:

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

For appendlist, I would have expected

def appendlist(self, key, sequence):
try:
self[key].extend(sequence)
except KeyError:
self[key] = list(sequence)

The original proposal reads better at the point of call when values is a
single item. In my experience this will be the typical usage:
d.appendlist(key, 'some value')

as opposed to your proposal which has to be written
d.appendlist(key, ['some value'])

The original allows values to be a sequence using
d.appendlist(key, *value_list)

Kent


Right. I did try the alternatives out and get the issue you point to.

But:

1) In my own code, cases where I'd use the proposed appendlist method
are typically cases where I'd want to add multiple items that have
already been collected in a sequence. But, since I've little coding
under my belt, I concede that considerations drawn from my experience
are not terribly probative. :-)

2) Much more important, IMHO, is that the method name `appendlist'
really does suggest it's a list that will be appended. Hence my stated
expectation. While it would make the method name longer, given the
original interface Raymond posted, I would find appendtolist more
transparent.

out-of-my-depth-ly y'rs,

Brian vdB

Jul 18 '05 #52
"Raymond Hettinger" <vz******@verizon.net> wrote:
[Jeff Epler]
Maybe something for sets like 'appendlist' ('unionset'?)


While this could work and potentially be useful, I think it is better to keep
the proposal focused on the two common use cases. Adding a third would reduce
the chance of acceptance.

Also, in all of my code base, I've not run across a single opportunity to use
something like unionset(). This is surprising because I'm the set() author and
frequently use set based algorithms. Your example was a good one and I can
also image a graph represented as a dictionary of sets. Still, I don't mind
writing out the plain Python for this one if it only comes up once in a blue
moon.


Good example. I actually have a directed graph and multigraph module that uses dictionary of sets
internally. It turns out I've used setdefault 8 times in this module alone !

George
Jul 18 '05 #53
Ah OK, I stand corrected. Whoops. I just read the web page and thought the
wrong thing, that makes sense.
Think about it. A key= function is quite a different thing. It provides a *temporary* comparison key while retaining the original value. IOW, your
re-write is incorrect:
L = ['the', 'quick', 'brownish', 'toad']
max(L, key=len) 'brownish' max(len(x) for x in L)

8
Remain calm. Keep the faith. Guido's design works fine.

No important use cases were left unserved by any() and all().

Raymond Hettinger

Jul 18 '05 #54
Hi
if key not in d:
d[key] = {subkey:value}
else:
d[key][subkey] = value
and

d[(key,subkey)] = value

?
Michel Claveau
Jul 18 '05 #55
Raymond Hettinger wrote:

As written out above, the += syntax works fine but does not work with append(). ...
BTW, there is no need to make the same post three times.


The append() syntax works, if you use the other definition of safedict
(*). There are more than one way of defining safedict, see the subtle
differences between the two versions of safedict, and you'll be glad
more than one version has been posted. At any rate, what has been
presented is a general idea, nitpicking details is kind of out of
place. Programmers know how to modify a general receipe to suit their
actual needs, right?

(*) In some cases, people do not want to create a dictionary entry when
an inquiry is done on a missing item. In some case, they do. A general
receipe cannot cater to the needs of everybody.

Jul 18 '05 #56
"Aahz" <aa**@pythoncraft.com> wrote:
In article <JbL_d.8237$qN3.2116@trndny01>,
Raymond Hettinger <py****@rcn.com> wrote:

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).


+1 tally()


-1 for count(): Implies an accessor, not a mutator.
-1 for tally(): Unfriendly to non-native english speakers.
+0.5 for add, increment. If incrementing a negative is unacceptable, how about
update/updateby/updateBy ?
+1 for accumulate. I don't think that separating the two cases -- adding to a scalar or appending to
a list -- is that essential; a self-respecting program should make this obvious by the name of the
parameter anyway ("dictionary.accumulate('hello', words)" vs "a.accumulate('hello', b)").

George
Jul 18 '05 #57
George Sakkis wrote:
"Aahz" <aa**@pythoncraft.com> wrote:
In article <JbL_d.8237$qN3.2116@trndny01>,
Raymond Hettinger <py****@rcn.com> wrote:

The proposed names could possibly be improved (perhaps tally() is more activeand clear than count()).
+1 tally()


-1 for count(): Implies an accessor, not a mutator.
-1 for tally(): Unfriendly to non-native english speakers.
+0.5 for add, increment. If incrementing a negative is unacceptable,

how about update/updateby/updateBy ?
+1 for accumulate. I don't think that separating the two cases -- adding to a scalar or appending to a list -- is that essential; a self-respecting program should make this obvious by the name of the parameter anyway ("dictionary.accumulate('hello', words)" vs

"a.accumulate('hello', b)").

What about no name at all for the scalar case:

a['hello'] += 1
a['bye'] -= 2

and append() (or augmented assignment) for the list case:

a['hello'].append(word)
a['bye'] += [word]

?

Jul 18 '05 #58
On Sat, 19 Mar 2005 15:17:59 GMT,
"Raymond Hettinger" <vz******@verizon.net> wrote:
[Dan Sommers]
Curious that in this lengthy discussion, a method name of
"accumulate" never came up. I'm not sure how to separate the two
cases (accumulating scalars vs. accumulating a list), though.

Separating the two cases is essential. Also, the wording should
contain strong cues that remind you of addition and of building a
list.


Agreed, with a slight hedge towards accumulation or tabulation rather
than addition. I don't think "summation" gets us anywhere, either.

Are the use cases for qty != 1 for weighted averages (that's the only
one I can think of off the top of my head)? Is something like this:

def accumulate( self, key, *values ):
if values == ( ):
values = 1
try:
self[ key ] += values
except KeyError:
if type( key ) == int:
self[ key ] = 1
else
self[ key ] = *values

possible? It's more "klunky" than I thought it would be before I
started typing it out.

Then we'd have these two use cases:

histogram = { }
for word in text.split( ):
histogram.accumulate( word )

and

org_chart = { }
for employee in employees:
org_chart.accumulate( employee.manager, employee.name )

Regards,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>
μ₀ × ε₀ × c² = 1
Jul 18 '05 #59
Raymond Hettinger wrote:
I would like to get everyone's thoughts on two new dictionary methods:
def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)


-1 form me.

I'm not very glad with both of them ( not a naming issue ) because i
think that the dict type should offer only methods that apply to each
dict whatever it contains. count() specializes to dict values that are
addable and appendlist to those that are extendable. Why not
subtractable, dividable or right-shiftable? Because of majority
approval? I'm mot a speed fetishist and destroying the clarity of a
very fundamental data structure for speedup rather arbitrary
accumulations seems to be a bad idea. I would move this stuff in a
subclass.

Regards Kay

Jul 18 '05 #60
"El Pitonero" <pi******@gmail.com> writes:
What about no name at all for the scalar case:

a['hello'] += 1
a['bye'] -= 2


I like this despite the minor surprise that it works even when
a['hello'] is uninitialized.
Jul 18 '05 #61
On Sat, 19 Mar 2005 07:13:15 -0500, Kent Johnson <ke****@tds.net> wrote:
Bengt Richter wrote:
On Sat, 19 Mar 2005 01:24:57 GMT, "Raymond Hettinger" <vz******@verizon.net> wrote:
I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

How about an efficient duck-typing value-incrementer to replace both? E.g. functionally like:
>>> class xdict(dict):

... def valadd(self, key, incr=1):
... try: self[key] = self[key] + type(self[key])(incr)
... except KeyError: self[key] = incr


A big problem with this is that there are reasonable use cases for both
d.count(key, <some integer>)
and
d.appendlist(key, <some integer>)

Word counting is an obvious use for the first. Consolidating a list of key, value pairs where the
values are ints requires the second.

Combining count() and appendlist() into one function eliminates the second possibility.

I don't see a "big problem" ;-)

d.addval doesn't eliminate the functionalities if you want them. You just have to spell them
d.addval(key, <some integer>)
and
d.addval(key, [<some integer>])
respectively.

My example interactive stuff in a previous post shows both of these using xd['x'] and xd.['y']:
"""
xd = xdict()
xd {}

Default integer 1 arg creates initial int value xd.valadd('x')
xd {'x': 1}

Explicit list arg create initial list value xd.valadd('y', range(3))
xd {'y': [0, 1, 2], 'x': 1}

Explicit int increment adds to existing int xd.valadd('x', 100)
xd['x'] 101

Explicit list arg extends existing list with contents of increment list
which you can of course optionally use with a length-1 list to achieve the .append effect xd.valadd('y', range(3,6))
xd['y']

[0, 1, 2, 3, 4, 5]
"""

Granted, d.appendlist will result in more efficient code, since the temp list arg
[<some integer>] does not have to be created and the type of the existing dict value
does not have to be tested as generally. OTOH, you can create and extend tuple
or even custom object values using valadd, and extend with alternate sequences that
get converted to the existing type if its contructor knows how to accept alternatives.

So I think you are not prevented from doing anything. IMO Raymond's Zen concerns
are the ones to think about first, and then efficiency, which was one of the motivators
in the first place ;-)

Regards,
Bengt Richter
Jul 18 '05 #62

Raymond Hettinger wrote:
I would like to get everyone's thoughts on two new dictionary methods:

+1 for each.
PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault() 's method name is not self-evident.
+1 to EACH of the above 3 points.

A question directed to the folk who had to look up "tally" in the
dictionary: Which dictionary includes "setdefault", "updateBy", etc?

The performance issues with the existing constructs are:
[MANY]
the performance improvement matches the readability
improvement.
Agreed.


ISSUES
------

The proposed names could possibly be improved (perhaps tally() is more active and clear than count()).
+3 for tally !!!

appendtolist is better than appendlist

The appendlist() method is not as versatile as setdefault() which can be used with other object types (perhaps for creating dictionaries of dictionaries). However, most uses I've seen are with lists.


My use cases for tally:
(1) Yes the text-book "word" frequency gig applied in production
data-matching etc applications
(2) quick stats like from SQL "group by" e.g.
customer.tally(state)
customer_value.tally(state, dollar_value) # real or *DECIMAL*

Use cases for appendlist: many; in general, how else do you implement a
one-to-many relationship in memory??

Jul 18 '05 #63
John Machin wrote:
Raymond Hettinger wrote:
I would like to get everyone's thoughts on two new dictionary

methods:

+1 for each.
PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault() 's method name is not self-evident.


+1 to EACH of the above 3 points.

A question directed to the folk who had to look up "tally" in the
dictionary: Which dictionary includes "setdefault", "updateBy", etc?


Are you kidding? If you know what "set" and "default" means, you will be
able to guess what "setdefault" means. Same goes for updateBy.

Of course, I had to look up setdefault in the Python docs the first time
I ran across it, but in the case of tally, I would have to look up both
in the Python docs and in my dictionary.

Reinhold
Jul 18 '05 #64
[Bengt Richter]
IMO Raymond's Zen concerns
are the ones to think about first, and then efficiency, which was one of the motivators in the first place ;-)


Well said.

I find the disassembly (presented in the first post) to be telling. The
compiler has done a great job and there is no fluff -- all of those steps have
been specified by the programmer and he/she must at some level be aware of every
one of them (pre-instantiation, multiple method lookups and calls, multiple
dictionary accesses, etc). That is too bad because the intent could have been
stated atomically: d.appendlist(k, v). Instead, the current idiom turns you
away from what you want done and focuses the attention on how it is done:
d.setdefault(k, []).append(v). That is too many steps for what should be an
atomic operation (in the eyes of the programmer and code readers).

Likewise, d.tally(k) is as atomic as this expression can get. Any other steps,
added verbiage, new types, extra arguments, or whatnot are an unnecessary waste
of brain cells.
Raymond Hettinger
Jul 18 '05 #65
"Raymond Hettinger" <vz******@verizon.net> writes:
I find the disassembly (presented in the first post) to be telling.
The compiler has done a great job and there is no fluff -- all of
those steps have been specified by the programmer and he/she must at
some level be aware of every one of them (pre-instantiation,
multiple method lookups and calls, multiple dictionary accesses, etc).


If the compiler can do some type inference, it can optimize out those
multiple calls pretty straightforwardly.
Jul 18 '05 #66

Reinhold Birkenfeld wrote:
John Machin wrote:
Are you kidding? If you know what "set" and "default" means, you will be able to guess what "setdefault" means. Same goes for updateBy.


No I'm not kidding -- people from some cultures have no difficulty at
all in mentally splitting up "words" like "setdefault" or the German
equivalent of "Danubesteamnavigationcompany'sdirector'swife" ; others
from other cultures where agglutinisation is not quite so rife will
have extreme difficulty.

And "Updateby" sounds like a village somewhere in the Danelaw :-)

Jul 18 '05 #67
In article <JbL_d.8237$qN3.2116@trndny01>,
"Raymond Hettinger" <vz******@verizon.net> wrote:
The rationale is to replace the awkward and slow existing idioms for
dictionary
based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

d.count(key)
d.appendlist(key, value)

In their multi-value forms, they would now be coded as:

d.count(key, qty)
d.appendlist(key, *values)

The error messages returned by the new methods are the same as those returned
by
the existing idioms.

The get() method would continue to exist because it is useful for
applications
other than accumulation.

The setdefault() method would continue to exist but would likely not make it
into Py3.0.


The other dictionary based accumulation I've used but don't see here is
with sets as dictionary values, i.e.
dictionary.setdefault(key,set()).add(value).

I suppose it would be possible to appendlist then make a set from the
list, but if you were to take that minimalist philosophy to an extreme
you wouldn't need count either, you could just appendlist then use len.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #68
In article <GoX_d.9343$qN3.2772@trndny01>,
"Raymond Hettinger" <vz******@verizon.net> wrote:
Also, in all of my code base, I've not run across a single opportunity to use
something like unionset().


In my code, this would be roughly tied with appendlist for frequency of
use.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #69
Ron
On 19 Mar 2005 11:33:20 -0800, "Kay Schluehr" <ka**********@gmx.net>
wrote:
Raymond Hettinger wrote:
I would like to get everyone's thoughts on two new dictionary

methods:

def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)


-1 form me.

I'm not very glad with both of them ( not a naming issue ) because i
think that the dict type should offer only methods that apply to each
dict whatever it contains. count() specializes to dict values that are
addable and appendlist to those that are extendable. Why not
subtractable, dividable or right-shiftable? Because of majority
approval? I'm mot a speed fetishist and destroying the clarity of a
very fundamental data structure for speedup rather arbitrary
accumulations seems to be a bad idea. I would move this stuff in a
subclass.

Regards Kay


-1 for me too.

I agree with this. The other dictionary functions are all data
nuetral. Adding special methods to handle data should be in a
durrived class and not a built in. imho.

# Roll your own specialized dictionaries.

class NumDict( dict):
def count(self, key, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty
def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

mydict = NumDict()

n = 0
for k in list('abcdefg'):
n += 1
mydict[k] = n

print mydict
# {'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4, 'g': 7, 'f': 6}

mydict.count('c')
mydict['e'] = ['a']
mydict.appendlist('e', 'bcdef')
print mydict
# {'a': 1, 'c': 4, 'b': 2, 'e': ['a', 'bcdef'], 'd': 4, 'g': 7, 'f':
6}
This isn't that hard to do. I don't think the extra methods should be
added to the base class.

Ron

Jul 18 '05 #70
> -1 form me.

I'm not very glad with both of them ( not a naming issue ) because i
think that the dict type should offer only methods that apply to each
dict whatever it contains. count() specializes to dict values that are
addable and appendlist to those that are extendable. Why not
subtractable, dividable or right-shiftable? Because of majority
approval? I'm mot a speed fetishist and destroying the clarity of a
very fundamental data structure for speedup rather arbitrary
accumulations seems to be a bad idea. I would move this stuff in a
subclass.

Regards Kay


+1 on this. The new suggested operations are meaningful for a subset of all valid dicts, so they
should not be part of the base dict API. If any version of this is approved, it will clearly be an
application of the "practicality beats purity" zen rule, and the justification for applying it in
this case instead of subclassing should better be pretty strong; so far I'm not convinced though.

George
Jul 18 '05 #71
Raymond Hettinger wrote:
I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)


These undoubtedly address common cases, which are unsatisfactory when spelled
using setdefault. However...

Use of these methods implicitly specializes the dictionary. The methods are
more-or-less mutually exclusive i.e., it would be at least strange to use count
and appendlist on the same dictionary. Moreover, on many dictionary instances,
the methods would fail or produce meaningless results.

This seems to be at odds with the other methods of built-in container types
which can be meaningfully applied, no matter what the types of the contents.
(There may be exceptions, but I can't think of any at the moment)

Does anyone else think this is a problem?

Michael

Jul 18 '05 #72
Michael Spencer wrote:
Raymond Hettinger wrote:
I would like to get everyone's thoughts on two new dictionary methods:
def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

These undoubtedly address common cases, which are unsatisfactory when

spelled using setdefault. However...

Use of these methods implicitly specializes the dictionary. The methods are more-or-less mutually exclusive i.e., it would be at least strange to use count and appendlist on the same dictionary. Moreover, on many dictionary instances, the methods would fail or produce meaningless results.

This seems to be at odds with the other methods of built-in container types which can be meaningfully applied, no matter what the types of the contents. (There may be exceptions, but I can't think of any at the moment)

Does anyone else think this is a problem?

Michael

Yep, at least three more people in this thread:
- http://tinyurl.com/4bsdf
- http://tinyurl.com/3seqx
- http://tinyurl.com/6db27

George

Jul 18 '05 #73
"Raymond Hettinger" <vz******@verizon.net> writes:
The rationale is to replace the awkward and slow existing idioms for dictionary
based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

d.count(key)
d.appendlist(key, value)


Yuck.

The relatively recent "improvement" of the dict constructor signature
(``dict(foo=bar,...)``) obviously makes it impossible to just extend the
constructor to ``dict(default=...)`` (or anything else for that matter) which
would seem much less ad hoc. But why not use a classmethod (e.g.
``d=dict.withdefault(0)``) then?

Or, for the first and most common case, just a bag type?
'as
Jul 18 '05 #74

George Sakkis wrote:
+1 on this. The new suggested operations are meaningful for a subset of all valid dicts, so they should not be part of the base dict API. If any version of this is approved, it will clearly be an application of the "practicality beats purity" zen rule, and the justification for applying it in this case instead of subclassing should better be pretty strong; so

far I'm not convinced though.

My background: I've been subclassing dict since this was possible, to
provide not only a count/tally method but also a DIY intern method.
Before that I just copied dictmodule.c (every release!) and diffed and
hacked about till I had a mydict module.

*any* version? Could we make you happy by having a subclass
TotallySinfulDict provided as part of the core? You don't have to use
it -- come to think of it, you don't have to use a sinful method in the
base dict. You could even avert your gaze when reading the
documentation.

The justification for having it in the core: it's in C, not in Python,
it gets implemented and maintained (a) ONCE (b) by folk like the timbot
and Raymond instead of having (a) numerous versions lashed up (b) by
you and me and a whole bunch of n00bz and b1ffz :-)

Jul 18 '05 #75
Alexander Schmolck wrote:
"Raymond Hettinger" <vz******@verizon.net> writes:
The rationale is to replace the awkward and slow existing idioms for dictionary
based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)
Indeed not too readable. The try..except version is better but is too
verbose. There is a simple concept underneath of assuming a default value and
we need "one obvious" way to write it.
In simplest form, those two statements would now be coded more readably as:

d.count(key)
d.appendlist(key, value)
Yuck.

-1 from me too on these two methods because they only add "duct tape" for the
problem instead of solving it. We need to improve upon `dict.setdefault()`,
not specialize it.
The relatively recent "improvement" of the dict constructor signature
(``dict(foo=bar,...)``) obviously makes it impossible to just extend the
constructor to ``dict(default=...)`` (or anything else for that matter) which
would seem much less ad hoc. But why not use a classmethod (e.g.
``d=dict.withdefault(0)``) then?
You mean giving a dictionary a default value at creation time, right?

Such a dictionary could be used very easily, as in <gasp>Perl::

foreach $word ( @words ) {
$d{$word}++; # default of 0 assumed, simple code!
}

</gasp>. You would like to write::

d = dict.withdefault(0) # or something
for word in words:
d[word] += 1 # again, simple code!

I agree that it's a good idea but I'm not sure the default should be specified
at creation time. The problem with that is that if you pass such a dictionary
into an unsuspecting function, it will not behave like a normal dictionary.
Also, this will go awry if the default is a mutable object, like ``[]`` - you
must create a new one at every access (or introduce a rule that the object is
copied every time, which I dislike). And there are cases where in different
points in the code operating on the same dictionary you need different default
values.

So perhaps specifying the default at every point of use by creating a proxy is
cleaner::

d = {}
for word in words:
d.withdefault(0)[word] += 1

Of course, you can always create the proxy once and still pass it into an
unsuspecting function when that is actually what you mean.

How should a dictionary with a default value behave (wheter inherently or a
proxy)?

- ``d.__getattr__(key)`` never raises KeyError for missing keys - instead it
returns the default value and stores the value as `d.setdefult()` does.
This is needed for make code like::

d.withdefault([])[key].append(foo)

to work - there is no call of `d.__setattr__()`, so `d.__getattr__()` must
have stored it.

- `d.__setattr__()` and `d.__delattr__()` behave normally.

- Should ``key in d`` return True for all keys? It is desiarable to have
*some* way to know whether a key is really present. But if it returns False
for missing keys, code that checks ``key in d`` will behave differently from
normally equivallent code that uses try..except. If we use the proxy
interface, we can always check on the original dictionary object, which
removes the problem.

- ``d.has_key(key)`` must do whatever we decide ``key in d`` does.

- What should ``d.get(key, [default])`` and ``d.setdefault(key, default)``
do? There is a conflict between the default of `d` and the explicitly given
default. I think consistency is better and these should pretend that `key`
is always present. But either way, there is a subtle problem here.

- Of course `iter(d)`, `d.items()` and the like should only see the keys
that are really present (the alternative inventing an infinite amount of
items out of the blue is clearly bogus).

If the idea that the default should be specified in every operation (creating
a proxy) is accepted, there is a simpler and more fool-proof solution: the
ptoxy will not support anything except `__getitem__()` and `__setitem__()` at
all. Use the original dictionary for everything else. This prevents subtle
ambiguities.
Or, for the first and most common case, just a bag type?

Too specialized IMHO. You want a dictionary with any default anyway. If you
have that, what will be the benefit of a bag type?

Jul 18 '05 #76
"John Machin" <sj******@lexicon.net> wrote in message
news:11**********************@g14g2000cwa.googlegr oups.com...

George Sakkis wrote:
+1 on this. The new suggested operations are meaningful for a subset

of all valid dicts, so they
should not be part of the base dict API. If any version of this is

approved, it will clearly be an
application of the "practicality beats purity" zen rule, and the

justification for applying it in
this case instead of subclassing should better be pretty strong; so

far I'm not convinced though.

My background: I've been subclassing dict since this was possible, to
provide not only a count/tally method but also a DIY intern method.
Before that I just copied dictmodule.c (every release!) and diffed and
hacked about till I had a mydict module.

*any* version? Could we make you happy by having a subclass
TotallySinfulDict provided as part of the core? You don't have to use
it -- come to think of it, you don't have to use a sinful method in the
base dict. You could even avert your gaze when reading the
documentation.

The justification for having it in the core: it's in C, not in Python,
it gets implemented and maintained (a) ONCE (b) by folk like the timbot
and Raymond instead of having (a) numerous versions lashed up (b) by
you and me and a whole bunch of n00bz and b1ffz :-)


I believe it was pretty clear that I'm not against a new dict extension, in the core or in the
standard library; the proposed functionality is certainly useful and it would be most welcome. I
just don't find it appropriate for the existing base dict because it is not applicable to *every*
dictionary. As for the "you don't have to use feature X if you don't like it" argument, it's rarely
relevant in language design.

George
Jul 18 '05 #77
George Sakkis wrote:
-1 form me.

I'm not very glad with both of them ( not a naming issue ) because i
think that the dict type should offer only methods that apply to each dict whatever it contains. count() specializes to dict values that are addable and appendlist to those that are extendable. Why not
subtractable, dividable or right-shiftable? Because of majority
approval? I'm mot a speed fetishist and destroying the clarity of a
very fundamental data structure for speedup rather arbitrary
accumulations seems to be a bad idea. I would move this stuff in a
subclass.

Regards Kay
+1 on this. The new suggested operations are meaningful for a subset of all valid dicts, so they
should not be part of the base dict API. If any version of this is approved, > it will clearly be an application of the "practicality beats purity" zen rule, and the
justification for applying it in
this case instead of subclassing should better be pretty strong; so far I'm not convinced though.

George


It is bad OO design, George. I want to be a bit more become more
specific on this and provide an example:

Let be <intdict> a custom subclass of dict that stores only ints as
keys as well as values:

class intdict(dict):
def __setitem__(self, key, value):
assert isinstance(key, int) and isinstance(value, int)
dict.__setitem__(self,key,value)

or in Python3000 typeguard fashion:

class intdict(dict):
def __setitem__(self, key:int, value:int):
dict.__setitem__(self,key,value)

But <intdict> still overloads appendlist() i.e. a method that does not
work for any intdict is still part of it's public interface! This is
really bad design and should not be justified by a "practicality beats
purity" wisdom which should be cited with much care but not
carelesness.

Maybe also the subclassing idea I introduced falls for short for the
same reasons. Adding an accumulator to a dict should be implemented as
a *decorator* pattern in the GoF meaning of the word i.e. adding an
interface to some object at runtime that provides special facilities.

Usage:
d = intdict(extend=MyAccumulator)
hasattr(d,"tally") True hasattr(d,"appendlist")

False

This could be generalized to other fundamental data structures as well.

Regards Kay

Jul 18 '05 #78
Kay Schluehr wrote:
Maybe also the subclassing idea I introduced falls for short for the
same reasons. Adding an accumulator to a dict should be implemented as
a *decorator* pattern in the GoF meaning of the word i.e. adding an
interface to some object at runtime that provides special facilities.

Usage:

d = intdict(extend=MyAccumulator)
hasattr(d,"tally")
True
hasattr(d,"appendlist")
False

This could be generalized to other fundamental data structures as well.

Regards Kay

Or similarly, something like the 'reversed' view of a sequence:

I could imagine a class: accumulator(mapping, default, incremetor) such that:
my_tally = accumulator({}, 0, operator.add) or my_dict_of_lists = accumulator({}, [], list.append) or my_dict_of_sets = accumulator({}, set(), set.add)


then: .accumulate(key, value) "does the right thing" in each case.

a bit cumbersome, because of having to specify the accumulation method, but
avoids adding methods to, or sub-classing dict

Michael

Jul 18 '05 #79
I am surprised nobody suggested we put those two methods into a
separate module (say dictutils or even UserDict) as functions:

from dictutils import tally, listappend

tally(mydict, key)
listappend(mydict, key, value)

I am -1 about a specific subclass of dict in the standard library, I
would not mind about a few new functions
in an utility module.

Michele Simionato

Jul 18 '05 #80
> > +1 on this. The new suggested operations are meaningful for a subset
of all
valid dicts, so they
should not be part of the base dict API. If any version of this is

approved, > it will clearly be an
application of the "practicality beats purity" zen rule, and the
justification for applying it in
this case instead of subclassing should better be pretty strong; so

far I'm
not convinced though.

George


It is bad OO design, George. I want to be a bit more become more
specific on this and provide an example:


Kay, the '+1' was for your post, not the pre-PEP; I fully agree with you in that it's bad design. I
just tried to play devil's advocate by citing an argument that might be used to back the addition of
the proposed accumulating methods.

Regards,
George
Jul 18 '05 #81
Ron
On Sat, 19 Mar 2005 01:24:57 GMT, "Raymond Hettinger"
<vz******@verizon.net> wrote:
def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)

Why is it better than this?

dict[key]+=n
dict[key]+=list

Ron

Jul 18 '05 #82
"Michael Spencer" <ma**@telcopartners.com> wrote:
I could imagine a class: accumulator(mapping, default, incremetor) such that:
>>> my_tally = accumulator({}, 0, operator.add) or >>> my_dict_of_lists = accumulator({}, [], list.append) or >>> my_dict_of_sets = accumulator({}, set(), set.add)


then: .accumulate(key, value) "does the right thing" in each case.

a bit cumbersome, because of having to specify the accumulation method, but
avoids adding methods to, or sub-classing dict

Michael


That's the equivalent of reduce() for mappings. Given the current trend of moving away from
traditional functional features (lambda,map,filter,etc.), I would guess it's not likely to become
mainstream.

George
Jul 18 '05 #83
*pling* !

I'm sometimes a bit slow :)

Regards Kay

Jul 18 '05 #84
Paul Rubin wrote:
If the compiler can do some type inference, it can optimize out those
multiple calls pretty straightforwardly.


It can be tipped like that:

di = dict(int)
di.setdefault(0)
di[key] += 1

dl = dict(list)
dl.setdefault([])
dl.append("word")
dl.extend(mylist)

But the point is that if method not found in dict it delegated to
container type specified in constructor.

It solves dict specialization without bloating dict class and is generic.

Mike

Jul 18 '05 #85
Mike Rovner <mr*****@propel.com> writes:
It can be tipped like that:

di = dict(int)
di.setdefault(0)
di[key] += 1 .... But the point is that if method not found in dict it delegated to
container type specified in constructor.

It solves dict specialization without bloating dict class and is generic.


Hey, I like that. I'd let the default be an optional extra arg to the
constructor:

di = dict(int, default=0)
di[key] += 1

without the setdefault. I might even add optional type checking:

di = dict(int, default=0, typecheck=True)
di[key] = 'foo' # raises TypeError
Jul 18 '05 #86
Raymond Hettinger wrote:
The rationale is to replace the awkward and slow existing idioms for
dictionary based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)


How about the alternative approach of allowing the user to override the
action to be taken when accessing a non-existent key?

d.defaultValue(0)

and the accumulation becomes:

d[key] += 1

and:

d.defaultValue(function=list)

would allow a safe:

d[key].extend(values)

Jul 18 '05 #87
Raymond Hettinger wrote:
I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty

def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)


They look as a special-case to me. They don't solve the problem for
lists of sets or lists of deques for instance, not to mention other
possible user-defined containers.

defaultdicts look to me as a solution that is more elegant and solves
more problems. What is the problem with them?

--
Ciao,
Matteo
Jul 18 '05 #88
Mike Rovner wrote:
Paul Rubin wrote:
If the compiler can do some type inference, it can optimize out those
multiple calls pretty straightforwardly.
It can be tipped like that:

di = dict(int)
di.setdefault(0)
di[key] += 1


Interesting, but why do you need to give the int type to the constructor?
dl = dict(list)
dl.setdefault([])
dl.append("word")
dl.extend(mylist)


I don't quite understand that. Which dict item are you extending? Don't
you need something like

dl[key].append("word")

?

Anyway, using `setdefault' as the method name is quite confusing,
although yours is IMHO a much better behavior given the name ;)

So what about `setdefaultvalue'?

Reinhold
Jul 18 '05 #89
John Machin wrote:
Reinhold Birkenfeld wrote:
John Machin wrote:
Are you kidding? If you know what "set" and "default" means, you will

be
able to guess what "setdefault" means. Same goes for updateBy.


No I'm not kidding -- people from some cultures have no difficulty at
all in mentally splitting up "words" like "setdefault" or the German
equivalent of "Danubesteamnavigationcompany'sdirector'swife" ; others
from other cultures where agglutinisation is not quite so rife will
have extreme difficulty.


Okay - as I'm German I might be preoccupied on this matter <wink>

Reinhold
Jul 18 '05 #90
George Sakkis wrote:
-1 form me.

I'm not very glad with both of them ( not a naming issue ) because i
think that the dict type should offer only methods that apply to each
dict whatever it contains. count() specializes to dict values that are
addable and appendlist to those that are extendable. Why not
subtractable, dividable or right-shiftable? Because of majority
approval? I'm mot a speed fetishist and destroying the clarity of a
very fundamental data structure for speedup rather arbitrary
accumulations seems to be a bad idea. I would move this stuff in a
subclass.

Regards Kay


+1 on this. The new suggested operations are meaningful for a subset of all valid dicts, so they
should not be part of the base dict API. If any version of this is approved, it will clearly be an
application of the "practicality beats purity" zen rule, and the justification for applying it in
this case instead of subclassing should better be pretty strong; so far I'm not convinced though.


So, would the `setdefaultvalue' approach be more consistent in your eyes?

Reinhold
Jul 18 '05 #91
> How about the alternative approach of allowing the user to override the
action to be taken when accessing a non-existent key?

d.defaultValue(0)


I like this a lot. It makes it more clear from the code what is going on,
rather than having to figure out what the name appendlist, count, tally,
whatever, is supposed to mean. When you see the value you'll know.

It's more general, because you can support dictionaries and sets then as
well.

I think someone mentioned that it might be a problem to add another piece of
state to all dicts though. I don't know enough about the internals to say
anything about this.

setdefault gets around this by having you pass in the value every time, so
it doesn't have to store it. It's very similar, but somehow many times more
awkward.

Jul 18 '05 #92
Duncan Booth wrote:
Raymond Hettinger wrote:
The rationale is to replace the awkward and slow existing idioms for dictionary based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

How about the alternative approach of allowing the user to override

the action to be taken when accessing a non-existent key?

d.defaultValue(0)

and the accumulation becomes:

d[key] += 1

and:

d.defaultValue(function=list)

would allow a safe:

d[key].extend(values)


+0

The best suggestion up to now. But i find this premature because it
addresses only a special aspect of typing issues which should be
disussed together with Guidos type guard proposals in a broader
context. Besides this the suggestion though feeling pythonic is still
uneven.

Why do You set

d.defaultValue(0)
d.defaultValue(function=list)

but not

d.defaultValue(0)
d.defaultValue([])

?

And why not dict(type=int), dict(type=list) instead where default
values are instantiated during object creation? A consistent pythonic
handling of all types should be envisioned not some ad hoc solutions
that go deprecated two Python releases later.

Regards Kay

Jul 18 '05 #93
Max
Paul Rubin wrote:
Reinhold Birkenfeld <re************************@wolke7.net> writes:
Any takers for tally()?


Well, as a non-native speaker, I had to look up this one in my
dictionary. That said, it may be bad luck on my side, but it may be that
this word is relatively uncommon and there are many others who would be
happier with increment.

It is sort of an uncommon word. As a US English speaker I'd say it
sounds a bit old-fashioned, except when used idiomatically ("let's
tally up the posts about accumulator messages") or in nonstandard
dialect ("Hey mister tally man, tally me banana" is a song about
working on plantations in Jamaica). It may be more common in UK
English. There's an expression "tally-ho!" which had something to do
with British fox hunts, but they don't have those any more.


Has anyone _not_ heard Jeff Probst say, "I'll go tally the votes"?!

:)

--Max
Jul 18 '05 #94
Max wrote:
Has anyone _not_ heard Jeff Probst say, "I'll go tally the votes"?!

:)


Who is Jeff Probst?
Jul 18 '05 #95
Kay Schluehr wrote:
Why do You set

d.defaultValue(0)
d.defaultValue(function=list)

but not

d.defaultValue(0)
d.defaultValue([])

?
I think that's because you have to instantiate a different object for
each different key. Otherwise, you would instantiate just one list as a
default value for *all* default values. In other words, given:

class DefDict(dict):
def __init__(self, default):
self.default = default
def __getitem__(self, item):
try:
return dict.__getitem__(self, item)
except KeyError:
return self.default

you'll get

In [12]: d = DefDict([])

In [13]: d[42].extend(['foo'])

In [14]: d.default
Out[14]: ['foo']

In [15]: d[10].extend(['bar'])

In [16]: d.default
Out[16]: ['foo', 'bar']

In [17]: d[10]
Out[17]: ['foo', 'bar']

In [18]: d[10] is d.default
Out[18]: True

and this isn't what you really wanted.

By the way, to really work, I think that Duncan's proposal should create
new objects when you try to access them, and to me it seems a bit
counterintuitive. Nevertheless, I'm +0 on it.
And why not dict(type=int), dict(type=list) instead where default
values are instantiated during object creation? A consistent pythonic
handling of all types should be envisioned not some ad hoc solutions
that go deprecated two Python releases later.


I don't really understand you. What should 'type' return? A callable
that returns a new default value? That's exactly what Duncan proposed
with the "function" keyword argument.

--
Ciao,
Matteo
Jul 18 '05 #96
In article <JbL_d.8237$qN3.2116@trndny01>, Raymond Hettinger wrote:
I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
try:
self[key] += qty
except KeyError:
self[key] = qty
Yes, yes, YES!

*Man*, this would be useful.
def appendlist(self, key, *values):
try:
self[key].extend(values)
except KeyError:
self[key] = list(values)


Woohoo! *Just* as useful.

I'd *definitely* use these.

Hot 100% sure about the names, though. (add() and append() seem like
more natural names -- but they may be confusing, considering their
other uses...)

+1 on both (possibly allowing for some naming discussion...)

--
Magnus Lie Hetland Time flies like the wind. Fruit flies
http://hetland.org like bananas. -- Groucho Marx
Jul 18 '05 #97
I like count() and appendlist() or whatever they will be named. But I
have one question/idea:

Why does the methods have to be put in dict? Can't their be a subtype
of dict that includes those two methods? I.e.:

..histogram = counting_dict()
..for ch in text:
.. histogram.count(ch)

Then maybe some more methods can be added tailor-mode for these two
types of dicts?:

..for ch in string.ascii_letters:
.. print "Frequency of %s = %d." % (ch, histogram.freq(ch))

Or something, you get the idea.

--
mvh Bjrn
Jul 18 '05 #98
Roose wrote:
I think someone mentioned that it might be a problem to add another
piece of state to all dicts though. I don't know enough about the
internals to say anything about this.

setdefault gets around this by having you pass in the value every
time, so it doesn't have to store it. It's very similar, but somehow
many times more awkward.


Another option with no storage overhead which goes part way to reducing
the awkwardness would be to provide a decorator class accessible through
dict. The decorator class would take a value or function to be used as
the default, but apart from __getitem__ would simply forward all other
methods through to the underlying dictionary.

That gives you the ability to have not one default value for a
dictionary, but many different ones: you just decorate the dictionary
anywhere you need a default and use the underlying dictionary everywhere
else.

Some code which demonstrates the principle rather than the
implementation. dictDefaultValue could be imagined as
dict.defaultValue, dictDefaultValue(d, ...) could be
d.defaultValue(...) although the actual name used needs work:
class dictDefaultValue(object): def __init__(self, d, value=_marker, function=_marker):
self.__d = d
if value is _marker:
if function is _marker:
raise TypeError, "expected either value or function argument"
self.__dv = function
else:
def defaultValue():
return value
self.__dv = defaultValue

def __getattr__(self, name):
return getattr(self.__d, name)

def __getitem__(self, name):
try:
return self.__d[name]
except KeyError:
value = self.__dv()
self.__d[name] = value
return value

def __setitem__(self, name, value):
self.__d[name] = value

d = {}
accumulator = dictDefaultValue(d, 0)
accumulator['a'] += 1
aggregator = dictDefaultValue(d, function=list)
aggregator['b'].append('hello')
d {'a': 1, 'b': ['hello']}

Jul 18 '05 #99
Matteo Dell'Amico wrote:
Kay Schluehr wrote:
Why do You set

d.defaultValue(0)
d.defaultValue(function=list)

but not

d.defaultValue(0)
d.defaultValue([])

?
I think that's because you have to instantiate a different object for

each different key. Otherwise, you would instantiate just one list as a default value for *all* default values.
Or the default value will be copied, which is not very hard either or
type(self._default)() will be called. This is all equivalent and it
does not matter ( except for performance reasons ) which way to go as
long only one is selected.

[...]
By the way, to really work, I think that Duncan's proposal should create new objects when you try to access them, and to me it seems a bit
counterintuitive.
If the dict has a fixed semantics by applying defaultValue() and it
returns defaults instead of exceptions whenever a key is missing i.e.
behavioural invariance the client of the dict has nothing to worry
about, hasn't he?

And why not dict(type=int), dict(type=list) instead where default
values are instantiated during object creation? A consistent pythonic handling of all types should be envisioned not some ad hoc solutions that go deprecated two Python releases later.


I don't really understand you. What should 'type' return?
A callable
that returns a new default value? That's exactly what Duncan proposed

with the "function" keyword argument.


I suspect the proposal really makes sense only if the dict-values are
of the same type. Filling it with strings, custom objects and other
stuff and receiving 0 or [] or '' if a key is missing would be a
surprise - at least for me. Instantiating dict the way I proposed
indicates type-guards! This is the reason why I want to delay this
issue and discuss it in a broader context. But I'm also undecided.
Guidos Python-3000 musings are in danger to become vaporware. "Now is
better then never"... Therefore +0.

Regards Kay

Jul 18 '05 #100

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

21 posts views Thread by Headless | last post: by
7 posts views Thread by Alan Illeman | last post: by
2 posts views Thread by Buck Turgidson | last post: by
5 posts views Thread by Michael Shell | last post: by
8 posts views Thread by Jarno Suni not | last post: by
9 posts views Thread by Eric Lindsay | last post: by
14 posts views Thread by Schraalhans Keukenmeester | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by zhoujie | last post: by
1 post views Thread by Geralt96 | last post: by
reply views Thread by harlem98 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.