Pre-PEP: Dictionary accumulator methods

Raymond Hettinger

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

The rationale is to replace the awkward and slow existing idioms for dictionary
based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

d.count(key)
d.appendlist(key, value)

In their multi-value forms, they would now be coded as:

d.count(key, qty)
d.appendlist(key, *values)

The error messages returned by the new methods are the same as those returned by
the existing idioms.
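
As a rough, runnable sketch of the proposal, the two methods can be tried out on a dict subclass and compared against the existing idioms. The class name AccumDict and the sample data are hypothetical and not part of the proposal, and the first parameter of count() is spelled key here:

```python
class AccumDict(dict):
    """Hypothetical dict subclass used only to try out the proposed methods."""

    def count(self, key, qty=1):
        # Add qty to the running total for key, starting from qty if absent.
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

    def appendlist(self, key, *values):
        # Extend the list stored under key, creating it on first use.
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)


words = ["spam", "eggs", "spam"]

new_counts = AccumDict()
old_counts = {}
for word in words:
    new_counts.count(word)
    old_counts[word] = old_counts.get(word, 0) + 1      # existing get() idiom

new_index = AccumDict()
old_index = {}
for position, word in enumerate(words):
    new_index.appendlist(word, position)
    old_index.setdefault(word, []).append(position)     # existing setdefault() idiom
```

Both spellings build the same dictionaries; only the wording at the call site differs.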

The get() method would continue to exist because it is useful for applications
other than accumulation.

The setdefault() method would continue to exist but would likely not make it
into Py3.0.

PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault()'s method name is not self-evident.

The performance issues with the existing constructs are:

* They translate into many opcodes, which slows them considerably.
* The get() idiom requires two dictionary lookups of the same key.
* The setdefault() idiom instantiates a new, empty list prior to every call.
* That new list is often not needed and is immediately discarded.
* The setdefault() idiom requires an attribute lookup for extend/append.
* The setdefault() idiom makes two function calls.

The latter issues are evident from a disassembly:

>>> dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec'))
  1           0 LOAD_NAME                0 (d)
              3 LOAD_ATTR                1 (get)
              6 LOAD_NAME                2 (key)
              9 LOAD_CONST               0 (0)
             12 CALL_FUNCTION            2
             15 LOAD_NAME                3 (qty)
             18 BINARY_ADD
             19 LOAD_NAME                0 (d)
             22 LOAD_NAME                2 (key)
             25 STORE_SUBSCR
             26 LOAD_CONST               1 (None)
             29 RETURN_VALUE

>>> dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))
  1           0 LOAD_NAME                0 (d)
              3 LOAD_ATTR                1 (setdefault)
              6 LOAD_NAME                2 (key)
              9 BUILD_LIST               0
             12 CALL_FUNCTION            2
             15 LOAD_ATTR                3 (extend)
             18 LOAD_NAME                4 (values)
             21 CALL_FUNCTION            1
             24 POP_TOP
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE

In contrast, the proposed methods use only a single attribute lookup and
function call, they use only one dictionary lookup, they use very few opcodes,
and they directly access the accumulation functions, PyNumber_Add() or
PyList_Append(). IOW, the performance improvement matches the readability
improvement.
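
The point that the setdefault() idiom builds a fresh default on every call, needed or not, can be checked with a small sketch. The helper make_list and the list named created are hypothetical; make_list stands in for the [] literal so that instantiations can be counted:

```python
created = []   # one entry is recorded per list that gets instantiated


def make_list():
    # Stand-in for the [] literal so that instantiations can be counted.
    created.append(1)
    return []


d = {}
for key, value in [("a", 1), ("a", 2), ("a", 3), ("b", 4)]:
    # The default argument is evaluated before setdefault() runs,
    # even when the key is already present.
    d.setdefault(key, make_list()).append(value)
```

Four lists are created here although only two are kept; the other two are discarded immediately.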

ISSUES
------

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

The appendlist() method is not as versatile as setdefault() which can be used
with other object types (perhaps for creating dictionaries of dictionaries).
However, most uses I've seen are with lists. For other uses, plain Python code
suffices in terms of speed, clarity, and avoiding unnecessary instantiation of
empty containers:

if key not in d:
    d.key = {subkey:value}
else:
    d[key][subkey] = value

Raymond Hettinger

Jul 18 '05 #1

Ivan Van Laningham

Hi All--
Maybe I'm not getting it, but I'd think a better name for count would be
add. As in

d.add(key)
d.add(key,-1)
d.add(key,399)
etc.

Raymond Hettinger wrote:

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

There is no existing add() method for dictionaries. Given the name
change, I'd like to see it.

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.andi-holmes.com/
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours

Jul 18 '05 #2

Jeff Shannon

Raymond Hettinger wrote:

def count(self, value, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

I presume that the argument list is a typo, and should actually be

def count(self, key, qty=1): ...

Correct?

Jeff Shannon

Jul 18 '05 #3

Aahz

In article <JbL_d.8237$qN3.2116@trndny01>,
Raymond Hettinger <py****@rcn.com> wrote:

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

You mean

def count(self, key, qty=1)

Right?
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR

Jul 18 '05 #4

Mike Rovner

Ivan Van Laningham wrote:

Hi All--
Maybe I'm not getting it, but I'd think a better name for count would be
add. As in

d.add(key)
d.add(key,-1)
d.add(key,399)
etc.

IMHO inc (for increment) is better.

d.inc(key)

add can be read as add key to d

Mike

Jul 18 '05 #5

Raymond Hettinger

> > def count(self, value, qty=1):

[Aahz]

You mean
def count(self, key, qty=1)

Right?

Yes.

Also, there is a typo in the final snippet (pure python version of dictionary of
dictionaries). It should read:

if key not in d:
    d[key] = {subkey:value}
else:
    d[key][subkey] = value

Raymond

Jul 18 '05 #6

Roose

I like this, it is short, low impact, and makes things more readable. I
tend to go with just the literal way of doing it instead of using get and
setdefault, which I find awkward.

But alas, I had my own short, low impact, useful suggestion and I think it
died. It was for any() and all() for lists. Actually Google just released
their "functional.py" module on code.google.com with the exact same thing.
Except they are missing the identity as a default which is very useful, i.e.
any(lst, f=lambda x: x) instead of any(lst, f).

Maybe you can tack that onto your PEP :)

That is kind of related, they are accumulators as well. They could probably
be generalized for dictionaries, but I don't know how useful that would be.

Jul 18 '05 #7

Bengt Richter

On Sat, 19 Mar 2005 01:24:57 GMT, "Raymond Hettinger" <vz******@verizon.net> wrote:

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

The rationale is to replace the awkward and slow existing idioms for dictionary
based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

d.count(key)
d.appendlist(key, value)

In their multi-value forms, they would now be coded as:

d.count(key, qty)
d.appendlist(key, *values)

How about an efficient duck-typing value-incrementer to replace both? E.g. functionally like:

>>> class xdict(dict):
...     def valadd(self, key, incr=1):
...         try: self[key] = self[key] + type(self[key])(incr)
...         except KeyError: self[key] = incr
...
>>> xd = xdict()
>>> xd
{}
>>> xd.valadd('x')
>>> xd
{'x': 1}
>>> xd.valadd('x', range(3))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in valadd
TypeError: int() argument must be a string or a number
>>> xd.valadd('y', range(3))
>>> xd
{'y': [0, 1, 2], 'x': 1}
>>> xd.valadd('z', (1,2))
>>> xd
{'y': [0, 1, 2], 'x': 1, 'z': (1, 2)}
>>> xd.valadd('x', 100)
>>> xd['x']
101
>>> xd.valadd('y', range(3,6))
>>> xd['y']
[0, 1, 2, 3, 4, 5]
>>> xd.valadd('z', (3,4))
>>> xd['z']
(1, 2, 3, 4)
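
The session above is from a Python 2 interpreter, where range() returns a list. A sketch of the same idea for Python 3 follows. The class name XDict and the try/else layout are assumptions made for this illustration, not the code as posted; the else branch keeps a KeyError raised inside "+" from being mistaken for a missing key:

```python
class XDict(dict):
    """Hypothetical Python 3 variant of the valadd() idea."""

    def valadd(self, key, incr=1):
        try:
            current = self[key]
        except KeyError:
            # First use of this key: store the increment as the initial value.
            self[key] = incr
        else:
            # Coerce incr to the stored value's type, then combine with "+".
            self[key] = current + type(current)(incr)


xd = XDict()
xd.valadd("x")
xd.valadd("x", 100)
xd.valadd("y", [0, 1, 2])       # store a real list, since range() is lazy in Python 3
xd.valadd("y", range(3, 6))     # the range is coerced to a list before "+"
xd.valadd("z", (1, 2))
xd.valadd("z", (3, 4))
```

As in the original session, a mismatched increment such as xd.valadd("x", [1]) raises TypeError, because int() cannot convert a list.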
ISSUES
------

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

I'm thinking the idea that the counting is happening with the value corresponding
to the key should be emphasised more. Hence valadd or such?

The appendlist() method is not as versatile as setdefault() which can be used
with other object types (perhaps for creating dictionaries of dictionaries).
However, most uses I've seen are with lists. For other uses, plain Python code
suffices in terms of speed, clarity, and avoiding unnecessary instantiation of
empty containers:

if key not in d:
    d.key = {subkey:value}
else:
    d[key][subkey] = value

Yes, but duck typing for any obj that supports "+" gets you a lot, ISTM at this stage
of this BF ;-)

Regards,
Bengt Richter

Jul 18 '05 #8

Jeff Epler

Maybe something for sets like 'appendlist' ('unionset'?)

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFCO6KTJd01MZaTXX0RAj9IAKCr0dxRjOtbgo4GUyR5K6 SbUSpA+gCgp75t
FkFSrxoiMQZcCg+GRzdaTnw=
=YF1H
-----END PGP SIGNATURE-----

Jul 18 '05 #9

Raymond Hettinger

[Jeff Epler]

Maybe something for sets like 'appendlist' ('unionset'?)

I do not follow. Can you provide a pure python equivalent?

Raymond

Jul 18 '05 #10

Raymond Hettinger

[Roose]

I like this, it is short, low impact, and makes things more readable. I
tend to go with just the literal way of doing it instead of using get and
setdefault, which I find awkward.

Thanks. Many people find setdefault() to be an oddball.

But alas, I had my own short, low impact, useful suggestion and I think it
died. It was for any() and all() for lists. Actually Google just released
their "functional.py" module on code.google.com with the exact same thing.
Except they are missing the identity as a default which is very useful, i.e.
any(lst, f=lambda x: x) instead of any(lst, f).

Maybe you can tack that onto your PEP :)

Py2.5 is already going to include any() and all() as builtins. The signature
does not include a function, identity or otherwise. Instead, the caller can
write a listcomp or genexp that evaluates to True or False:

any(x >= 42 for x in data)

If you wanted an identity function, that simplifies to just:

any(data)
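
A small sketch of the signature described above, with made-up sample data: the predicate lives in a generator expression, and leaving it out tests the truthiness of the items themselves.

```python
data = [3, 17, 42, 8]

has_big = any(x >= 42 for x in data)      # predicate supplied by a generator expression
all_positive = all(x > 0 for x in data)   # all() follows the same pattern
any_truthy = any([0, "", None, 5])        # no predicate: the items' own truth values are used

# Edge cases: any() of an empty iterable is False, all() of one is True.
empty_any = any([])
empty_all = all([])
```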

Raymond Hettinger

Jul 18 '05 #11

Brian van den Broek

Raymond Hettinger said unto the world upon 2005-03-18 20:24:

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, value, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

The rationale is to replace the awkward and slow existing idioms
for dictionary based accumulation:

d[key] = d.get(key, 0) + qty
d.setdefault(key, []).extend(values)

<SNIP>
Hi all,

I am *far* less experienced with Python and programming than those
who've weighed in as yet.

I quite like count, though I agree with posts up thread that `count'
might not be the best name.

For appendlist, I would have expected

def appendlist(self, key, sequence):
    try:
        self[key].extend(sequence)
    except KeyError:
        self[key] = list(sequence)

I am, however, very open to the possibility that this says more about
my level of experience than it does about which way is best :-)

Best to all,

Brian vdB

Jul 18 '05 #12

Bengt Richter

On Sat, 19 Mar 2005 03:14:07 GMT, bo**@oz.net (Bengt Richter) wrote:
[...]

Yes, but duck typing for any obj that supports "+" gets you a lot, ISTM at this stage
of this BF ;-)

Just in case, by "this BF," I meant to refer to my valadd idea,
with no offensive characterization of anyone else's ideas intended ;-)

Regards,
Bengt Richter

Jul 18 '05 #13

Michele Simionato

+1 for inc instead of count.
appendlist seems a bit too specific (I do not use dictionaries of lists
that often).
The problem with setdefault is the name, not the functionality.
get_or_set would be a better name: we could use it as an alias for
setdefault and then remove setdefault in Python 3000.

Just my 2 Eurocents,

Michele Simionato

Jul 18 '05 #14

Paul Rubin

"Michele Simionato" <mi***************@gmail.com> writes:

+1 for inc instead of count.

I'd prefer incr or increment to inc. add is also ok. count isn't so great.
Something like add_count or inc_count or add_num or whatever could be ok.

Jul 18 '05 #15

Michael Spencer

Raymond Hettinger wrote:

I would like to get everyone's thoughts on two new dictionary methods:

+1 count
? appendlist

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

IMO 'tally' is exactly the right method name

One issue is with negative increments reaching zero, i.e., deleting the key once
the tally goes to 0: this behavior would match the automatic key creation.

The appendlist() method is not as versatile as setdefault() which can be used

Simpler list initialization and augmentation would be very handy - but I would also
like the equivalent functionality for sets (don't care about dicts of dicts though).

Given the difference in set/list augmentation methods, this may present a
challenge. Alternatively, perhaps dict_of_list and dict_of_set could become
specialized containers - they might then have value_append/value_extend and
value_add methods respectively. I imagine (without any basis in fact) that it
would also be possible to optimize the performance of large mappings of
containers compared with the generic dict.
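
The list/set difference mentioned above (append/extend versus add/update) can be sketched with two helper functions. They are hypothetical: the names are borrowed from the method names suggested above, but they are plain functions over ordinary dicts, not specialized containers.

```python
def value_append(mapping, key, value):
    # List-valued mapping: create the list on first use, otherwise append.
    try:
        mapping[key].append(value)
    except KeyError:
        mapping[key] = [value]


def value_add(mapping, key, value):
    # Set-valued mapping: same shape, but the augmentation method is add().
    try:
        mapping[key].add(value)
    except KeyError:
        mapping[key] = {value}


by_list, by_set = {}, {}
for key, value in [("a", 1), ("a", 1), ("b", 2)]:
    value_append(by_list, key, value)
    value_add(by_set, key, value)
```

The list version keeps the duplicate and the set version drops it, which is why a single method could not cover both cases without knowing the container type.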

Michael

Jul 18 '05 #16

Raymond Hettinger

[Michele Simionato]

+1 for inc instead of count.

Any takers for tally()?

We should avoid abbreviations like inc() or incr() that different people tend to
abbreviate differently (for example, that is why the new partial() function has
its "keywords" argument spelled-out). The only other issue I see with that name
is that historically incrementing is more associated with +=1 than with +=n.
Also, there are reasonable use cases for a negative n and it would be misleading
to call it incrementing when decrementing is what is intended.

The issue with add() is that other types with that method use it for a radically
different purpose. For example, aSet.add(n) is not at all similar in function
to the proposed aDict.tally(n) or whatever it ends up being called. Of course,
count() is also problematic because the meaning doesn't parallel that for
list.count().

appendlist seems a bit too specific (I do not use dictionaries of lists
that often).

I'm curious. When you do use setdefault, what is the typical second argument?
In all the code I've encountered, nine times out of ten it is []. In the rare
case of {}, the resulting statement is a mess because both the subkey and value
need to be applied -- a pure python equivalent is much clearer. That leaves two
other mutable containers, set() and collections.deque() neither of which I've
ever seen used with setdefault().

IOW, I believe that, in practice, setdefault() is all about dictionaries of
lists. If so, I'm recommending a method that gets straight to the point with no
fuss, no waste, and no obfuscation.

In order to have some unused and unneeded versatility with respect to the
default object, I'm asserting that we've been burdened with an awkward, slow
idiom that is unnecessarily hard to learn and explain.

The problem with setdefault is the name, not the functionality.

Are you happy with the readability of the argument order? To me, the key and
default value are not at all related. Do you prefer having the default value
pre-instantiated on every call when the effort is likely to be wasted? Do you
like the current design of returning an object and then making a further (second
dot) method lookup and call for append or extend? When you first saw setdefault
explained, was it immediately obvious or did it take more learning effort than
other dictionary methods? To me, it is the least explainable dictionary method.
Even when given a good definition of setdefault(), it is not immediately obvious
that it is meant to be further combined with append() or some such. When showing
code to newbies or non-pythonistas, do they find the meaning of the current
idiom self-evident? That last question is not compelling, but it does contrast
with other Python code which tends to be grokkable by non-pythonistas and
clients.

get_or_set would be a better name: we could use it as an alias for
setdefault and then remove setdefault in Python 3000.

While get_or_set would be a bit of an improvement, it is still obtuse.
Even though a set operation only occurs conditionally, the get always occurs.
The proposed name doesn't make it clear that the method always returns an object.

Even if a wording is found that better describes both the get and set
operation, it is still a distractor from the intent of the combined statement,
the intent of building up a list. That is an intrinsic wording limitation that
cannot be solved by a better name for setdefault. If any change is made at all,
we ought to go the distance and provide a better designed tool rather than just
a name change.

Just my 2 Eurocents,

I raise you by a ruble and a pound ;-)

Raymond Hettinger

Jul 18 '05 #17

Paul Rubin

"Raymond Hettinger" <vz******@verizon.net> writes:

[Michele Simionato]
+1 for inc instead of count.

Any takers for tally()?

I'd say "tally" has some connotation of a counter that can never go
negative. I don't know if that behavior is desirable. Someone suggested
deleting the key if the tally is decremented to 0. I'd suggest instead
throwing an exception on an attempt to decrement it to less than 0.

We should avoid abbreviations like inc() or incr() that different
people tend to abbreviate differently (for example, that is why the
new partial() function has its "keywords" argument spelled-out).

Ok, "increment" then.

The only other issue I see with that name is that historically
incrementing is more associated with +=1 than with +=n. Also, there
are reasonable use cases for a negative n and it would be misleading
to call it incrementing when decrementing is what is intended.

Setting the default to 1 is enough for that. I mean, adding a negative
number to something is normally called "subtraction", but you can still
pass a negative argument to __iadd__.

The issue with add() is that other types with that method use it for
a radically different purpose. For example, aSet.add(n) is not at
all similar in function to the proposed aDict.tally(n)

Hmm, ok.

I'm curious. When you do use setdefault, what is the typical second
argument? In all the code I've encountered, nine times out of ten
it is [].

Yeah, me too.

Jul 18 '05 #18

Reinhold Birkenfeld

Raymond Hettinger wrote:

[Michele Simionato]
+1 for inc instead of count.

Any takers for tally()?

Well, as a non-native speaker, I had to look up this one in my
dictionary. That said, it may be bad luck on my side, but it may be that
this word is relatively uncommon and there are many others who would be
happier with increment.

Reinhold

Jul 18 '05 #19

Roose

> +1 for inc instead of count.

appendlist seems a bit too specific (I do not use dictionaries of lists
that often).

No way, I use that all the time. I use that more than count, I would say.

Roose

Jul 18 '05 #20

Raymond Hettinger

> > d.count(key, qty)

d.appendlist(key, *values)

[Bengt Richter] How about an efficient duck-typing value-incrementer to replace both?

There is some Zen of Python that argues against this interesting idea. Also, I'm
concerned that by folding appendlist() into valadd() we would lose an important
cue that a list is being built-up.

Another issue is that duck-typed multiple-dispatch is only readable when the
type of the input argument is obvious from the surrounding code. Given
d.valadd(x), it is hard to grok if x was created by some code far away. Since a
primary goal is readability and clarity, having two separate, concrete methods
is likely better than having a single more-abstracted multi-purpose method. The
performance gains are just icing on the cake.

I'm thinking the idea that the counting is happening with the value corresponding to the key should be emphasised more. Hence valadd or such?

How about countkey() or tabulate()?

Raymond Hettinger

Jul 18 '05 #21

Michele Simionato

Raymond Hettinger:

Any takers for tally()?

Dunno, to me "tally" reads "counts the number of votes for a candidate
in an election".

We should avoid abbreviations like inc() or incr() that different people tend to abbreviate differently (for example, that is why the new partial() function has its "keywords" argument spelled-out). The only other issue I see with that name is that historically incrementing is more associated with +=1 than with +=n. Also, there are reasonable use cases for a negative n and it would be misleading to call it incrementing when decrementing is what is intended.

I agree with Paul Rubin's argument on that issue: let's use increment()
and not worry about negative increments.

appendlist seems a bit too specific (I do not use dictionaries of lists that often).

I'm curious. When you do use setdefault, what is the typical second argument?

Well, I have used setdefault *very few times* in years of heavy Python usage.
Its disappearance would not bother me that much. Grepping my source code, I find
that practically my main use case for setdefault is in a memoize recipe where the
result of a function call is stored in a dictionary (if not already there) and
returned. Then I have a second case with a list as second argument.

The problem with setdefault is the name, not the functionality.

Are you happy with the readability of the argument order? To me, the key and default value are not at all related. Do you prefer having the default value pre-instantiated on every call when the effort is likely to be wasted? Do you like the current design of returning an object and then making a further (second dot) method lookup and call for append or extend? When you first saw setdefault explained, was it immediately obvious or did it take more learning effort than other dictionary methods? To me, it is the least explainable dictionary method. Even when given a good definition of setdefault(), it is not immediately obvious that it is meant to be further combined with append() or some such. When showing code to newbies or non-pythonistas, do they find the meaning of the current idiom self-evident? That last question is not compelling, but it does contrast with other Python code which tends to be grokkable by non-pythonistas and clients.

get_or_set would be a better name: we could use it as an alias for
setdefault and then remove setdefault in Python 3000.

While get_or_set would be a bit of an improvement, it is still obtuse. Even though a set operation only occurs conditionally, the get always occurs. The proposed name doesn't make it clear that the method always returns an object.

Honestly, I don't care about the performance arguments. However, I care a lot
about readability and clarity. setdefault is terrible in this respect, since most
of the time it does *not* set a default, it just gets a value. So I am always
confused and I have to read the documentation to remind myself what it is doing.
The only right name would be "get_and_possibly_set" but it is a bit long to type.

Even if a wording is found that better describes both the get and set operation, it is still a distractor from the intent of the combined statement, the intent of building up a list. That is an intrinsic wording limitation that cannot be solved by a better name for setdefault. If any change is made at all, we ought to go the distance and provide a better designed tool rather than just a name change.

Well, I never figured out that the intent of setdefault was to build up
a list ;)

Anyway, if I think of how many times I have used setdefault in my code (practically
twice) and how much time I have spent trying to decipher it (any time I reread the
code using it), I think I would have been better served by NOT having the setdefault
method available ;)

About appendlist(): it still seems a bit special-purpose to me. I mean, dictionaries
already have lots of methods and I would think twice before adding new ones,
especially methods that may turn out not that useful in the long run, or easily
replaceable by user code.

Michele Simionato

Jul 18 '05 #22

Paul Rubin

Reinhold Birkenfeld <re************************@wolke7.net> writes:

Any takers for tally()?

Well, as a non-native speaker, I had to look up this one in my
dictionary. That said, it may be bad luck on my side, but it may be that
this word is relatively uncommon and there are many others who would be
happier with increment.

It is sort of an uncommon word. As a US English speaker I'd say it
sounds a bit old-fashioned, except when used idiomatically ("let's
tally up the posts about accumulator messages") or in nonstandard
dialect ("Hey mister tally man, tally me banana" is a song about
working on plantations in Jamaica). It may be more common in UK
English. There's an expression "tally-ho!" which had something to do
with British fox hunts, but they don't have those any more.

I'd say I prefer most of the suggested alternatives (count, add,
incr/increment) to "tally".

Jul 18 '05 #23

Roose

> Py2.5 is already going to include any() and all() as builtins. The signature does not include a function, identity or otherwise. Instead, the caller can write a listcomp or genexp that evaluates to True or False:

any(x >= 42 for x in data)

If you wanted an identity function, that simplifies to just:

any(data)

Oh great, I just saw that. I was referring to this, which didn't get much
discussion:

http://mail.python.org/pipermail/pyt...ry/051556.html

but it looks like it went much further, to builtins! I'm surprised.

But I wish it could be included in Python 2.4.x. I really hope it won't
have any bugs in it. :) At my job we are probably going to upgrade to 2.4,
and that takes a long time, so it'll probably be a year or 18 months after
that happens (which itself might be months from now) that we would consider
upgrading again. Oh well...

Jul 18 '05 #24

Raymond Hettinger

[Michele Simionato]

Dunno, to me "tally" reads "counts the numbers of votes for a candidate
in an election".

That isn't a pleasant image ;-)

The only right name would be "get_and_possibly_set" but it is a bit long to
type.

Even if a wording is found that better describes both the get and
set operation, it is still a distractor from the intent of the combined
statement, the intent of building up a list. That is an intrinsic wording
limitation that cannot be solved by a better name for setdefault.
If any change is made at all, we ought to go the distance and provide a
better designed tool rather than just a name change.

Well, I never figured out that the intent of setdefault was to build up
a list ;)

Right! What does have that intent is the full statement: d.setdefault(k,
[]).append(v).

My thought is that setdefault() is rarely used by itself. Instead, it is
typically part of a longer sentence whose intent and meaning is to accumulate or
build-up. That meaning is not well expressed by the current idiom.

Raymond Hettinger

Jul 18 '05 #25

Raymond Hettinger

> > Py2.5 is already going to include any() and all() as builtins. The signature does not include a function, identity or otherwise. Instead, the caller can write a listcomp or genexp that evaluates to True or False:

any(x >= 42 for x in data)

[Roose] Oh great, I just saw that. . . . But I wish it could be included in Python 2.4.x.

If it is any consolation, the any() can already be expressed somewhat cleanly
and efficiently in Py2.4 with genexps:

True in (x >= 42 for x in data)

The translation for all() is a little less elegant:

False not in (x >= 42 for x in data)
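
A quick check of the two Py2.4 spellings above, with made-up sample data. One caveat, noted here as an observation rather than something stated in the post: the in test compares by equality, so the generator has to yield real booleans (or values equal to True/False) for the trick to match any() and all().

```python
data = [3, 17, 42, 8]

# any(): is there a True among the lazily produced results?
any_big = True in (x >= 42 for x in data)
# all(): is there no False among the results?
all_big = False not in (x >= 42 for x in data)

# Caveat: membership uses ==, not truthiness, so non-boolean items differ.
membership_result = True in (x for x in [2])   # 2 == True is False
truthiness_result = any(x for x in [2])        # 2 is truthy
```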

Raymond Hettinger

Jul 18 '05 #26

Roose

> Py2.5 is already going to include any() and all() as builtins. The signature does not include a function, identity or otherwise. Instead, the caller can write a listcomp or genexp that evaluates to True or False:

Actually I was just looking at Python 2.5 docs since you mentioned this.

http://www.python.org/dev/doc/devel/whatsnew/node3.html

It says min() and max() will gain a key function parameter, and sort()
gained one in Python 2.4 (news to me).

And they do indeed default to the identity in all 3 cases, so this seems
very inconsistent. If one of them has it, and sort gained the argument even
in Python 2.4 with generator expressions, then they all should have it.

any(x >= 42 for x in data)

Not to belabor the point, but in the example on that page, max(L, key=len)
could be written max(len(x) for x in L).

Now I know why Guido said he didn't want a PEP for this... such a trivial
thing can produce a lot of opinions. : )

Roose

Jul 18 '05 #27

Peter Otten

Roose wrote:

Not to belabor the point, but in the example on that page, max(L, key=len)
could be written max(len(x) for x in L).

No, it can't:

Python 2.5a0 (#2, Mar 5 2005, 17:44:37)
[GCC 3.3.3 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> max(["a", "bbb", "cc"], key=len)
'bbb'

Peter

Jul 18 '05 #28

Raymond Hettinger

[Roose]

> Actually I was just looking at Python 2.5 docs since you mentioned this.
>
> http://www.python.org/dev/doc/devel/whatsnew/node3.html
>
> It says min() and max() will gain a key function parameter, and sort()
> gained one in Python 2.4 (news to me).

It also appears in itertools.groupby() and, for Py2.5, in heapq.nsmallest() and
heapq.nlargest().

> And they do indeed default to the identity in all 3 cases, so this seems
> very inconsistent. If one of them has it, and sort gained the argument even
> in Python 2.4 with generator expressions, then they all should have it.
> > any(x >= 42 for x in data)
>
> Not to belabor the point, but in the example on that page, max(L, key=len)
> could be written max(len(x) for x in L).

Think about it. A key= function is quite a different thing. It provides a
*temporary* comparison key while retaining the original value. IOW, your
re-write is incorrect:

>>> L = ['the', 'quick', 'brownish', 'toad']
>>> max(L, key=len)
'brownish'
>>> max(len(x) for x in L)
8
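
To make the distinction concrete, here is a small sketch (the variable names and the hand-rolled tuple comparison are illustrative assumptions, not a description of how key= is implemented): key= picks an element of L using a temporary comparison key, while the generator version returns the largest key itself.

```python
L = ['the', 'quick', 'brownish', 'toad']

by_key = max(L, key=len)              # original element with the largest len()
largest_len = max(len(x) for x in L)  # the largest len() itself

# Roughly the same selection done by hand: pair each item with its key,
# compare the pairs, then strip the key again. Using -i keeps the first
# item on ties, which is what max() with key= returns as well.
decorated = max((len(x), -i, x) for i, x in enumerate(L))
by_hand = decorated[2]
```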
Remain calm. Keep the faith. Guido's design works fine.

No important use cases were left unserved by any() and all().

Raymond Hettinger

Jul 18 '05 #29

El Pitonero

On Sat, 19 Mar 2005 01:24:57 GMT, "Raymond Hettinger"
<vz******@verizon.net> wrote:

I would like to get everyone's thoughts on two new dictionary methods:

def count(self, key, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

Bengt Richter wrote:

>>> class xdict(dict):

...     def valadd(self, key, incr=1):
...         try: self[key] = self[key] + type(self[key])(incr)
...         except KeyError: self[key] = incr

What about:

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return copy.copy(self.default)

x = safedict(0)
x[3] += 1
y = safedict([])
y[5] += range(3)
print x, y
print x[123], y[234]
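
One behaviour of this approach is worth spelling out. As a sketch, assuming the class is used as posted: __getitem__ hands back a copy of the default without storing it, so `+=` works (the result is assigned back), but an in-place call such as append() on a missing key is silently lost. The snippet repeats the class so that it runs on its own; the keys 5 and 7 are arbitrary.

```python
import copy

class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # The copy is returned but not stored in the dict.
            return copy.copy(self.default)

y = safedict([])
y[5] += range(3)   # read a copy, extend it, then store it under key 5
y[7].append(1)     # mutates a throwaway copy; key 7 is never stored
stored_keys = sorted(y.keys())
```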

Jul 18 '05 #30

Kent Johnson

Bengt Richter wrote:

On Sat, 19 Mar 2005 01:24:57 GMT, "Raymond Hettinger" <vz******@verizon.net> wrote:
I would like to get everyone's thoughts on two new dictionary methods:

def count(self, key, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

How about an efficient duck-typing value-incrementer to replace both? E.g. functionally like:
>>> class xdict(dict):

...     def valadd(self, key, incr=1):
...         try: self[key] = self[key] + type(self[key])(incr)
...         except KeyError: self[key] = incr

A big problem with this is that there are reasonable use cases for both
d.count(key, <some integer>)
and
d.appendlist(key, <some integer>)

Word counting is an obvious use for the first. Consolidating a list of key, value pairs where the
values are ints requires the second.

Combining count() and appendlist() into one function eliminates the second possibility.
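
The two use cases can be sketched with plain functions standing in for the proposed methods (count() and appendlist() here are hypothetical helpers that take the dict as their first argument, since dict has no such methods; the sample pairs are made up). Given the same integer inputs, one sums and the other collects, and nothing in the type of the argument says which is wanted.

```python
def count(d, key, qty=1):
    # Add qty to the running total for key, starting at qty.
    try:
        d[key] += qty
    except KeyError:
        d[key] = qty

def appendlist(d, key, *values):
    # Append values to the list for key, creating the list if needed.
    try:
        d[key].extend(values)
    except KeyError:
        d[key] = list(values)

pairs = [('a', 1), ('b', 2), ('a', 3)]  # illustrative data

totals = {}
grouped = {}
for key, value in pairs:
    count(totals, key, value)        # accumulate a sum per key
    appendlist(grouped, key, value)  # collect the ints per key
```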

Kent

Jul 18 '05 #31

Carl Banks

Raymond Hettinger wrote:

I would like to get everyone's thoughts on two new dictionary methods:
def count(self, key, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

Emphatic +1

I use both of these idioms all the time. (Kind of surprised to see
people confused about the need for the latter; I do it regularly.)
This is just the kind of thing experience shows cropping up enough that
it makes sense to put it in the language.

About the names: Seeing that these have specific uses, and do something
that is hard to explain in one word, I would suggest that short names
like count might betray the complexity of the operations. Therefore,
I'd suggest:

increment_value() (or add_to_value())
append_to_value()

Although they don't explicitly communicate that a value would be
created if it didn't exist, they do at least make it clear that it
happens to the value, which kind of implies that it would be created.

If we do have to use short names:

I don't like increment (or inc or incr) at all because it has the air
of a mutator method. Maybe it's just my previous experience with Java
and C++, but to me, a.incr() looks like it's incrementing a, and
a.incr(b) looks like it might be adding b to a. I don't like count
because it's too vague; it's pretty obvious what it does as an
iterator, but not as a method of dict. I could live with tally,
though. As for a short name for the other one, maybe fileas or
fileunder?
--
CARL BANKS

Jul 18 '05 #32

Kent Johnson

Brian van den Broek wrote:

Raymond Hettinger said unto the world upon 2005-03-18 20:24:
I would like to get everyone's thoughts on two new dictionary methods:

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

For appendlist, I would have expected

def appendlist(self, key, sequence):
    try:
        self[key].extend(sequence)
    except KeyError:
        self[key] = list(sequence)

The original proposal reads better at the point of call when values is a single item. In my
experience this will be the typical usage:
d.appendlist(key, 'some value')

as opposed to your proposal which has to be written
d.appendlist(key, ['some value'])

The original allows values to be a sequence using
d.appendlist(key, *value_list)
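
The difference at the call site can be sketched as follows (appendlist_star and appendlist_seq are hypothetical stand-ins for the two signatures, written as plain functions). Note that with the sequence form, passing a bare string adds its characters one by one.

```python
def appendlist_star(d, key, *values):
    try:
        d[key].extend(values)
    except KeyError:
        d[key] = list(values)

def appendlist_seq(d, key, sequence):
    try:
        d[key].extend(sequence)
    except KeyError:
        d[key] = list(sequence)

a, b, c = {}, {}, {}
appendlist_star(a, 'k', 'some value')    # single item, no wrapping needed
appendlist_star(a, 'k', *['x', 'y'])     # a sequence is unpacked with *
appendlist_seq(b, 'k', ['some value'])   # single item must be wrapped
appendlist_seq(c, 'k', 'abc')            # a string is itself a sequence
```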

Kent

Jul 18 '05 #33

Pierre Barbier de Reuille

Ivan Van Laningham wrote:

Hi All--
Maybe I'm not getting it, but I'd think a better name for count would be
add. As in

d.add(key)
d.add(key,-1)
d.add(key,399)
etc.

[...]

There is no existing add() method for dictionaries. Given the name
change, I'd like to see it.

Metta,
Ivan
I don't think "add" is a good name ... even if it doesn't exist in
dictionaries, it exists in sets and, IMHO, this would add confusion ...

Pierre

----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.andi-holmes.com/
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours

Jul 18 '05 #34

Dan Sommers

On Sat, 19 Mar 2005 01:24:57 GMT,
"Raymond Hettinger" <vz******@verizon.net> wrote:

The proposed names could possibly be improved (perhaps tally() is more
active and clear than count()).

Curious that in this lengthy discussion, a method name of "accumulate"
never came up. I'm not sure how to separate the two cases (accumulating
scalars vs. accumulating a list), though.

Regards,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>
μ₀ × ε₀ × c² = 1

Jul 18 '05 #35

Jeff Epler

> [Jeff Epler]

> Maybe something for sets like 'appendlist' ('unionset'?)

On Sat, Mar 19, 2005 at 04:18:43AM +0000, Raymond Hettinger wrote:
> I do not follow. Can you provide a pure python equivalent?

Here's what I had in mind:

$ python /tmp/unionset.py
Set(['set', 'self', 'since', 's', 'sys', 'source', 'S', 'Set', 'sets', 'starting'])

#------------------------------------------------------------------------
try:
    set
except NameError:
    from sets import Set as set

def unionset(self, key, *values):
    try:
        self[key].update(values)
    except KeyError:
        self[key] = set(values)

if __name__ == '__main__':
    import sys, re
    index = {}

    # We need a source of words. This file will do.
    corpus = open(sys.argv[0]).read()
    words = re.findall(r'\w+', corpus)

    # Create an index of the words according to the first letter.
    # repeated words are listed once since the values are sets
    for word in words:
        unionset(index, word[0].lower(), word)

    # Display the words starting with 'S'
    print index['s']
#------------------------------------------------------------------------

Jeff

Jul 18 '05 #36

Ivan Van Laningham

Hi All--

Raymond Hettinger wrote:

[Michele Simionato]
+1 for inc instead of count.

Any takers for tally()?

Sure. Given the reasons for avoiding add(), tally()'s a much better
choice than count().

What about d.tally(key,0) then? Deleting the key as was suggested by
Michael Spencer seems non-intuitive to me.

Just my 2 Eurocents,

I raise you by a ruble and a pound ;-)

<hardly-anything-is-worth-less-than-vietnamese-dong>-ly y'rs,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/worksh...oceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours

Jul 18 '05 #37

Peter Hansen

Michele Simionato wrote:

> +1 for inc instead of count.

-1 for inc, increment, or anything that carries a
connotation of *increasing* the value, so long as
the proposal allows for negative numbers to be
involved. "Incrementing by -1" is a pretty silly
picture.

+1 for add and, given the above, I'm unsure there's
a viable alternative (unless this is restricted to
positive values, or perhaps even to "+1" specifically).

> appendlist seems a bit too specific (I do not use dictionaries of lists
> that often).

As Raymond does, I use this much more than the other.

> The problem with setdefault is the name, not the functionality.
> get_or_set would be a better name: we could use it as an alias for
> setdefault and then remove setdefault in Python 3000.

Agreed...

-Peter

Jul 18 '05 #38

Reinhold Birkenfeld

Peter Hansen wrote:

Michele Simionato wrote:
+1 for inc instead of count.

-1 for inc, increment, or anything that carries a
connotation of *increasing* the value, so long as
the proposal allows for negative numbers to be
involved. "Incrementing by -1" is a pretty silly
picture.

+1 for add and, given the above, I'm unsure there's
a viable alternative (unless this is restricted to
positive values, or perhaps even to "+1" specifically).

What about `addto()'? add() just has the connotation of adding something
to the dict and not to an item in it.

Reinhold

Jul 18 '05 #39

Peter Hansen

Reinhold Birkenfeld wrote:

Peter Hansen wrote:
+1 for add and, given the above, I'm unsure there's
a viable alternative (unless this is restricted to
positive values, or perhaps even to "+1" specifically).

What about `addto()'? add() just has the connotation of adding something
to the dict and not to an item in it.

Hmm... better than add anyway. I take back my ill-considered
+1 above, and apply instead a +0 to "count". I don't actually
like any of the alternatives at this point... needs more thought
(for my part, anyway).

To be honest, the only time I've ever seen this particular
idiom is in tutorial code or examples of how you produce
a histogram of word usage in a text document. Never in real
code (not that it doesn't happen, just that I've never
stumbled across it). The "appending to a list" idiom, on
the other hand, I've seen and used quite often.

I'm just going to stay out of the "add/inc/count/addto"
debate and consider the other half of the thread now. :-)

-Peter

Jul 18 '05 #40

Raymond Hettinger

[Jeff Epler]

Maybe something for sets like 'appendlist' ('unionset'?)

While this could work and potentially be useful, I think it is better to keep
the proposal focused on the two common use cases. Adding a third would reduce
the chance of acceptance.

Also, in all of my code base, I've not run across a single opportunity to use
something like unionset(). This is surprising because I'm the set() author and
frequently use set based algorithms. Your example was a good one and I can
also imagine a graph represented as a dictionary of sets. Still, I don't mind
writing out the plain Python for this one if it only comes up once in a blue
moon.
Raymond

Jul 18 '05 #41

Raymond Hettinger

[Dan Sommers]

Curious that in this lengthy discussion, a method name of "accumulate"
never came up. I'm not sure how to separate the two cases (accumulating
scalars vs. accumulating a list), though.

Separating the two cases is essential. Also, the wording should contain strong
cues that remind you of addition and of building a list.

For the first, how about addup():

d = {}
for word in text.split():
    d.addup(word)
Raymond

Jul 18 '05 #42

Denis S. Otkidach

On 18 Mar 2005 21:03:52 -0800 Michele Simionato wrote:

MS> +1 for inc instead of count.
MS> appendlist seems a bit too specific (I do not use dictionaries of
MS> lists that often).

inc is too specific too.

MS> The problem with setdefault is the name, not the functionality.

The problem is with the functionality: d.setdefault(k, v) can't be used as
an lvalue. If it could, we wouldn't need a count/inc/add/tally method.
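
A small sketch of the lvalue point (the key names are arbitrary, and the compile() call is only there to show the error without stopping the script): a call expression cannot be the target of an augmented assignment, so the setdefault() form only helps when the stored value is mutable and can be updated in place.

```python
d = {}

# A call cannot be assigned to, so this statement does not even compile.
try:
    compile("d.setdefault('n', 0) += 1", "<sketch>", "exec")
    compiles = True
except SyntaxError:
    compiles = False

# Works for lists, because the returned list is mutated in place.
d.setdefault('items', []).append('x')

# For numbers the result has to be stored back explicitly.
d['n'] = d.setdefault('n', 0) + 1
```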

MS> get_or_set would be a better name: we could use it as an alias for
MS> setdefault and then remove setdefault in Python 3000.

What about a d.get(k, setdefault=v) alternative? Not sure whether it's
a good idea to overload the get() method, just an idea.

--
Denis S. Otkidach
http://www.python.ru/ [ru]

Jul 18 '05 #43

Ivan Van Laningham

Hi All--

Raymond Hettinger wrote:

Separating the two cases is essential. Also, the wording should contain strong
cues that remind you of addition and of building a list.

For the first, how about addup():

d = {}
for word in text.split():
    d.addup(word)

I still prefer tally(), despite perceived political connotations.
They're only connotations, after all, and tally() comprises both
positive and negative incrementing, whereas add() and addup() will tease
users into thinking they are only for incrementing.

What about adding another method, "setincrement()"?

d={}
d.setincrement(-1)
for word in text.split():
    d.tally(word,1)
    if word.lower() in ["a","an","the"]:
        d.tally(word)

Not that there's any real utility in that.

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.andi-holmes.com/
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours

Jul 18 '05 #44

El Pitonero

Dan Sommers wrote:

> On Sat, 19 Mar 2005 01:24:57 GMT,
> "Raymond Hettinger" <vz******@verizon.net> wrote:
> > The proposed names could possibly be improved (perhaps tally() is more
> > active and clear than count()).
>
> Curious that in this lengthy discussion, a method name of "accumulate"
> never came up. I'm not sure how to separate the two cases (accumulating
> scalars vs. accumulating a list), though.

Is it even necessary to use a method name?

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return copy.copy(self.default)

x = safedict(0)
x[3] += 1
y = safedict([])
y[5] += range(3)
print x, y
print x[123], y[234]

Jul 18 '05 #45

Raymond Hettinger

[Ivan Van Laningham]

What about adding another method, "setincrement()"? . . .
Not that there's any real utility in that.

That was a short lived suggestion ;-)

Also, it would entail storing an extra value in the dictionary header. That
alone would be a killer.
Raymond

Jul 18 '05 #46

Paul McGuire

-1 on set increment.
I think this makes your intent much clearer:

..d={}
..for word in text.split():
..    d.tally(word)
..    if word.lower() in ["a","an","the"]:
..        d.tally(word,-1)

or perhaps simplest:

..d={}
..for word in text.split():
..    if word.lower() not in ["a","an","the"]:
..        d.tally(word)

Personally, I'm +1 for tally(), and possibly tallyList() and tallySet()
to complete the thought for the cumulative container cases. I think
there is something to be gained if these methods get named in some
similar manner.

For those dead set against tally() and its ilk, how about accum(),
accumList() and accumSet()?

-- Paul

Jul 18 '05 #47

Aahz

In article <JbL_d.8237$qN3.2116@trndny01>,
Raymond Hettinger <py****@rcn.com> wrote:

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

+1 tally()
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR

Jul 18 '05 #48

El Pitonero

Raymond Hettinger wrote:

Separating the two cases is essential. Also, the wording should contain strong cues that remind you of addition and of building a list.

For the first, how about addup():

d = {}
for word in text.split():
    d.addup(word)

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        if not self.has_key(key):
            self[key] = copy.copy(self.default)
        return dict.__getitem__(self, key)

text = 'a b c b a'
words = text.split()
counts = safedict(0)
positions = safedict([])
for i, word in enumerate(words):
    counts[word] += 1
    positions[word].append(i)

print counts, positions
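
For reference, a sketch of the same idea that also runs on Python 3 (an assumption of this example: has_key() is replaced by the `in` test, and the final lookup of 'missing' is added for illustration). Because this version stores the default on first access, in-place calls such as append() persist, at the cost that merely reading a missing key inserts it.

```python
import copy

class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        # Store a fresh copy of the default the first time a key is read.
        if key not in self:
            self[key] = copy.copy(self.default)
        return dict.__getitem__(self, key)

text = 'a b c b a'
counts = safedict(0)
positions = safedict([])
for i, word in enumerate(text.split()):
    counts[word] += 1
    positions[word].append(i)

# Side effect: a plain read of an unknown key stores the default.
counts['missing']
has_missing = 'missing' in counts
```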

Jul 18 '05 #49

Aahz

In article <mVQ_d.9216$u76.1850@trndny08>,
Raymond Hettinger <py****@rcn.com> wrote:

How about countkey() or tabulate()?

Those rank roughly equal to tally() for me, with a slight edge to these
two for clarity and a slight edge to tally() for conciseness.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR

Jul 18 '05 #50
