Pre-PEP: Dictionary accumulator methods

I would like to get everyone's thoughts on two new dictionary methods:

    def count(self, key, qty=1):
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

    def appendlist(self, key, *values):
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)

The rationale is to replace the awkward and slow existing idioms for
dictionary-based accumulation:

    d[key] = d.get(key, 0) + qty
    d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

    d.count(key)
    d.appendlist(key, value)

In their multi-value forms, they would now be coded as:

    d.count(key, qty)
    d.appendlist(key, *values)

The error messages returned by the new methods are the same as those returned by
the existing idioms.
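
To make the proposal concrete, here is a minimal sketch of the two methods on a
dict subclass, run side by side with the existing idioms. The class name
AccumDict and the sample data are hypothetical (they are not part of the
proposal), and the code targets current Python 3 rather than the 2.x
interpreters under discussion.

```python
class AccumDict(dict):
    """Hypothetical dict subclass carrying the two proposed methods."""

    def count(self, key, qty=1):
        # Add qty to the running total for key, starting from qty if absent.
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

    def appendlist(self, key, *values):
        # Extend the list stored under key, creating it on first use.
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)


words = ["spam", "eggs", "spam"]

tally = AccumDict()
by_length = AccumDict()
for w in words:
    tally.count(w)
    by_length.appendlist(len(w), w)

# The same accumulation written with the existing idioms.
old_tally = {}
old_by_length = {}
for w in words:
    old_tally[w] = old_tally.get(w, 0) + 1
    old_by_length.setdefault(len(w), []).append(w)
```

Both pairs of dictionaries end up with the same contents; only the spelling of
the accumulation step differs.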

The get() method would continue to exist because it is useful for applications
other than accumulation.

The setdefault() method would continue to exist but would likely not make it
into Py3.0.

PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault()'s method name is not self-evident.

The performance issues with the existing constructs are:

* They translate into many opcodes, which slows them considerably.
* The get() idiom requires two dictionary lookups of the same key.
* The setdefault() idiom instantiates a new, empty list prior to every call.
* That new list is often not needed and is immediately discarded.
* The setdefault() idiom requires an attribute lookup for extend/append.
* The setdefault() idiom makes two function calls.
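
The point about wasted list instantiation can be observed directly. The sketch
below uses a hypothetical list subclass, CountingList, that counts how many
times it is constructed; it is only an illustration, not a benchmark.

```python
class CountingList(list):
    """List subclass that records how many instances have been created."""
    created = 0

    def __init__(self, *args):
        super().__init__(*args)
        CountingList.created += 1


d = {}
for value in ("a", "b", "c"):
    # The default argument is evaluated before setdefault() runs,
    # so a fresh CountingList is built on every iteration.
    d.setdefault("key", CountingList()).append(value)

created = CountingList.created   # number of lists constructed
stored = d["key"]                # the one list that was actually kept
```

Three lists are constructed here, but only the first is ever stored; the other
two are discarded as soon as setdefault() returns.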

The latter issues are evident from a disassembly:

    >>> dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec'))
      1           0 LOAD_NAME                0 (d)
                  3 LOAD_ATTR                1 (get)
                  6 LOAD_NAME                2 (key)
                  9 LOAD_CONST               0 (0)
                 12 CALL_FUNCTION            2
                 15 LOAD_NAME                3 (qty)
                 18 BINARY_ADD
                 19 LOAD_NAME                0 (d)
                 22 LOAD_NAME                2 (key)
                 25 STORE_SUBSCR
                 26 LOAD_CONST               1 (None)
                 29 RETURN_VALUE

    >>> dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))
      1           0 LOAD_NAME                0 (d)
                  3 LOAD_ATTR                1 (setdefault)
                  6 LOAD_NAME                2 (key)
                  9 BUILD_LIST               0
                 12 CALL_FUNCTION            2
                 15 LOAD_ATTR                3 (extend)
                 18 LOAD_NAME                4 (values)
                 21 CALL_FUNCTION            1
                 24 POP_TOP
                 25 LOAD_CONST               0 (None)
                 28 RETURN_VALUE
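
Anyone wanting to repeat the comparison on a current interpreter could use a
sketch along these lines. The helper name opnames is hypothetical, and opcode
names and counts vary between Python versions, so the output should not be
expected to match the listings above.

```python
import dis

get_idiom = "d[key] = d.get(key, 0) + qty"
setdefault_idiom = "d.setdefault(key, []).extend(values)"


def opnames(source):
    # Compile the statement and list the opcode names it produces.
    code = compile(source, "<idiom>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


get_ops = opnames(get_idiom)
setdefault_ops = opnames(setdefault_idiom)

print(len(get_ops), get_ops)
print(len(setdefault_ops), setdefault_ops)
```

The BUILD_LIST opcode in the second listing is the empty list being created
ahead of the setdefault() call.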

In contrast, the proposed methods use only a single attribute lookup and
function call, they use only one dictionary lookup, they use very few opcodes,
and they directly access the accumulation functions, PyNumber_Add() or
PyList_Append(). IOW, the performance improvement matches the readability
improvement.

ISSUES
------

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

The appendlist() method is not as versatile as setdefault() which can be used
with other object types (perhaps for creating dictionaries of dictionaries).
However, most uses I've seen are with lists. For other uses, plain Python code
suffices in terms of speed, clarity, and avoiding unnecessary instantiation of
empty containers:

    if key not in d:
        d[key] = {subkey: value}
    else:
        d[key][subkey] = value

Raymond Hettinger
Jul 18 '05
> > d.count(key, qty)
> > d.appendlist(key, *values)

[Bengt Richter]
> How about an efficient duck-typing value-incrementer to replace both?

There is some Zen of Python that argues against this interesting idea. Also, I'm
concerned that by folding appendlist() into valadd() we would lose an important
cue that a list is being built up.

Another issue is that duck-typed multiple-dispatch is only readable when the
type of the input argument is obvious from the surrounding code. Given
d.valadd(x), it is hard to grok if x was created by some code far away. Since a
primary goal is readability and clarity, having two separate, concrete methods
is likely better than having a single more-abstracted multi-purpose method. The
performance gains are just icing on the cake.

> I'm thinking the idea that the counting is happening with the value
> corresponding to the key should be emphasised more. Hence valadd or such?

How about countkey() or tabulate()?

Raymond Hettinger
Jul 18 '05 #21
Raymond Hettinger:
> Any takers for tally()?

Dunno, to me "tally" reads "counts the numbers of votes for a candidate
in an election".

> We should avoid abbreviations like inc() or incr() that different people
> tend to abbreviate differently (for example, that is why the new partial()
> function has its "keywords" argument spelled out). The only other issue I
> see with that name is that historically incrementing is more associated
> with +=1 than with +=n. Also, there are reasonable use cases for a negative
> n and it would be misleading to call it incrementing when decrementing is
> what is intended.

I agree with Paul Rubin's argument on that issue, let's use increment()
and do not worry about negative increments.

> > appendlist seems a bit too specific (I do not use dictionaries of lists
> > that often).
>
> I'm curious. When you do use setdefault, what is the typical second
> argument?

Well, I have used setdefault *very few times* in years of heavy Python
usage. Its disappearance would not bother me that much. Grepping my source
code, I find that practically my main use case for setdefault is in a
memoize recipe where the result of a function call is stored in a
dictionary (if not already there) and returned. Then I have a second case
with a list as second argument.
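
A memoize recipe of the kind described might look roughly like the sketch
below. This is an assumption about its general shape, not the poster's actual
code; the names memoize, wrapper and square are hypothetical, and the syntax is
current Python 3.

```python
def memoize(func):
    """Cache results of func keyed by its positional arguments."""
    cache = {}

    def wrapper(*args):
        try:
            return cache[args]
        except KeyError:
            # Store the freshly computed result and return it in one step.
            return cache.setdefault(args, func(*args))

    wrapper.cache = cache  # exposed only so the example can inspect it
    return wrapper


calls = []


@memoize
def square(n):
    calls.append(n)  # record each real invocation
    return n * n


first = square(4)
second = square(4)
```

Here setdefault() is used for its return value rather than to build up a
container, which is the non-accumulation use the post refers to.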

> > The problem with setdefault is the name, not the functionality.
>
> Are you happy with the readability of the argument order? To me, the
> key and default value are not at all related. Do you prefer having the
> default value pre-instantiated on every call when the effort is likely
> to be wasted? Do you like the current design of returning an object and
> then making a further (second dot) method lookup and call for append or
> extend? When you first saw setdefault explained, was it immediately
> obvious or did it take more learning effort than other dictionary
> methods? To me, it is the least explainable dictionary method. Even when
> given a good definition of setdefault(), it is not immediately obvious
> that it is meant to be further combined with append() or some such. When
> showing code to newbies or non-pythonistas, do they find the meaning of
> the current idiom self-evident? That last question is not compelling,
> but it does contrast with other Python code which tends to be grokkable
> by non-pythonistas and clients.
>
> > get_or_set would be a better name: we could use it as an alias for
> > setdefault and then remove setdefault in Python 3000.
>
> While get_or_set would be a bit of an improvement, it is still
> obtuse. Even though a set operation only occurs conditionally, the get
> always occurs. The proposed name doesn't make it clear that the method
> always returns an object.

Honestly, I don't care about the performance arguments. However, I care
a lot about readability and clarity. setdefault is terrible in this
respect, since most of the time it does *not* set a default, it just gets
a value. So I am always confused and I have to read the documentation to
remind myself what it is doing. The only right name would be
"get_and_possibly_set" but it is a bit long to type.

> Even if a wording is found that better describes both the get and set
> operation, it is still a distractor from the intent of the combined
> statement, the intent of building up a list. That is an intrinsic wording
> limitation that cannot be solved by a better name for setdefault. If any
> change is made at all, we ought to go the distance and provide a better
> designed tool rather than just a name change.

Well, I never figured out that the intent of setdefault was to build up
a list ;)

Anyway, if I think about how many times I have used setdefault in my code
(practically twice) and how much time I have spent trying to decipher it
(any time I reread the code using it), I think I would have been better
served by NOT having the setdefault method available ;)

About appendlist(): it still seems a bit special purpose to me. I mean,
dictionaries already have lots of methods and I would think twice before
adding new ones, especially methods that may turn out not that useful in
the long run, or easily replaceable by user code.

Michele Simionato

Jul 18 '05 #22
Reinhold Birkenfeld <re************************@wolke7.net> writes:
> > Any takers for tally()?
>
> Well, as a non-native speaker, I had to look up this one in my
> dictionary. That said, it may be bad luck on my side, but it may be that
> this word is relatively uncommon and there are many others who would be
> happier with increment.


It is sort of an uncommon word. As a US English speaker I'd say it
sounds a bit old-fashioned, except when used idiomatically ("let's
tally up the posts about accumulator messages") or in nonstandard
dialect ("Hey mister tally man, tally me banana" is a song about
working on plantations in Jamaica). It may be more common in UK
English. There's an expression "tally-ho!" which had something to do
with British fox hunts, but they don't have those any more.

I'd say I prefer most of the suggested alternatives (count, add,
incr/increment) to "tally".
Jul 18 '05 #23
> Py2.5 is already going to include any() and all() as builtins. The
> signature does not include a function, identity or otherwise. Instead,
> the caller can write a listcomp or genexp that evaluates to True or False:
>
>     any(x >= 42 for x in data)
>
> If you wanted an identity function, that simplifies to just:
>
>     any(data)


Oh great, I just saw that. I was referring to this, which didn't get much
discussion:

http://mail.python.org/pipermail/pyt...ry/051556.html

but it looks like it went much further, to builtins! I'm surprised.

But I wish it could be included in Python 2.4.x. I really hope it won't
have any bugs in it. :) At my job we are probably going to upgrade to 2.4,
and that takes a long time, so it'll probably be a year or 18 months after
that happens (which itself might be months from now) that we would consider
upgrading again. Oh well...

Jul 18 '05 #24
[Michele Simionato]
> Dunno, to me "tally" reads "counts the numbers of votes for a candidate
> in an election".

That isn't a pleasant image ;-)

> The only right name would be "get_and_possibly_set" but it is a bit long
> to type.
>
> > Even if a wording is found that better describes both the get and
> > set operation, it is still a distractor from the intent of the combined
> > statement, the intent of building up a list. That is an intrinsic wording
> > limitation that cannot be solved by a better name for setdefault.
> > If any change is made at all, we ought to go the distance and provide a
> > better designed tool rather than just a name change.
>
> Well, I never figured out that the intent of setdefault was to build up
> a list ;)

Right! What does have that intent is the full statement:
d.setdefault(k, []).append(v).

My thought is that setdefault() is rarely used by itself. Instead, it is
typically part of a longer sentence whose intent and meaning is to accumulate or
build-up. That meaning is not well expressed by the current idiom.

Raymond Hettinger
Jul 18 '05 #25
> > Py2.5 is already going to include any() and all() as builtins. The
> > signature does not include a function, identity or otherwise.
> > Instead, the caller can write a listcomp or genexp that evaluates to
> > True or False:
> >
> >     any(x >= 42 for x in data)

[Roose]
> Oh great, I just saw that. . . . But I wish it could be included in
> Python 2.4.x.


If it is any consolation, the any() can already be expressed somewhat cleanly
and efficiently in Py2.4 with genexps:

    True in (x >= 42 for x in data)

The translation for all() is a little less elegant:

    False not in (x >= 42 for x in data)
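
As a quick check, the sketch below compares those two spellings with the
builtins on a hypothetical sample list, using current Python 3. Note that "in"
tests by equality, so the translation relies on the generator yielding real
booleans, as a comparison such as x >= 42 does.

```python
data = [3, 17, 42, 8]

# Py2.4-style spellings using only a generator expression and "in".
any_24 = True in (x >= 42 for x in data)
all_24 = False not in (x >= 42 for x in data)

# The builtins planned for Py2.5.
any_25 = any(x >= 42 for x in data)
all_25 = all(x >= 42 for x in data)

# Edge case: an empty input gives False for any() and True for all().
empty_any = True in (x >= 42 for x in [])
empty_all = False not in (x >= 42 for x in [])
```

Both forms stop consuming the generator as soon as the answer is known.
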
Raymond Hettinger
Jul 18 '05 #26
> Py2.5 is already going to include any() and all() as builtins. The
> signature does not include a function, identity or otherwise. Instead,
> the caller can write a listcomp or genexp that evaluates to True or False:

Actually I was just looking at Python 2.5 docs since you mentioned this.

http://www.python.org/dev/doc/devel/whatsnew/node3.html

It says min() and max() will gain a key function parameter, and sort()
gained one in Python 2.4 (news to me).

And they do indeed default to the identity in all 3 cases, so this seems
very inconsistent. If one of them has it, and sort gained the argument even
in Python 2.4 with generator expressions, then they all should have it.

> any(x >= 42 for x in data)


Not to belabor the point, but in the example on that page, max(L, key=len)
could be written max(len(x) for x in L).

Now I know why Guido said he didn't want a PEP for this... such a trivial
thing can produce a lot of opinions. : )

Roose
Jul 18 '05 #27
Roose wrote:
> Not to belabor the point, but in the example on that page, max(L, key=len)
> could be written max(len(x) for x in L).

No, it can't:

    Python 2.5a0 (#2, Mar 5 2005, 17:44:37)
    [GCC 3.3.3 (SuSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> max(["a", "bbb", "cc"], key=len)
    'bbb'

Peter

Jul 18 '05 #28
[Roose]
> Actually I was just looking at Python 2.5 docs since you mentioned this.
>
> http://www.python.org/dev/doc/devel/whatsnew/node3.html
>
> It says min() and max() will gain a key function parameter, and sort()
> gained one in Python 2.4 (news to me).

It also appears in itertools.groupby() and, for Py2.5, in heapq.nsmallest() and
heapq.nlargest().

> And they do indeed default to the identity in all 3 cases, so this seems
> very inconsistent. If one of them has it, and sort gained the argument even
> in Python 2.4 with generator expressions, then they all should have it.
>
> > any(x >= 42 for x in data)
>
> Not to belabor the point, but in the example on that page, max(L, key=len)
> could be written max(len(x) for x in L).


Think about it. A key= function is quite a different thing. It provides a
*temporary* comparison key while retaining the original value. IOW, your
re-write is incorrect:

    >>> L = ['the', 'quick', 'brownish', 'toad']
    >>> max(L, key=len)
    'brownish'
    >>> max(len(x) for x in L)
    8
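
To spell out what "temporary comparison key" means, here is a rough
hand-written equivalent of the key= behaviour, as a sketch in current Python 3.
The loop is only an illustration of the idea, not how max() is implemented, and
the names best_key and best_item are hypothetical.

```python
L = ['the', 'quick', 'brownish', 'toad']

longest = max(L, key=len)              # compares by length, returns the word
longest_len = max(len(x) for x in L)   # returns only the length

# Pair each item with its key, compare on the key alone,
# then hand back the original item rather than the key.
best_key, best_item = None, None
for item in L:
    k = len(item)
    if best_item is None or k > best_key:
        best_key, best_item = k, item
```

The generator version throws the original items away before max() ever sees
them, which is why it cannot return 'brownish'.
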
Remain calm. Keep the faith. Guido's design works fine.

No important use cases were left unserved by any() and all().

Raymond Hettinger
Jul 18 '05 #29
On Sat, 19 Mar 2005 01:24:57 GMT, "Raymond Hettinger"
<vz******@verizon.net> wrote:
> I would like to get everyone's thoughts on two new dictionary methods:
>
>     def count(self, key, qty=1):
>         try:
>             self[key] += qty
>         except KeyError:
>             self[key] = qty
>
>     def appendlist(self, key, *values):
>         try:
>             self[key].extend(values)
>         except KeyError:
>             self[key] = list(values)

Bengt Richter wrote:
> >>> class xdict(dict):
> ...     def valadd(self, key, incr=1):
> ...         try: self[key] = self[key] + type(self[key])(incr)
> ...         except KeyError: self[key] = incr


What about:

    import copy

    class safedict(dict):
        def __init__(self, default=None):
            self.default = default

        def __getitem__(self, key):
            try:
                return dict.__getitem__(self, key)
            except KeyError:
                return copy.copy(self.default)

    x = safedict(0)
    x[3] += 1
    y = safedict([])
    y[5] += range(3)
    print x, y
    print x[123], y[234]
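
For anyone trying this on a current interpreter, the following is a Python 3
sketch of the same class with its behaviour checked step by step. The
super().__init__() call and the final append() line are additions for
illustration and are not in the post above.

```python
import copy


class safedict(dict):
    def __init__(self, default=None):
        super().__init__()
        self.default = default

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Hand back a copy so the shared default is never mutated.
            return copy.copy(self.default)


x = safedict(0)
x[3] += 1            # read default 0, then store 1 under key 3

y = safedict([])
y[5] += range(3)     # read a copy of [], extend it, store it under key 5

# A plain read of a missing key returns the default without storing it,
# so mutating the returned object in place is lost.
y[9].append("lost")
```

Augmented assignment works because it performs a read followed by a store. A
bare y[key].append(v) on a missing key silently does nothing lasting, so this
approach covers the count() case more naturally than the appendlist() case.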

Jul 18 '05 #30
