Pre-PEP: Dictionary accumulator methods

I would like to get everyone's thoughts on two new dictionary methods:

    def count(self, key, qty=1):
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

    def appendlist(self, key, *values):
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)

The rationale is to replace the awkward and slow existing idioms for dictionary
based accumulation:

    d[key] = d.get(key, 0) + qty
    d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

    d.count(key)
    d.appendlist(key, value)

In their multi-value forms, they would now be coded as:

    d.count(key, qty)
    d.appendlist(key, *values)

The error messages returned by the new methods are the same as those returned by
the existing idioms.
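For readers who want to experiment, the two proposed methods can be tried out as a dict subclass. This is only a minimal sketch in pure Python, not the C-level implementation the proposal has in mind, and the class name AccumulatorDict is made up for illustration.

```python
class AccumulatorDict(dict):
    """Hypothetical dict subclass carrying the two proposed methods."""

    def count(self, key, qty=1):
        # Add qty to the running total, starting from qty if key is new.
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

    def appendlist(self, key, *values):
        # Extend the list stored under key, creating it on first use.
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)


# Simplest form: tally occurrences one at a time.
tallies = AccumulatorDict()
for word in "the quick the lazy the".split():
    tallies.count(word)

# Multi-value form: several values in one call.
groups = AccumulatorDict()
groups.appendlist("odd", 1, 3)
groups.appendlist("odd", 5)
groups.appendlist("even", 2)
```

After this runs, tallies maps each word to its number of occurrences and groups maps each label to the list of values filed under it.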

The get() method would continue to exist because it is useful for applications
other than accumulation.

The setdefault() method would continue to exist but would likely not make it
into Py3.0.

PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault()'s method name is not self-evident.

The performance issues with the existing constructs are:

* They translate into many opcodes, which slows them considerably.
* The get() idiom requires two dictionary lookups of the same key.
* The setdefault() idiom instantiates a new, empty list prior to every call.
* That new list is often not needed and is immediately discarded.
* The setdefault() idiom requires an attribute lookup for extend/append.
* The setdefault() idiom makes two function calls.
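The point about discarded lists can be checked directly. The sketch below uses a made-up list subclass, CountingList, purely to count how many default objects get constructed; it is an illustration, not a benchmark.

```python
class CountingList(list):
    """Hypothetical helper: a list that counts how many instances are built."""
    created = 0

    def __init__(self, *args):
        super().__init__(*args)
        CountingList.created += 1


d = {}
for key in ["a", "a", "a"]:
    # The default argument is evaluated before setdefault() runs,
    # so a fresh list is built on every iteration.
    d.setdefault(key, CountingList()).append(1)

print(CountingList.created)  # number of lists constructed
print(d)                     # only the first one was stored
```

Three lists are constructed for the three calls, but only the first is kept in the dictionary; the other two are thrown away immediately.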

The latter issues are evident from a disassembly:

>>> dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec'))
  1           0 LOAD_NAME                0 (d)
              3 LOAD_ATTR                1 (get)
              6 LOAD_NAME                2 (key)
              9 LOAD_CONST               0 (0)
             12 CALL_FUNCTION            2
             15 LOAD_NAME                3 (qty)
             18 BINARY_ADD
             19 LOAD_NAME                0 (d)
             22 LOAD_NAME                2 (key)
             25 STORE_SUBSCR
             26 LOAD_CONST               1 (None)
             29 RETURN_VALUE

>>> dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))
  1           0 LOAD_NAME                0 (d)
              3 LOAD_ATTR                1 (setdefault)
              6 LOAD_NAME                2 (key)
              9 BUILD_LIST               0
             12 CALL_FUNCTION            2
             15 LOAD_ATTR                3 (extend)
             18 LOAD_NAME                4 (values)
             21 CALL_FUNCTION            1
             24 POP_TOP
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE
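Opcode names and offsets differ between interpreter versions, so the listings above will not match a current Python exactly. A rough way to repeat the comparison is to count instructions with the dis module. This is a sketch only: the helper name instruction_count is made up, and the d.count / d.appendlist spellings are the hypothetical methods from this proposal (compiling a call does not require the method to exist).

```python
import dis


def instruction_count(source):
    """Count the bytecode instructions the compiler emits for one statement."""
    code = compile(source, "<sketch>", "exec")
    return sum(1 for _ in dis.get_instructions(code))


# Existing idioms.
get_idiom = instruction_count("d[key] = d.get(key, 0) + qty")
setdefault_idiom = instruction_count("d.setdefault(key, []).append(value)")

# Hypothetical spellings from the proposal.
count_call = instruction_count("d.count(key, qty)")
appendlist_call = instruction_count("d.appendlist(key, value)")

print(get_idiom, count_call)
print(setdefault_idiom, appendlist_call)
```

The instruction count is only a proxy: it says nothing about how long each instruction takes, so it supports the "fewer opcodes" claim but not any particular speedup.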

In contrast, the proposed methods use only a single attribute lookup and
function call, they use only one dictionary lookup, they use very few opcodes,
and they directly access the accumulation functions, PyNumber_Add() or
PyList_Append(). IOW, the performance improvement matches the readability
improvement.

ISSUES
------

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

The appendlist() method is not as versatile as setdefault() which can be used
with other object types (perhaps for creating dictionaries of dictionaries).
However, most uses I've seen are with lists. For other uses, plain Python code
suffices in terms of speed, clarity, and avoiding unnecessary instantiation of
empty containers:

    if key not in d:
        d[key] = {subkey: value}
    else:
        d[key][subkey] = value

Raymond Hettinger
Jul 18 '05
Bengt Richter wrote:
On Sat, 19 Mar 2005 01:24:57 GMT, "Raymond Hettinger" <vz******@verizon.net> wrote:
I would like to get everyone's thoughts on two new dictionary methods:

    def count(self, key, qty=1):
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

    def appendlist(self, key, *values):
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)

How about an efficient duck-typing value-incrementer to replace both? E.g. functionally like:
>>> class xdict(dict):
...     def valadd(self, key, incr=1):
...         try: self[key] = self[key] + type(self[key])(incr)
...         except KeyError: self[key] = incr


A big problem with this is that there are reasonable use cases for both
    d.count(key, <some integer>)
and
    d.appendlist(key, <some integer>)

Word counting is an obvious use for the first. Consolidating a list of key, value pairs where the
values are ints requires the second.

Combining count() and appendlist() into one function eliminates the second possibility.
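A small sketch of the conflict, reusing Bengt's xdict from the quoted text (the sample pairs are invented for illustration). With integer values the combined method can only ever sum; to collect the same integers into lists the caller has to wrap each one, which is exactly the distinction the two separate methods make explicit.

```python
class xdict(dict):
    # Bengt's combined method, as quoted above.
    def valadd(self, key, incr=1):
        try:
            self[key] = self[key] + type(self[key])(incr)
        except KeyError:
            self[key] = incr


pairs = [("a", 1), ("b", 2), ("a", 3)]  # made-up sample data

# Passing ints: the type of the stored value decides, so they are summed.
summed = xdict()
for key, value in pairs:
    summed.valadd(key, value)

# Consolidating the same ints requires wrapping each one in a list.
grouped = xdict()
for key, value in pairs:
    grouped.valadd(key, [value])
```

Here summed ends up holding totals per key, while grouped holds the list of values per key only because every call wrapped its argument.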

Kent
Jul 18 '05 #31

Raymond Hettinger wrote:
I would like to get everyone's thoughts on two new dictionary methods:
    def count(self, key, qty=1):
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

    def appendlist(self, key, *values):
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)

Emphatic +1

I use both of these idioms all the time. (Kind of surprised to see
people confused about the need for the latter; I do it regularly.)
This is just the kind of thing experience shows cropping up enough that
it makes sense to put it in the language.

About the names: Seeing that these have specific uses, and do something
that is hard to explain in one word, I would suggest that short names
like count might betray the complexity of the operations. Therefore,
I'd suggest:

    increment_value() (or add_to_value())
    append_to_value()

Although they don't explicitly communicate that a value would be
created if it didn't exist, they do at least make it clear that it
happens to the value, which kind of implies that it would be created.

If we do have to use short names:

I don't like increment (or inc or incr) at all because it has the air
of a mutator method. Maybe it's just my previous experience with Java
and C++, but to me, a.incr() looks like it's incrementing a, and
a.incr(b) looks like it might be adding b to a. I don't like count
because it's too vague; it's pretty obvious what it does as an
iterator, but not as a method of dict. I could live with tally,
though. As for a short name for the other one, maybe fileas or
fileunder?
--
CARL BANKS

Jul 18 '05 #32
Brian van den Broek wrote:
Raymond Hettinger said unto the world upon 2005-03-18 20:24:
I would like to get everyone's thoughts on two new dictionary methods:

    def appendlist(self, key, *values):
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)

For appendlist, I would have expected

    def appendlist(self, key, sequence):
        try:
            self[key].extend(sequence)
        except KeyError:
            self[key] = list(sequence)


The original proposal reads better at the point of call when values is a single item. In my
experience this will be the typical usage:
    d.appendlist(key, 'some value')

as opposed to your proposal which has to be written
    d.appendlist(key, ['some value'])

The original allows values to be a sequence using
    d.appendlist(key, *value_list)
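The two signatures can be compared side by side. The sketch below writes them as free functions with made-up names (appendlist_varargs, appendlist_sequence) so both can coexist; it also shows one hazard of the sequence form, namely that a bare string is itself a sequence and gets split into characters.

```python
def appendlist_varargs(d, key, *values):
    """Signature from the original proposal, as a free function."""
    try:
        d[key].extend(values)
    except KeyError:
        d[key] = list(values)


def appendlist_sequence(d, key, sequence):
    """Alternative signature taking one sequence argument."""
    try:
        d[key].extend(sequence)
    except KeyError:
        d[key] = list(sequence)


a = {}
appendlist_varargs(a, "k", "some value")   # single item, no wrapping
appendlist_varargs(a, "k", *["x", "y"])    # existing list, unpacked

b = {}
appendlist_sequence(b, "k", ["some value"])  # single item must be wrapped

c = {}
appendlist_sequence(c, "k", "ab")  # forgot to wrap: iterated per character
```

With the varargs form a forgotten star on a list stores the list as one item, whereas with the sequence form a forgotten wrapper on a string stores its characters, so each signature has its own failure mode.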

Kent
Jul 18 '05 #33
Ivan Van Laningham wrote:
Hi All--
Maybe I'm not getting it, but I'd think a better name for count would be
add. As in

d.add(key)
d.add(key,-1)
d.add(key,399)
etc.

[...]

There is no existing add() method for dictionaries. Given the name
change, I'd like to see it.

Metta,
Ivan
I don't think "add" is a good name ... even if it doesn't exist in
dictionaries, it exists in sets and, IMHO, this would add confusion ...

Pierre

----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.andi-holmes.com/
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours

Jul 18 '05 #34
On Sat, 19 Mar 2005 01:24:57 GMT,
"Raymond Hettinger" <vz******@verizon.net> wrote:
The proposed names could possibly be improved (perhaps tally() is more
active and clear than count()).


Curious that in this lengthy discussion, a method name of "accumulate"
never came up. I'm not sure how to separate the two cases (accumulating
scalars vs. accumulating a list), though.

Regards,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>
μ₀ × ε₀ × c² = 1
Jul 18 '05 #35
> [Jeff Epler]
Maybe something for sets like 'appendlist' ('unionset'?)

On Sat, Mar 19, 2005 at 04:18:43AM +0000, Raymond Hettinger wrote:
I do not follow. Can you provide a pure python equivalent?


Here's what I had in mind:

$ python /tmp/unionset.py
Set(['set', 'self', 'since', 's', 'sys', 'source', 'S', 'Set', 'sets', 'starting'])

#------------------------------------------------------------------------
try:
    set
except NameError:
    # No built-in set on this interpreter; fall back to the sets module.
    from sets import Set as set

def unionset(self, key, *values):
    try:
        self[key].update(values)
    except KeyError:
        self[key] = set(values)

if __name__ == '__main__':
    import sys, re
    index = {}

    # We need a source of words.  This file will do.
    corpus = open(sys.argv[0]).read()
    words = re.findall('\w+', corpus)

    # Create an index of the words according to the first letter.
    # repeated words are listed once since the values are sets
    for word in words:
        unionset(index, word[0].lower(), word)

    # Display the words starting with 'S'
    print index['s']
#------------------------------------------------------------------------
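The script above targets the Python of its day (print statement, sets module). For anyone trying the idea on Python 3, a rough equivalent might look like the following; the short corpus string is invented so the example does not depend on reading its own source file.

```python
import re


def unionset(d, key, *values):
    # Add values to the set stored under key, creating it on first use.
    try:
        d[key].update(values)
    except KeyError:
        d[key] = set(values)


# Made-up stand-in for the file contents used in the original script.
corpus = "set self since sys set Source self"

index = {}
for word in re.findall(r"\w+", corpus):
    # Index each word under its lowercased first letter.
    unionset(index, word[0].lower(), word)

print(sorted(index["s"]))
```

Repeated words appear once because the values are sets, which is the behaviour the original script relies on.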

Jeff

Jul 18 '05 #36
Hi All--

Raymond Hettinger wrote:

[Michele Simionato]
+1 for inc instead of count.


Any takers for tally()?


Sure. Given the reasons for avoiding add(), tally()'s a much better
choice than count().

What about d.tally(key,0) then? Deleting the key as was suggested by
Michael Spencer seems non-intuitive to me.
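For reference, the definition posted at the top of the thread does not special-case a zero quantity. A quick sketch, using a free function named tally as a stand-in for the proposed method (the sample keys are invented):

```python
def tally(d, key, qty=1):
    # Same body as the count() method proposed at the top of the thread.
    try:
        d[key] += qty
    except KeyError:
        d[key] = qty


d = {"seen": 2}
tally(d, "seen", 0)  # existing key: value is left unchanged
tally(d, "new", 0)   # missing key: created with the value 0
```

So under that definition d.tally(key, 0) never deletes anything; it either leaves the entry alone or creates it with a zero.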
Just my 2 Eurocents,


I raise you by a ruble and a pound ;-)


<hardly-anything-is-worth-less-than-vietnamese-dong>-ly y'rs,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/worksh...oceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
Jul 18 '05 #37
Michele Simionato wrote:
+1 for inc instead of count.
-1 for inc, increment, or anything that carries a
connotation of *increasing* the value, so long as
the proposal allows for negative numbers to be
involved. "Incrementing by -1" is a pretty silly
picture.

+1 for add and, given the above, I'm unsure there's
a viable alternative (unless this is restricted to
positive values, or perhaps even to "+1" specifically).
appendlist seems a bit too specific (I do not use dictionaries of lists
that often).
As Raymond does, I use this much more than the other.
The problem with setdefault is the name, not the functionality.
get_or_set would be a better name: we could use it as an alias for
setdefault and then remove setdefault in Python 3000.


Agreed...

-Peter
Jul 18 '05 #38
Peter Hansen wrote:
Michele Simionato wrote:
+1 for inc instead of count.


-1 for inc, increment, or anything that carries a
connotation of *increasing* the value, so long as
the proposal allows for negative numbers to be
involved. "Incrementing by -1" is a pretty silly
picture.

+1 for add and, given the above, I'm unsure there's
a viable alternative (unless this is restricted to
positive values, or perhaps even to "+1" specifically).


What about `addto()'? add() just has the connotation of adding something
to the dict and not to an item in it.

Reinhold
Jul 18 '05 #39
Reinhold Birkenfeld wrote:
Peter Hansen wrote:
+1 for add and, given the above, I'm unsure there's
a viable alternative (unless this is restricted to
positive values, or perhaps even to "+1" specifically).


What about `addto()'? add() just has the connotation of adding something
to the dict and not to an item in it.


Hmm... better than add anyway. I take back my ill-considered
+1 above, and apply instead a +0 to "count". I don't actually
like any of the alternatives at this point... needs more thought
(for my part, anyway).

To be honest, the only time I've ever seen this particular
idiom is in tutorial code or examples of how you produce
a histogram of word usage in a text document. Never in real
code (not that it doesn't happen, just that I've never
stumbled across it). The "appending to a list" idiom, on
the other hand, I've seen and used quite often.

I'm just going to stay out of the "add/inc/count/addto"
debate and consider the other half of the thread now. :-)

-Peter
Jul 18 '05 #40
