Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

vatamane

This has been bothering me for a while. Just want to find out if it's
just me or perhaps others have thought of this too: Why shouldn't the
keyset of a dictionary be represented as a set instead of a list? I
know that sets were introduced a lot later and lists/dictionaries were
used instead, but I think "the only correct way" now is for the
dictionary keys and values to be sets. Presently {1:0,2:0,3:0}.keys()
will produce [1,2,3] but it could also produce [3,1,2] or [3,2,1] on a
different machine architecture. The Python documentation states that:
"""
Keys and values are listed in an _arbitrary_(my emphasis) order which
is non-random, varies across Python implementations, and depends on the
dictionary's history of insertions and deletions.
"""

So on the same machine it will be the case that {1:0, 2:0,
3:0}.keys() == {3:0, 2:0, 1:0}.keys() is True. But if there are two lists
of keys from the same dictionary, and one was pickled on another machine
with a different "arbitrary non-random" ordering, then the two lists will
not be equal to each other. It seems like the problem could be solved by
returning a set instead.
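A small sketch of the comparison being described, assuming a Python 3 interpreter (this thread predates Python 3; note that in CPython 3.7 and later a dict iterates in insertion order, so the two key lists below differ even on a single machine). Converting to a set makes the comparison order-independent:

```python
# Two dicts with the same keys but different insertion histories.
d1 = {1: 0, 2: 0, 3: 0}
d2 = {3: 0, 2: 0, 1: 0}

# Lists compare element by element, so the result depends on iteration
# order (on CPython 3.7+ this is False: [1, 2, 3] vs [3, 2, 1]).
as_lists_equal = list(d1) == list(d2)

# Sets ignore order, so the keysets compare equal regardless of history.
as_sets_equal = set(d1) == set(d2)

# Python 3 keys() views are set-like and also compare without regard to order.
views_equal = d1.keys() == d2.keys()

print(as_lists_equal, as_sets_equal, views_equal)
```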

The same thing goes for the values(). Here most people will argue that
values are not necessarily unique, so a list is more appropriate, but
in fact the values are unique; it is just that more than one key could
map to the same value. What is 'seen' in a dictionary is the mapping
rule. Also the values are not ordered and should not be indexable --
they should be a set. Just as with keys(), one ordered list of values
from a dictionary on one machine will not necessarily be equal to
another list of values on another machine, while in fact, they
should be.

On a more fundamental and general level, a dictionary is actually an
explicit function, also called a 'map'. A set of keys, traditionally
called a 'domain', is mapped to a set of values, traditionally called a
'range'. This mapping produces at most a surjective function (i.e. two
or more keys can map to the same value and all the values are mapped to
by some keys). Notice that the traditional counterparts to keys and
values are sets and not just lists. This seems like theory babble, but
in the case of Python, staying 'true' to the theory is usually a
GoodThing(tm).
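One way to make the 'domain'/'range' vocabulary concrete, as a sketch: the helper names domain_and_range and is_injective are hypothetical, and building the range as a set assumes every value is hashable. The range set can be smaller than the domain, which is exactly the case of two or more keys mapping to the same value:

```python
def domain_and_range(mapping):
    """Return (domain, range) of a dict viewed as a function.

    Assumes every value is hashable; otherwise set() raises TypeError.
    """
    return set(mapping), set(mapping.values())


def is_injective(mapping):
    # A function is one-to-one exactly when no two keys share a value,
    # i.e. the range has as many elements as the domain.
    dom, rng = domain_and_range(mapping)
    return len(rng) == len(dom)


ages = {"ann": 31, "bob": 27, "cy": 31}
dom, rng = domain_and_range(ages)
print(sorted(dom), sorted(rng))  # ['ann', 'bob', 'cy'] [27, 31]
print(is_injective(ages))        # False: "ann" and "cy" both map to 31
```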

I love Python primarily because it does something in only one, correct,
and reasonable way. The same principle should probably apply to Python
itself including to its built-in data structures.

Of course the compatibility will be broken. Any code relying on a
certain ordering of keys() or values() returned would need to be
updated. One could argue, though, that such code was not entirely correct
to assume that in the first place, and should be fixed anyway.

Obviously this fix should not be in Python 2.X but perhaps would be
worth considering for Python 3000. What does everyone think?

Jul 1 '06 #1


cmdrrickhunter

There are a few good reasons.
1 - golden handcuffs. Breaking old code is bad 90% of the time
2 - creating a set MAY be slower.

Python's sets seem to imply that they will always be a hash map. In
this case, some creative hash map "mapping" could allow one to create a
set without calculating hash codes (make the set hashmap have the same
dimensions and rules as the dictionary one).
If there was intent to allow Python implementations to use trees for
the set, then a list is far faster to create (O(n) time instead of
O(n log n)).

3 - using a set is sometimes slower (just as using a list is sometimes
slower)
I can't speak for your code, but this is the most common use of keys in
my coding:
# d is some dictionary
keys = d.keys()
keys.sort()
for k in keys:
    pass  # blah

Sets cannot be sorted, while lists can. If keys() returned a set, then
I'd have to turn it into a list every time.

There's potential to add "views" to Python (a key view of a dictionary
being a frozenset containing the dictionary's keys, which is updated
whenever the dictionary updates), but I think that's another topic
which is out of the scope of your original idea.

Jul 1 '06 #2

Scott David Daniels

cm************@yaho.com wrote:

> I can't speak for your code, but this is the most common use of keys in
> my coding:
> # d is some dictionary
> keys = d.keys()
> keys.sort()
> for k in keys:
>     pass  # blah

This you can rewrite quite effectively as:

for k in sorted(d):
    pass  # blah
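A short runnable version of that idiom, with a made-up example dict. Iterating a dict yields its keys, so sorted() can take the dict directly and returns a new list; since sorted() accepts any iterable, the same loop would also work unchanged if keys() returned a set:

```python
d = {"pear": 3, "apple": 1, "fig": 2}

# sorted() builds a new sorted list from the dict's keys; no separate
# keys() call and no in-place sort are needed.
for k in sorted(d):
    print(k, d[k])  # apple 1, then fig 2, then pear 3

# sorted() works on any iterable, including a set of the keys.
same = sorted(set(d)) == sorted(d)
print(same)  # True
```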

--Scott David Daniels
sc***********@acm.org

Jul 1 '06 #3

Nick Vatamaniuc

> 1 - golden handcuffs. Breaking old code is bad 90% of the time

I agree with you on that, mostly for code that counted on list methods
of the result of keys(), like you later show with sort. But there is a
side note: old code that assumed a particular ordering of the keys or
values is broken anyway. So even if ks={1:0,2:0,3:0}.keys() returns
[1,2,3] on my machine, I should not do something like
'my_savings_account + ks[0]'. That code should be fixed anyway, since on
a different machine it might produce a different value for ks[0].

> 2 - creating a set MAY be slower.

Creating a set from the dictionary's keys should not be a lot slower,
because the keys are already unique: there is no need to check each key
against the other keys, just return them as a set. I assume this is
what you meant by "make the set hashmap have the same dimensions and
rules as the dictionary one". Perhaps a dictionary would internally
just copy its keys to the set and return it rather than construct a
set from scratch (with duplication checks and all).

> 3 - using a set is sometimes slower

Again, it depends on how it is used. In your case you argue that you
usually sort the keys anyway, so a list is more convenient. But
different use cases can call for different operations on the keys()
after they have been retrieved. If someone wants to do an
intersection to find common keys with another dictionary, then a set
would be more appropriate. The intent of returning a set() is to not
tempt anyone into assuming an ordering of keys() just because a list is
indexable, and to eliminate the need for a long footnote in the
documentation of the dict type that talks about 'an arbitrary
non-random ordering'; it takes a while just to grasp what that means...
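The intersection use case, sketched with two made-up dictionaries (the names prices and stock are illustrative only). Building sets of keys explicitly works in any Python with built-in sets; in Python 3 the keys() views also accept set operators directly:

```python
prices = {"apple": 1.0, "pear": 2.5, "fig": 4.0}
stock = {"pear": 10, "fig": 0, "kiwi": 7}

# Explicit sets of keys: iterating a dict yields its keys.
common = set(prices) & set(stock)

# Python 3 only: keys views support &, |, - and ^, returning a new set.
common_view = prices.keys() & stock.keys()

print(sorted(common))  # ['fig', 'pear']
```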

In general I believe that a small performance penalty is acceptable in
order to have a correct and consistent data type, especially for Python;
I might not argue the same for Perl or C.

-Nick V.


Jul 1 '06 #4

Robert Kern

va******@gmail.com wrote:

> The same thing goes for the values(). Here most people will argue that
> values are not necessarily unique, so a list is more appropriate, but
> in fact the values are unique; it is just that more than one key could
> map to the same value. What is 'seen' in a dictionary is the mapping
> rule. Also the values are not ordered and should not be indexable --
> they should be a set. Just as with keys(), one ordered list of values
> from a dictionary on one machine will not necessarily be equal to
> another list of values on another machine, while in fact, they
> should be.

This part is pretty much a non-starter. Not all Python objects are hashable.

In [1]: d = {}

In [2]: for i in range(1, 10):
   ...:     d[i] = range(i)
   ...:
   ...:

In [3]: set(d.values())
---------------------------------------------------------------------------
exceptions.TypeError                      Traceback (most recent call last)

/Users/kern/<ipython console>

TypeError: list objects are unhashable
Also, I may need keys to map to different objects that happen to be equal.
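The same two objections, reproduced as a sketch for a Python 3 interpreter (an assumption; the session above is Python 2, where range() returned a list, so the sketch calls list() explicitly). Unhashable values make set() fail outright, and distinct values that merely compare equal collapse into one element:

```python
d = {}
for i in range(1, 10):
    # list(...) mirrors the Python 2 session above; in Python 3 a bare
    # range object is hashable and would not trigger the error.
    d[i] = list(range(i))

try:
    set(d.values())
    values_hashable = True
except TypeError as exc:
    values_hashable = False
    print("cannot build a set of the values:", exc)

# Distinct objects that compare equal (1 == 1.0) collapse in a set.
e = {"x": 1, "y": 1.0}
collapsed = len(set(e.values()))
print(collapsed)  # 1
```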

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Jul 1 '06 #5

John Machin

On 2/07/2006 6:01 AM, va******@gmail.com wrote:

> This has been bothering me for a while.

[snip]

Summary of OP's post: d.keys() and d.values() should return sets in
Python 3.0.

Observations:
(1) My code [some of which dates back to Python 1.5.1] uses about 12 x
d.items() and 11 x d.keys() for each 1 x d.values()
(2) Most cases of d.X() don't need/want the untrammelled list and could
be replaced by d.iterX(). Example: max_frequency = max(tally.values())

Opinion: d.X() could be deprecated, but I'd rather see a
consciousness-raising for the d.iterX() methods, and for the construct
for key in d:
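The two habits John recommends, sketched with a made-up tally dict. In Python 2 the lazy spelling would be tally.itervalues(); Python 3 removed the iterX() methods, and values() there returns a lightweight view rather than a copied list, so the code below assumes Python 3:

```python
tally = {"spam": 3, "eggs": 7, "ham": 2}

# max() consumes any iterable, so no intermediate list is needed.
max_frequency = max(tally.values())

# Iterating over the dict itself yields its keys.
for key in tally:
    print(key, tally[key])

print(max_frequency)  # 7
```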

Cheers,
John

Jul 1 '06 #6

Roy Smith

"Nick Vatamaniuc" <va******@gmail.com> wrote:

> But there is a side note: old code that assumed a particular ordering
> of the keys or values is broken anyway.

From a testing point of view, it would be interesting if there was a flag
which said, "Deliberately change everything which isn't guaranteed to be a
specific way". So, for example, dict.keys() would return a list in reverse
order of how it normally does it (even if it cost more to do it that way).
An alternate hash key generator would be slipped into place. Floating
point math would get a little noise added to the least significant bits.
And so on. Might be interesting to see what sorts of bugs that would shake
out from user code.
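Something in this spirit did arrive later for hashing: since Python 3.3, str and bytes hashes are randomized per process by default (controlled by the PYTHONHASHSEED environment variable). For dict ordering, a test can do the shuffling itself. The sketch below assumes CPython 3.7+, where iteration follows insertion order; the helper names shuffled_copy and depends_on_order are hypothetical:

```python
import random


def shuffled_copy(d, rng):
    """Return a dict equal to d but built in a random insertion order.

    On CPython 3.7+ iteration follows insertion order, so this changes
    the order in which keys(), values() and items() are produced.
    """
    items = list(d.items())
    rng.shuffle(items)
    return dict(items)


def depends_on_order(func, d, trials=20, seed=0):
    """Hypothetical test helper: True if func gives a different result
    for some dict that is equal to d but iterates in another order."""
    rng = random.Random(seed)
    expected = func(d)
    return any(func(shuffled_copy(d, rng)) != expected for _ in range(trials))


d = {i: i * i for i in range(10)}
print(depends_on_order(lambda m: list(m)[0], d))    # order-dependent: almost surely True
print(depends_on_order(lambda m: sorted(m)[0], d))  # order-independent: False
```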

Jul 2 '06 #7

Nick Vatamaniuc

You are correct; I should have thought of that. I still think the keys()
method should return a set(), though.


Jul 2 '06 #8

Paul McGuire

<va******@gmail.com> wrote in message
news:11**********************@75g2000cwc.googlegroups.com...

> This has been bothering me for a while. Just want to find out if it
> just me or perhaps others have thought of this too: Why shouldn't the
> keyset of a dictionary be represented as a set instead of a list?

I think this is an interesting suggestion. Of course, the current situation
is as much a product of historical progression as anything: lists and dicts
pre-date sets, so the collection of keys had to be returned as a list.
Since lists also carry some concept of order to them, the documentation for
the list returned by dict.keys() went out of its way to say that the order
of the elements in the dict.keys() list had no bearing on the dict, the
insertion order of entries, or anything else, that the order of keys was
purely arbitrary.

In fact, there is not a little irony to this proposal, since it seems it was
just a few months ago that c.l.py had just about weekly requests for how to
create an "ordered dict," with various ideas of how a dict should be
ordered, but most intended to preserve the order of insertion of items into
the dict. And now here we have just about the opposite proposal - dicts
should not only *not* be ordered, they should revel in their disorderliness.

I liked the example in which the OP (of this thread) wanted to compare the
keys from two different dicts for equality. Since the keys()
method returns the keys in an indeterminate order, we can't simply perform
dictA.keys() == dictB.keys(). But how often does this really happen? In
practice, I think the keys of a dict, when this collection is used at all,
are usually sorted and then iterated over, usually to prettyprint the keys
and values in the dict. Internally, this set of items shouldn't even exist
as a separate data structure - the dict's keys are merely labels on nodes in
some sort of hash tree.

Now that we really do have sets in Python, if one were to design dicts all
over again, it does seem like set would be a better choice than list for the
type returned by the keys() method. To preserve compatibility, would a
second method, keyset(), do the trick? The OP used this term himself,
referring to "the keyset of the dictionary". Still, given the few cases
where I would access this as a true set, using "set(dictA.keys())" doesn't
seem to be that big a hardship, and its obviousness will probably undercut
any push for a separate keyset() method.
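For comparison, the proposed method as a sketch: KeysetDict and keyset() are hypothetical names, not part of any Python release. Returning a frozenset gives an order-free, immutable snapshot; the last line shows the set(dictA.keys()) spelling Paul suggests, which needs no new API:

```python
class KeysetDict(dict):
    """Hypothetical dict subclass adding the proposed keyset() method."""

    def keyset(self):
        # A frozenset snapshot of the keys: unordered and immutable.
        return frozenset(self)


a = KeysetDict(x=1, y=2)
b = KeysetDict(y=20, x=10)
print(a.keyset() == b.keyset())        # True
print(set(a.keys()) == set(b.keys()))  # True, with no new method needed
```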

-- Paul

Jul 2 '06 #9

Paddy

va******@gmail.com wrote:

> This has been bothering me for a while. Just want to find out if it
> just me or perhaps others have thought of this too: Why shouldn't the
> keyset of a dictionary be represented as a set instead of a list?

I think the orders of the items returned by keys() and values() are
related. I decided on a short empirical test:

>>> import random
>>> n=50
>>> d = dict((i,random.randint(0,n-1)) for i in range(n))
>>> k,v = d.keys(), d.values()
>>> [d[k[i]] == v[i] for i in range(n)]

[True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True]

>>> ## for larger n try
>>> # False in [d[k[i]] == v[i] for i in range(n)]

The order of keys() and the order of values() are related, so a change
to returning sets would also lose this functionality.

Mind you, never rely on that implied ordering. Always use items().

And so *if* sets were returned for keys() and values(), then the same
should go for items() too.
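A Python 3 port of the empirical test (an assumption: Python 3 keys() and values() return views, which cannot be indexed like k[i], so the sketch zips them instead). The correspondence it checks is documented, as the manual quoted in the next reply says, for a dict with no intervening modifications; the seeded Random instance just makes the data reproducible:

```python
import random

n = 50
rng = random.Random(1)
d = {i: rng.randint(0, n - 1) for i in range(n)}

# Documented guarantee: with no intervening modification, keys(),
# values() and items() iterate in corresponding order.
corresponds = list(zip(d.keys(), d.values())) == list(d.items())

# The safer habit: take each key/value pair together from items().
pairs_ok = all(d[k] == v for k, v in d.items())
print(corresponds, pairs_ok)  # True True
```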

- Paddy.

Jul 2 '06 #10

bearophileHUGS

Paddy:

> Mind you, never rely on that implied ordering. Always use items().

Using dict.items() is probably better, but the manual says:

>If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. This allows the creation of (value, key) pairs using zip(): "pairs = zip(a.values(), a.keys())". The same relationship holds for the iterkeys() and itervalues() methods:<

Is this going to change?
dict.keyset() seems nice, but you usually don't want to make the API
too big.
Keeping APIs small is very important in Python; otherwise you need the
manual to write code.
I think a better solution to such key set problems is to optimize
Python itself, so that it computes set(dict) really fast (it could just
"copy" the hash table of the dict).

Bye,
bearophile

Jul 2 '06 #11

Simon Forman

Nick Vatamaniuc wrote:

> Robert Kern wrote:
> > va******@gmail.com wrote:
> > > The same thing goes for the values(). Here most people will argue that
> >
> > ....
> >
> > This part is pretty much a non-starter. Not all Python objects are hashable.
> >
> > ....
> >
> > Also, I may need keys to map to different objects that happen to be equal.
> >
> > --
> > Robert Kern

So, values() can't return a set because of (at least) the two reasons
given above. And since, as Scott David Daniels pointed out, dicts
support the iterator protocol, you can ask for a set of the keys easily
enough if you want it:

>>> d = dict(a=1, b=2, c=3)
>>> set(d)
set(['a', 'c', 'b'])

So, IMHO, there's not much point to having keys() return a set.
Peace,
~Simon

Jul 2 '06 #12

Terry Reedy

The meaning of dict.keys, etc., will not change for the 2.x series. For
3.0, I believe that Guido already intends that .keys() no longer return a
separate list. For one thing, a major, if not the main, use of the method
is for iteration, as in 'for key in d.keys():'. For this, creating and
then discarding a separate list is inefficient and, in a sense, silly.

One option is to make .keys be what .iterkeys is today. Another, more
recent, is to make .keys return a new iterable dict view object, details
not yet worked out. Either way, one would make an independent set or list
with set(d.keys()) or list(d.keys()).
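In hindsight, Python 3.0 adopted the view option. A sketch for a Python 3 interpreter showing that a keys view tracks later changes to the dict and supports set operations, while list(d.keys()) and set(d.keys()) produce independent snapshots:

```python
d = {"a": 1, "b": 2}
ks = d.keys()  # a view object in Python 3, not a copied list

d["c"] = 3         # the view reflects later changes to the dict
print(sorted(ks))  # ['a', 'b', 'c']

print(ks & {"b", "z"})  # {'b'}: keys views support set operators

snapshot_list = list(d.keys())  # an independent list
snapshot_set = set(d.keys())    # an independent set
d["d"] = 4
print(len(ks), len(snapshot_list), len(snapshot_set))  # 4 3 3
```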

Terry Jan Reedy

Jul 2 '06 #13

Piet van Oostrum

>>>>> "Paddy" <pa*******@netscape.net> (P) wrote:

P> va******@gmail.com wrote:

P>> This has been bothering me for a while. Just want to find out if it
P>> just me or perhaps others have thought of this too: Why shouldn't the
P>> keyset of a dictionary be represented as a set instead of a list?

P> I think the order of the items returned by keys() and values() are
P> related. I decided on a short empirical test:

Yes, it is documented that their order is related. In fact
d.items() == zip(d.keys(), d.values())
This wouldn't work with sets instead of lists.
--
Piet van Oostrum <pi**@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: pi**@vanoostrum.org

Jul 3 '06 #14

Antoon Pardon

On 2006-07-01, cm************@yaho.com <co**********@gmail.com> wrote:

> There are a few good reasons.
> 1 - golden handcuffs. Breaking old code is bad 90% of the time
> 2 - creating a set MAY be slower.
>
> Python's sets seem to imply that they will always be a hash map. In
> this case, some creative hash map "mapping" could allow one to create a
> set without calculating hash codes (make the set hashmap have the same
> dimensions and rules as the dictionary one).
> If there was intent to allow Python implementations to use trees for
> the set, then a list is far faster to create (O(n) time instead of
> O(n log n)).
>
> 3 - using a set is sometimes slower (just as using a list is sometimes
> slower)
> I can't speak for your code, but this is the most common use of keys in
> my coding:
>
> # d is some dictionary
> keys = d.keys()
> keys.sort()
> for k in keys:
>     pass  # blah

Wouldn't you be better off with a tree instead of a dictionary? Maybe
there are other reasons to prefer a dict, but have a look at:

http://www.pardon-sleeuwaegen.be/antoon/avltree.html

Suppose t is a tree implementing the same mapping as your dictionary
d above; the code would be:

# t is some tree
for k in t:
    pass  # blah

And the keys will be treated in order.
If you try it, let me know what you think of it.
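For readers who cannot follow the link, a much simpler stand-in with the same iteration behavior: SortedKeyDict is a hypothetical class, NOT the AVL tree linked above. It pairs a plain dict (average O(1) lookups) with a list of keys kept sorted via the standard bisect module (O(n) insertion and deletion because of list shifting, where a balanced tree would give O(log n)):

```python
import bisect


class SortedKeyDict:
    """Hypothetical minimal mapping that iterates its keys in sorted order."""

    def __init__(self):
        self._data = {}  # key -> value, for fast lookup
        self._keys = []  # the same keys, kept in sorted order

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # insert at its sorted position
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]  # raises KeyError if the key is absent
        # The key is present in the sorted list, so bisect_left finds it.
        del self._keys[bisect.bisect_left(self._keys, key)]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._keys)


t = SortedKeyDict()
for k, v in [("pear", 3), ("apple", 1), ("fig", 2)]:
    t[k] = v
for k in t:
    print(k, t[k])  # apple 1, then fig 2, then pear 3
```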

--
Antoon Pardon

Jul 4 '06 #15
