473,549 Members | 2,679 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

This has been bothering me for a while. Just want to find out if it
just me or perhaps others have thought of this too: Why shouldn't the
keyset of a dictionary be represented as a set instead of a list? I
know that sets were introduced a lot later and lists/dictionaries were
used instead but I think "the only correct way" now is for the
dictionary keys and values to be sets. Presently {1:0,2:0,3:0}.k eys()
will produce [1,2,3] but it could also produce [3,1,2] or [3,2,1] on a
different machine architecture. The Python documentation states that:
"""
Keys and values are listed in an _arbitrary_(my emphasis) order which
is non-random, varies across Python implementations , and depends on the
dictionary's history of insertions and deletions.
"""

So on the same machine it will be the case that: {1:0, 2:0,
3:0}.keys() == {3:0, 2:0, 1:0}.keys() is True. But if there are 2 lists
of keys of the same dictionary, one is pickled on another machine, with
a different "arbitrary non-random" ordering, then the keys will not be
equal to each other. It seems like the problem could be solved by
returning a set instead.

The same thing goes for the values(). Here most people will argue that
values are not necessarily unique, so a list is more appropriate, but
in fact the values are unique it is just that more than one key could
map to the same value. What is 'seen' in dictionary is the mapping
rule. Also the values are not ordered and should not be indexable --
they should be a set. Just as the keys(), one ordered list of values
from a dictionary on one machine will not necessarily be equal to
another list of values an on another machine, while in fact, they
should be.

On a more fundamental and general level, a dictionary is actually an
explicit function, also called a 'map'. A set of keys, traditionally
called a 'domain' are mapped to a set of values, traditionally called a
'range'. This mapping produces at most a surjective function (i.e. two
or more keys can map to same value and all the values are mapped to by
some keys). Notice that the traditional counterparts to keys and
values are sets and not just lists. This seems like theory babble, but
in the case of Python staying 'true' to the theory is usually a
GoodThing(tm).

I love Python primarily because it does something in only one, correct,
and reasonable way. The same principle should probably apply to Python
itself including to its built-in data structures.

Of course the compatibilty will be broken. Any code relying on a
certain ordering of keys() or values() returned would need to be
updated. One could argue though that such code was not entirely correct
in the first place to asssume that, and should be fixed anyway.

Obviously this fix should not be in Python 2.X but perhaps would be
worth considering for Python 3000. What does everyone think?

Jul 1 '06 #1
14 3442
There's a few good reasons.
1 - golden handcuffs. Breaking old code is bad 90% of the time
2 - creating a set MAY be slower.

Python's sets seem to imply to that they will always be a hash map. in
this case, some creative hash map "mapping" could allow one to create a
set without calculating hash codes (make the set hashmap have the same
dimentions and rules as the dictionary one).
If there was intent to allow Python implementations to use trees for
the set, then a list is far faster to create (O(n) time instead of
O(nlogn)).

3 - using a set is sometimes slower (just as using a list is sometimes
slower)
I can't speak for your code, but this is the most common use of keys in
my coding:
# d is some dictionary
keys = d.keys()
keys.sort()
for k in keys:
#blah

sets cannot be sorted, while lists can. If keys() returned a set, then
I'd have to turn it into a list every time.

There's potential to add "views" to python (a key view of a dictionary
being a frozenset containing the dictionary's keys, which is updated
whenever the dictionary updates), but I think thats annother topic
which is out of the scope of your origional idea.

Jul 1 '06 #2
cm************@ yaho.com wrote:
> I can't speak for your code, but this is the most common use of keys in
my coding:
# d is some dictionary
keys = d.keys()
keys.sort()
for k in keys:
#blah
This you can rewrite quite effectively as:

for k in sorted(d):
#blah

--Scott David Daniels
sc***********@a cm.org
Jul 1 '06 #3
1 - golden handcuffs. Breaking old code is bad 90% of the time
I agree with you on that, mostly for code that counted on list methods
of result of keys() - like you later show with sort. But there is a
side note: old code that assumed a particular ordering of the keys or
values is broken anyway. So even if ks={1:0,2:0,3:0 }.keys() returns
[1,2,3] on my machine I should not do something like
'my_savings_acc ount + ks[0]' That code should be fixed anyway, since on
a different machine it might produce different values for ks[0].
2 - creating a set MAY be slower.
Creating a set from the dictionary's keys should not be a lot slower
because the keys are already unique, there is no need to check each key
against the other keys just return them as a set. I assume this is
what you meant by "make the set hashmap have the same dimensions and
rules as the dictionary one". Perhaps a dictionary would internally
just copy its keys to the set and return it rather than construct as
set from scratch (with duplication checks and all).
>3 - using a set is sometimes slower
Again, depending how it is used. In your case you argue that you
usually sort the keys anyway so a list is more convinient. But
different use cases can call for differnent operations on the keys()
after they have been retrieved. What if someone wants to do an
intersection to find common keys with another dictionary, then a set
would be more appropriate. The intent of the set() type was to not temp
anyone into assuming an ordering of keys() just because a list is
indexable. And eliminate the need for a long footnote in the
documentation of the dict type that talks about 'an arbitrary
non-random ordering' - it takes while just to grasp what that means...

In general I believe that a small performance penalty is acceptable in
order to have a correct and consistent data type, especially for Python
i.e. I might not argue the same for Perl or C.

-Nick V.

cm************@ yaho.com wrote:
There's a few good reasons.
1 - golden handcuffs. Breaking old code is bad 90% of the time
2 - creating a set MAY be slower.

Python's sets seem to imply to that they will always be a hash map. in
this case, some creative hash map "mapping" could allow one to create a
set without calculating hash codes (make the set hashmap have the same
dimentions and rules as the dictionary one).
If there was intent to allow Python implementations to use trees for
the set, then a list is far faster to create (O(n) time instead of
O(nlogn)).

3 - using a set is sometimes slower (just as using a list is sometimes
slower)
I can't speak for your code, but this is the most common use of keys in
my coding:
# d is some dictionary
keys = d.keys()
keys.sort()
for k in keys:
#blah

sets cannot be sorted, while lists can. If keys() returned a set, then
I'd have to turn it into a list every time.

There's potential to add "views" to python (a key view of a dictionary
being a frozenset containing the dictionary's keys, which is updated
whenever the dictionary updates), but I think thats annother topic
which is out of the scope of your origional idea.
Jul 1 '06 #4
va******@gmail. com wrote:
The same thing goes for the values(). Here most people will argue that
values are not necessarily unique, so a list is more appropriate, but
in fact the values are unique it is just that more than one key could
map to the same value. What is 'seen' in dictionary is the mapping
rule. Also the values are not ordered and should not be indexable --
they should be a set. Just as the keys(), one ordered list of values
from a dictionary on one machine will not necessarily be equal to
another list of values an on another machine, while in fact, they
should be.
This part is pretty much a non-starter. Not all Python objects are hashable.

In [1]: d = {}

In [2]: for i in range(1, 10):
...: d[i] = range(i)
...:
...:

In [3]: set(d.values())
---------------------------------------------------------------------------
exceptions.Type Error Traceback (most recent call
last)

/Users/kern/<ipython console>

TypeError: list objects are unhashable
Also, I may need keys to map to different objects that happen to be equal.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Jul 1 '06 #5
On 2/07/2006 6:01 AM, va******@gmail. com wrote:
This has been bothering me for a while.
[snip]

Summary of OP's post: d.keys() and d.values() should return sets in
Python 3.0.

Observations:
(1) My code [some of which dates back to Python 1.5.1] uses about 12 x
d.items() and 11 x d.keys() for each 1 x d.values()
(2) Most cases of d.X() don't need/want the untrammelled list and could
be replaced by d.iterX(). Example: max_frequency = max(tally.value s())

Opinion: d.X() could be deprecated, but I'd rather see a
consciousness-raising for the d.iterX() methods, and for the construct
for key in d:

Cheers,
John
Jul 1 '06 #6
"Nick Vatamaniuc" <va******@gmail .comwrote:
But there is a side note: old code that assumed a particular ordering
of the keys or values is broken anyway.
From a testing point of view, it would be interesting if there was a flag
which said, "Deliberate ly change everything which isn't guaranteed to be a
specific way". So, for example, dict.keys() would return a list in reverse
order of how it normally does it (even if it cost more to do it that way).
An alternate hash key generator would be slipped into place. Floating
point math would get a little noise added to the least significant bits.
And so on. Might be interesting to see what sorts of bugs that would shake
out from user code.
Jul 2 '06 #7
You are correct I should have thought of that. I still think the keys()
method should return a set() though.

Robert Kern wrote:
va******@gmail. com wrote:
The same thing goes for the values(). Here most people will argue that
values are not necessarily unique, so a list is more appropriate, but
in fact the values are unique it is just that more than one key could
map to the same value. What is 'seen' in dictionary is the mapping
rule. Also the values are not ordered and should not be indexable --
they should be a set. Just as the keys(), one ordered list of values
from a dictionary on one machine will not necessarily be equal to
another list of values an on another machine, while in fact, they
should be.

This part is pretty much a non-starter. Not all Python objects are hashable.

In [1]: d = {}

In [2]: for i in range(1, 10):
...: d[i] = range(i)
...:
...:

In [3]: set(d.values())
---------------------------------------------------------------------------
exceptions.Type Error Traceback (most recent call
last)

/Users/kern/<ipython console>

TypeError: list objects are unhashable
Also, I may need keys to map to different objects that happen to be equal.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Jul 2 '06 #8
<va******@gmail .comwrote in message
news:11******** **************@ 75g2000cwc.goog legroups.com...
This has been bothering me for a while. Just want to find out if it
just me or perhaps others have thought of this too: Why shouldn't the
keyset of a dictionary be represented as a set instead of a list?
I think this is an interesting suggestion. Of course, the current situation
is as much a product of historical progression as anything: lists and dicts
pre-date sets, so the collection of keys had to be returned as a list.
Since lists also carry some concept of order to them, the documentation for
the list returned by dict.keys() went out of its way to say that the order
of the elements in the dict.keys() list had no bearing on the dict, the
insertion order of entries, or anything else, that the order of keys was
purely arbitrary.

In fact, there is not a little irony to this proposal, since it seems it was
just a few months ago that c.l.py had just about weekly requests for how to
create an "ordered dict," with various ideas of how a dict should be
ordered, but most intended to preserve the order of insertion of items into
the dict. And now here we have just about the opposite proposal - dicts
should not only *not* be ordered, they should revel in their disorderliness.

I liked the example in which the OP (of this thread) wanted to compare the
keys from two different dicts, for equality of keys. Since the keys()
method returns a set of indeterminate order, we can't simply perform
dictA.keys() == dictB.keys(). But how often does this really happen? In
practice, I think the keys of a dict, when this collection is used at all,
are usually sorted and then iterated over, usually to prettyprint the keys
and values in the dict. Internally, this set of items shouldn't even exist
as a separate data structure - the dict's keys are merely labels on nodes in
some sort of hash tree.

Now that we really do have sets in Python, if one were to design dicts all
over again, it does seem like set would be a better choice than list for the
type returned by the keys() method. To preserve compatibility, would a
second method, keyset(), do the trick? The OP used this term himself,
referring to "the keyset of the dictionary". Still given the few cases
where I would access this as a true set, using "set(dictA.keys ())" doesn't
seem to be that big a hardship, and its obviousness will probably undercut
any push for a separate keyset() method.

-- Paul
Jul 2 '06 #9

va******@gmail. com wrote:
This has been bothering me for a while. Just want to find out if it
just me or perhaps others have thought of this too: Why shouldn't the
keyset of a dictionary be represented as a set instead of a list?
I think the order of the items returned by keys() and values() are
related. I decided on a short empirical test:
>>import random
n=50
d = dict((i,random. randint(0,n-1)) for i in range(n))
k,v = d.keys(), d.values()
[d[k[i]] == v[i] for i in range(n)]
[True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True]
>>## for larger n try
# False in [d[k[i]] == v[i] for i in range(n)]
The order of keys() and the order of values() are related, so a change
to returning sets would also loose this functionality.

Mind you, Never rely on that implied ordering. Always use items().

And so *if* sets were returned for keys() and values() then it should
for items() too.

- Paddy.

Jul 2 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
12475
by: Elbert Lev | last post by:
Hi! Here is the problem: I have a dictionary. Keys are strings. How to make dictionary lookup case insensitive? In other words: If dict = {'First":"Bob", "Last":"Tom"}, dict should return "Bob"
125
7108
by: Raymond Hettinger | last post by:
I would like to get everyone's thoughts on two new dictionary methods: def count(self, value, qty=1): try: self += qty except KeyError: self = qty def appendlist(self, key, *values): try:
90
10728
by: Christoph Zwerschke | last post by:
Ok, the answer is easy: For historical reasons - built-in sets exist only since Python 2.4. Anyway, I was thinking about whether it would be possible and desirable to change the old behavior in future Python versions and let dict.keys() and dict.values() both return sets instead of lists. If d is a dict, code like: for x in d.keys():
8
7035
by: rh0dium | last post by:
Hi all, I have a dict which looks like this.. dict={'130nm': {'umc': }, '180nm': {'chartered': , 'tsmc': }, '250nm': {'umc': , 'tsmc': } }
3
2504
by: Rich Shepard | last post by:
I need to learn how to process a byte stream from a form reader where each pair of bytes has meaning according to lookup dictionaries, then use the values to build an array of rows inserted into a sqlite3 database table. Here's the context: The OMR card reader sends a stream of 69 bytes over the serial line; the last byte is a carriage...
11
11390
by: John | last post by:
I am coding a radix sort in python and I think that Python's dictionary may be a choice for bucket. The only problem is that dictionary is a mapping without order. But I just found that if the keys are numeric, the keys themselves are ordered in the dictionary. part of my code is like this: radix={} for i in range(256):
18
2949
by: Marko.Cain.23 | last post by:
Hi, I create a dictionary like this myDict = {} and I add entry like this: myDict = 1 but how can I empty the whole dictionary? Thank you.
5
6314
by: davenet | last post by:
Hi, I'm new to Python and working on a school assignment. I have setup a dictionary where the keys point to an object. Each object has two member variables. I need to find the smallest value contained in this group of objects. The objects are defined as follows:
10
1969
by: ++imanshu | last post by:
Hi, Wouldn't it be nicer to have 'in' return values (or keys) for both arrays and dictionaries. Arrays and Dictionaries looked so similar in Python until I learned this difference. Thanks, ++imanshu
0
7959
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
0
7810
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
0
6044
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
0
5088
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3501
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3483
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
1944
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1061
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
764
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.