469,306 Members | 2,463 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,306 developers. It's quick & easy.

Why is dictionary.keys() a list and not a set?

Ok, the answer is easy: For historical reasons - built-in sets exist
only since Python 2.4.

Anyway, I was thinking about whether it would be possible and desirable
to change the old behavior in future Python versions and let dict.keys()
and dict.values() both return sets instead of lists.

If d is a dict, code like:

for x in d.keys():
...

or even

for x in sorted(d.keys()):
...

would still work and do the same.

However, the older idiom

k = d.keys()
k.sort()
for x in k:
...

would break (since you cannot sort a map directly).

So it seems not possible to change the behavior since it would break old
code. Maybe keys() and values() could be made deprecated functions,
superseded by keyset() and valueset(), but probably this would not be
worth the effort.

Another related question: Actually, dicts can be considered as
subclasses of sets and thus could inherit a lot of set methods and
operators. Only intersection and union must be treated with care,
because dictionaries must be mappings (i.e. map to exactly one value).

In fact, some inheritance from sets has already been implemented:

For instance, by allowing the set operator "in" for dictionaries,
instead of "has_key".

The len(), clear(), copy() and update() functions can also be
interpreted as inherited from set. The dictionary method popitem() is
actually the inherited set method pop(). (d.pop() without arguments
could actually do the same as d.popitem(); I guess the suffix "item" was
chosen to remind you that it returns a key/value pair, not only a value.)

But could other set methods also be useful?

A dictionary could inherit the remove() and discard() method from sets.
d.remove(x) would do the same as del d[x], but d.discard(x) would not
raise a KeyError if x not in d.

Or, for instance, if

a = { 1:11, 2:12 }; b = { 2:22, 3:13 }, c = { 2:32 }

then

c.issubset(b) == c <= b == True
b.issuperset(c) == b >= c == True
a.difference(b) == a - b == { 1:11 }
a.s.symmetric_difference(b) == a ^ b == { 1:11, 3:13 }
a.update(b) == a |= b == a = { 1:11, 2:22, 3:13 }
a.intersection_update(b) == a &= b == a = { 2:22 }
a.difference_update(b) == a -= b == a = { 1:11 }
a.symmetric_difference_update(b) == a ^= b == a = { 1:11, 3:13 }

Of these, a |= b may be particularly interesting as short notation for
a.update(b).

-- Christoph
Nov 23 '05
90 10159
"Martin v. Lwis" <ma****@v.loewis.de> writes:
Mike Meyer wrote:
I trusted the doco which defines a set as "an unordered collection of
immutable values" (http://docs.python.org/lib/types-set.html).

If the hash changes, you've screwed up the set. What it really should
say is "collection of objects with fixed hashes", or words to that
effect. Do you want to file a PR on this?

It's more than that: you really mean "hashable values", where "hashable"
does not just imply that the hash value is fixed, but also that the
equal objects hash same.


I would have phrased it as "identical" instead of equal. Sets should
hold unique elements, where "unique" depends on the use at hand. For
some uses, "==" might be appropriate. For others, "is" might be
appropriate.

Would you care to suggest an improved wording?

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 24 '05 #51
Mike Meyer wrote:
Well, I just made a suggestion. You found the problem - you want to do
the PR, or should I?
Please go ahead if you have the time. By the way, the doco may be also
inexact about the keys of dictionaries.
# Untested code
class Hash(object):
def __new__(cls, obj):
if hasattr(obj, '__hash__'):
return obj
self.__obj = obj
return object.__new__()
def __getattr__(self, name):
return getattr(self.__obj, name)
def __setattr(self, name, value):
setattr(self.__obj, name, value)
def __hash__(self):
return id(self.__obj)
This will not work since even lists seem to have a __hash__ attribute.
Also, you will get an infinite recursion when trying to access
self.__obj. But principally, it should work. Ad hoc solution:

class HashProxy:
def __init__(self, obj):
self.__obj = obj
def __getattr__(self, name):
return getattr(self.__obj, name)
def __setattr(self, name, value):
setattr(self.__obj, name, value)
def __hash__(self):
return id(self.__obj)

def Hash(obj):
try:
hash(obj)
except TypeError:
return HashProxy(obj)
else:
return obj
If you need to turn a list of arbitrary, possibly unhashable, objects
into a set, is there a problem with the above?


Well, first you don't get the real unhashable objects, but proxied
objects. This will lead to problems if you want to check the type in the
following - you will always get the same type. Second, as you have
already mentioned, the elements of the sets will not be guaranteed to be
different anymore (only different in terms of identity (is), but they
still may be equal (==)). This would not be what you expect from a set.

-- Christoph
Nov 25 '05 #52
Martin v. Lwis schrieb:
Your original question was "could it be changed, and should it be
changed?" As the answer to the first part is already "no", the
second part really doesn't raise.
Right of course. The second part of the question was rather hypothetical
(in the sense of "if we could start with Python from scratch, would it
then be desirable").
In a set, all elements are different. In Python, this means that, for
any two elements X and Y of a set, X!=Y. Now, consider this set:
a = [1]
b = [1,2]
S = set(a,b)
a.append(2)
Now, a==b, so the inherent set property breaks. In theory, this should
cause S to contain only a single element; the other one should
disappear.
I just started to see these problems as well. I believed that the
immutability of set elements had only technical reasons (hashing etc.)
but your're right, it has also semantical reasons. In the above case, a
or b would have to magically disappear, and even if that would be
possible it would be completely unclear which of the two, and what would
happen if you subsequently do a.append(3)? Should it magically appear
again? So I understand you will get into very hot water if you try to
implement sets with mutable elements.
This is not only hard to implement (removal from a set
would have to remove all duplicates, iterating over a set would have
to skip over duplicates) - there is also another semantical issue:
If one element is skipped/dropped, which of these (a or b)?


I should have read your posting fully before writing the above ;-)

-- Christoph
Nov 25 '05 #53
Paul Rubin schrieb:
Yes, it's not considered terribly troublesome. Set theory (and
arithmetic) are generally accepted as being consistent but incomplete.
So there will always be something for mathemeticians to do ;-).


Somehow I found this being comforting. Even in the realm of Mathematics
there are things which can only be believed, but not be proven. Just as
in Physics there are things which can only be predicted with a certain
probability, but they are not predetermined completely.

But now we're really getting off topic.

-- Christoph
Nov 25 '05 #54
Martin v. Lwis wrote:
It follows from what is documented. set(<iterable object>) creates a
set that contains all elements in the iterable object:
http://docs.python.org/lib/built-in-funcs.html#l2h-63
Now, is a dictionary an iterable object? Yes, it is:
http://www.python.org/peps/pep-0234.html
Together, this gives the property I demonstrated.


You need to know a third thing, namely that as an iterable object, a
dictionary returns the keys only, not the key/value tuples which would
also make some sense. If it had been designed that way, you could write

for k, v in d:
print k, v

instead of:

for k in d:
print k, d[k]

What I wanted to say is that the doco could mention this possibility to
get the keys as a set at the place where it explains the keys() method.

-- Christoph
Nov 25 '05 #55

Christoph Zwerschke wrote:
je****@unpythonic.net schrieb:
You can already get a set from a dictionary's keys in an efficient manner:
>l = dict.fromkeys(range(10))
>set(l)

Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the latter would not work with mutable values. See
discussion with Martin.

puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?

Nov 25 '05 #56
"bo****@gmail.com" <bo****@gmail.com> writes:
Christoph Zwerschke wrote:
je****@unpythonic.net schrieb:
> You can already get a set from a dictionary's keys in an efficient manner:
>>>>l = dict.fromkeys(range(10))
>>>>set(l)
> Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the latter would not work with mutable values. See
discussion with Martin.

puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?


Not all tuples can be elements of a set. Elements of a set have to be
hashable. Tuples compute their hash by hashing their contents. If
their contents aren't hashable, the tuple isn't hashable, and hence
can't be an element of a set. If the values in the dictionary aren't
hashable, then the tuples returned by items() won't be hashable
either, and hence can't be elements of a set.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 25 '05 #57

Mike Meyer wrote:
"bo****@gmail.com" <bo****@gmail.com> writes:
Christoph Zwerschke wrote:
je****@unpythonic.net schrieb:
> You can already get a set from a dictionary's keys in an efficient manner:
>>>>l = dict.fromkeys(range(10))
>>>>set(l)
> Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the latter would not work with mutable values. See
discussion with Martin.

puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?


Not all tuples can be elements of a set. Elements of a set have to be
hashable. Tuples compute their hash by hashing their contents. If
their contents aren't hashable, the tuple isn't hashable, and hence
can't be an element of a set. If the values in the dictionary aren't
hashable, then the tuples returned by items() won't be hashable
either, and hence can't be elements of a set.

ah, thanks. I was thinking that tuples being immutable object must be
hashable. So it is not about mutable but hashable.

Nov 25 '05 #58

Mike Meyer wrote:
"bo****@gmail.com" <bo****@gmail.com> writes:
Christoph Zwerschke wrote:
je****@unpythonic.net schrieb:
> You can already get a set from a dictionary's keys in an efficient manner:
>>>>l = dict.fromkeys(range(10))
>>>>set(l)
> Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the latter would not work with mutable values. See
discussion with Martin.

puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?


Not all tuples can be elements of a set. Elements of a set have to be
hashable. Tuples compute their hash by hashing their contents. If
their contents aren't hashable, the tuple isn't hashable, and hence
can't be an element of a set. If the values in the dictionary aren't
hashable, then the tuples returned by items() won't be hashable
either, and hence can't be elements of a set.

A related issue, from the doc :

Set elements are like dictionary keys; they need to define both
__hash__ and __eq__ methods.

and dir(any_tuple) would show __hash__ and __eq__, would that be a bit
confusing as even though tuple has these two properties(or whatever
terms it should be called), it really depends what it contains ?

Or in other words, tuple is hashable if and only if every object it
contains is also hashable ?

If that is the case, would it be better to correct the doc to say such
thing ?

Nov 25 '05 #59

bo****@gmail.com wrote:
Mike Meyer wrote:
"bo****@gmail.com" <bo****@gmail.com> writes:
Christoph Zwerschke wrote:
> je****@unpythonic.net schrieb:
> > You can already get a set from a dictionary's keys in an efficient manner:
> >>>>l = dict.fromkeys(range(10))
> >>>>set(l)
> > Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> Good point. I expected that set(l) = set(l.items()) and not
> set(l.keys()), but the latter would not work with mutable values. See
> discussion with Martin.
puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?


Not all tuples can be elements of a set. Elements of a set have to be
hashable. Tuples compute their hash by hashing their contents. If
their contents aren't hashable, the tuple isn't hashable, and hence
can't be an element of a set. If the values in the dictionary aren't
hashable, then the tuples returned by items() won't be hashable
either, and hence can't be elements of a set.

A related issue, from the doc :

Set elements are like dictionary keys; they need to define both
__hash__ and __eq__ methods.

and dir(any_tuple) would show __hash__ and __eq__, would that be a bit
confusing as even though tuple has these two properties(or whatever
terms it should be called), it really depends what it contains ?

Or in other words, tuple is hashable if and only if every object it
contains is also hashable ?

If that is the case, would it be better to correct the doc to say such
thing ?


And this, again from the doc(about mapping objects):

A mapping object maps immutable values to arbitrary objects.

Seems that is questionable too.

a=(1,[])
d={}
d[a]=1

again would give TypeError, list object are unhashable.

Nov 25 '05 #60
"bo****@gmail.com" <bo****@gmail.com> writes:
Not all tuples can be elements of a set. Elements of a set have to be
hashable. Tuples compute their hash by hashing their contents. If
their contents aren't hashable, the tuple isn't hashable, and hence
can't be an element of a set. If the values in the dictionary aren't
hashable, then the tuples returned by items() won't be hashable
either, and hence can't be elements of a set.

A related issue, from the doc :

Set elements are like dictionary keys; they need to define both
__hash__ and __eq__ methods.


This is wrong. The objects must have a hash value to be used as a set
element or a dictionary key. Objects without a __hash__ use their id
as a hash value if they don't define their own comparisons. This makes
sense, because objects that don't define their own comparisons use id
testing for equality testing.

The __eq__ comment is even worse - __eq__ is unused on set elements or
dictionary keys. However, the *semantics* of sets and keys are such
that if hash(x) == hash(y), then x == y should also be true. If that's
false, then lots of the statements about dictionaries and sets are
going to be wrong.

The behavior of tuples is a different issue: hashing a non-hashable
object raises a TypeError. So does hashing a tuple that contains a
non-hashable object. This doesn't matter for most containers: lists
aren't hashable in any case; strings, unicode strings and xrange's, as
they contain only hashable objects. Buffers are poorly documented.

The dictionary doc comment is closer to right:

A dictionary's keys are almost arbitrary values. Only values
containing lists, dictionaries or other mutable types (that are
compared by value rather than by object identity) may not be used as
keys.

Proposed new wording, for both places:

A dictionaries keys (set element) can be any hashable value. Immutable
objects are hashable, except that a tuple containing a non-hashable
object isn't hashable. Mutable objects are hashable if they define
__hash__ or if they don't define comparison. A class or type that
defines both comparison and __hash__ should define them so that two
objects that have the same hash compare equal, otherwise using them as
dictionary keys (set elements) will not have the expected behavior.

What do the rest of you think?

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 25 '05 #61

Duncan Booth wrote:
bo****@gmail.com wrote:
As for the (k,v) vs (v,k), I still don't think it is a good example. I
can always use index to access the tuple elements and most other
functions expect the first element to be the key. For example :

a=d.items()
do something about a
b = dict(a)

But using the imaginary example :

a = zip(d.values(), d.keys())
do something about a
b = dict(a) # probably not what I want.


The typical use case for the (v,k) tuple would be when you want sort the
contents of a dictionary by value. e.g. counting the number of occurrences
of each word in a document.

a = zip(d.values(), d.keys())
a.sort()
a.reverse()
print "Today's top 10:"
print str.join(',', [ k for (v,k) in a[:10]])

Ok, so today you can do that another way, but before there was a key
parameter to sort you had to build the tuple somehow:

print str.join(',', sorted(a.iterkeys(),
key=a.__getitem__, reverse=True)[:10])

and the advantage of the latter is of course that it works even when you've
been counting the number of occurences of complex numbers in a document :)


Thanks, so for newer python(2.3+?, don't know when the sort has been
enhanced), this (v,k) is again there for historical reason but not a
real use case any more.

Nov 25 '05 #62
bo****@gmail.com schrieb:
And this, again from the doc(about mapping objects):
A mapping object maps immutable values to arbitrary objects.
Seems that is questionable too.
a=(1,[])
d={}
d[a]=1
again would give TypeError, list object are unhashable.


That's a good example showing that the term 'mutable' is not so
well-defined as it may seem.

If you set b=[]; a=(1,b); should a be considered mutable (because you
can change its value by changing b), or should it be considered
immutable (because it is a tuple)?

-- Christoph
Nov 25 '05 #63
As a general note, I think it would be good to place the exact
description in a footnote, since speaking about hashable objects,
__hash__ and __cmp__ will certainly frighten off newbies and make it
hard to read even for experienced users. The main text may lie/simplyfy
a little bit.

Or as Donald Knuth once recommended:

"Simplify. Lie, if it helps. You can add the correct details later on,
but it is essential to present the reader with something straightforward
to start off with. So dont be afraid to bend the facts initially where
this leads to a useful simplification and then pay back the debt to
truth later by gradual elaborations."

However, since the Python docs are a reference and not a textbook or
manual, I think the debt should not be payed back "later", but
immediately in a footnote.

The docs already follow this principle very well, e.g.:
http://docs.python.org/lib/typesmapping.html

-- Christoph
Nov 25 '05 #64
Christoph Zwerschke <ci**@online.de> writes:
As a general note, I think it would be good to place the exact
description in a footnote, since speaking about hashable objects,
__hash__ and __cmp__ will certainly frighten off newbies and make it
hard to read even for experienced users. The main text may
lie/simplyfy a little bit.
Good point.
However, since the Python docs are a reference and not a textbook or
manual, I think the debt should not be payed back "later", but
immediately in a footnote.


Ok, how about this for dictionaries/sets:

Any hashable object can be used as a dictionary key (set member). Immutable
objects, except for tuples that contain a non-hashable object, are
hashable. Python classes are normally hashable(1).

And the footnote is:

Instances of Python classes are hashable if they define a __hash__
method, or if they don't define either __cmp__ or __eq__.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 25 '05 #65
Mike Meyer schrieb:
Ok, how about this for dictionaries/sets:

Any hashable object can be used as a dictionary key (set member). Immutable
objects, except for tuples that contain a non-hashable object, are
hashable. Python classes are normally hashable(1).
I would be even more simple here, like that:

The keys can be arbitrary immutable objects. Thus lists, sets or
dictionaries are not allowed as keys. Tuples are allowed if they do not
directly or indirectly contain mutable objects. More exactly, the keys
are required to be hashable (1).

And then go on and define "hashable" in the footnot.

I think that is easy and understandable and covers the basic cases.
And the footnote is:

Instances of Python classes are hashable if they define a __hash__
method, or if they don't define either __cmp__ or __eq__.


I think that's not exact enough. For instance, ordinary lists also
define a __hash__ method but are not hashable since that method just
raises an exception. Also, as Martin pointed out, if both is there
(__hash__ and __cmp__/__eq__) you would also require of a "proper"
__hash__ method that equal objects hash the same. Otherwise, semantics
would suffer, you could have dicts with non-unique keys (i.e. keys which
are only distinct as objects, but not different as values).

-- Christoph
Nov 25 '05 #66
Christoph Zwerschke <ci**@online.de> writes:
Mike Meyer schrieb:
Ok, how about this for dictionaries/sets:
Any hashable object can be used as a dictionary key (set
member). Immutable
objects, except for tuples that contain a non-hashable object, are
hashable. Python classes are normally hashable(1). I would be even more simple here, like that:
The keys can be arbitrary immutable objects. Thus lists, sets or
dictionaries are not allowed as keys. Tuples are allowed if they do
not directly or indirectly contain mutable objects. More exactly, the
keys are required to be hashable (1).
And then go on and define "hashable" in the footnot.
I think that is easy and understandable and covers the basic cases.


You're shifting the emphasis from hashable back to mutable, which is
the problem that I'm trying to fix. The mutability - or immutability -
of an object is irrelevant to the question of whether or not they can
be a dictionary key/set element. The critical property is that they
are hashable. *Most* immutable builtin types are hashable, and no
builtin mutable types are hashable, which deserves a mention. Once you
get away from the builtins, mutability simply stops mattering. Giving
mutability top billing is simply wrong.
And the footnote is:
Instances of Python classes are hashable if they define a __hash__
method, or if they don't define either __cmp__ or __eq__.

I think that's not exact enough. For instance, ordinary lists also
define a __hash__ method but are not hashable since that method just
raises an exception.


Ordinary lists aren't Python classes.
Also, as Martin pointed out, if both is there
(__hash__ and __cmp__/__eq__) you would also require of a "proper"
__hash__ method that equal objects hash the same. Otherwise, semantics
would suffer, you could have dicts with non-unique keys (i.e. keys
which are only distinct as objects, but not different as values).


Um, actually, I pointed that out (possibly as well), and included it
in one of the earlier versions. I was contemplating moving it
elsewhere, and accidently left it out. It certainly deserves mention
somewhere; I'm just not sure that here is the right place. Along with
__hash__ seems more appropriate.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 25 '05 #67
Mike Meyer wrote:
The mutability - or immutability - of an object is irrelevant to
the question of whether or not they can be a dictionary key/set
element. The critical property is that they are hashable.
*Most* immutable builtin types are hashable, and no builtin
mutable types are hashable, which deserves a mention.


I think the problem we are facing is that you must distinguish between
hashable/mutable *types* and hashable/mutable objects. You can have
*types* like tuple which are hashable/immutable themselves, but can have
*instances* that are mutable and not hashable, such as ([]).

For the built-in types I think objects are hashable if and only if they
are immutable (not 100% sure, but can you give a counterexample?).
That's why I got back to talk about mutability first. But I agree one
should also explain the problem of non-built-in types.

Maybe one can split the explanation and talk about the built-in
datatypes first and then explain the situation for user-defined types.
And the footnote is:
Instances of Python classes are hashable if they define a __hash__
method, or if they don't define either __cmp__ or __eq__.


I think that's not exact enough. For instance, ordinary lists also
define a __hash__ method but are not hashable since that method just
raises an exception.


Ordinary lists aren't Python classes.


In your first sentence, it was unclear whether "they" refered to the
class or the instance. But anyway, what about the following class:

class mylist(list):
pass

This class definitely has a __hash__ method. But instances of this class
are not hashable.

-- Christoph
Nov 25 '05 #68
Christoph Zwerschke wrote:
You need to know a third thing, namely that as an iterable object, a
dictionary returns the keys only, not the key/value tuples which would
also make some sense. If it had been designed that way, you could write


Right. I forgot to say that, but that is also explained in PEP 234.

Regards,
Martin
Nov 25 '05 #69
Mike Meyer wrote:
I would have phrased it as "identical" instead of equal. Sets should
hold unique elements, where "unique" depends on the use at hand. For
some uses, "==" might be appropriate. For others, "is" might be
appropriate.
Sets, as currently defined, require hashable objects, i.e. objects
which has same all the time and hash same as all other equal objects.
Phrasing it as "identical" would be incorrect: hashes currently operate
on equality, not identity.
Would you care to suggest an improved wording?


No - I have learned that I cannot write documentation that others can
understand. I usually write documentation which only attempts to be
correct, and others than make it understandable.

Regards,
Martin
Nov 25 '05 #70
Christoph Zwerschke <ci**@online.de> writes:
Mike Meyer wrote:
> The mutability - or immutability - of an object is irrelevant to
> the question of whether or not they can be a dictionary key/set
> element. The critical property is that they are hashable.
> *Most* immutable builtin types are hashable, and no builtin
> mutable types are hashable, which deserves a mention. I think the problem we are facing is that you must distinguish between
hashable/mutable *types* and hashable/mutable objects. You can have
*types* like tuple which are hashable/immutable themselves, but can
have *instances* that are mutable and not hashable, such as ([]).


I think you're trying to tweak the wrong definition. Types are
immutable if their instances are immutable. The problem is that python
defines the value of container objects to depend on their contents, so
the value of a container object can change even if the container
itself can't change. Tuples can't change - a tuple will always refer
to the objects it refers to when created. However, those objects can
change, which changes the value of the tuple without changing the
tuple. So whether or not a tuple containing a mutable object is
immutable depends on whether you define a immutable as "the object
can't change" or "the objects value can't change".
For the built-in types I think objects are hashable if and only if
they are immutable (not 100% sure, but can you give a
counterexample?). That's why I got back to talk about mutability
first. But I agree one should also explain the problem of non-built-in
types.
If you define immutable as "the object doesn't change", then the
counterexample is the one you misspelled above: ([],). If you define
immutable as "the value of the object can't change", then I don't have
a counterexample.
class mylist(list):
pass

This class definitely has a __hash__ method. But instances of this
class are not hashable.


Do you think it's really necessary to specify "has a __hash__ method
that returns a value", as opposed to just "has a __hash__ method"?

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 25 '05 #71
Mike Meyer wrote:
Ok, how about this for dictionaries/sets:

Any hashable object can be used as a dictionary key (set member). Immutable
objects, except for tuples that contain a non-hashable object, are
hashable. Python classes are normally hashable(1).


The problem with the last sentence is that can be misunderstood (classes
themselves as objects are always hashable) and has little information
otherwise (what is "normal"?).

-- Christoph
Nov 26 '05 #72
Mike Meyer wrote:
I think you're trying to tweak the wrong definition. Types are
immutable if their instances are immutable.
I'm not trying to tweak it, but to gain more clarity about what is
actually meant when you talk about "mutable" types and objects and
whether it suffices to use these terms as long as you speak about
built-in datatypes.

Tuples have been considered an immutable type since generations, becaue
the elements of a tuple cannot be changed (i.e. the elements themselves,
not their values). Likewise, they should be considered a hashable type
because they have a proper hash function that can be used if all
elements are hashable. Still, you can create instances of this
immutable/hashable data type which are mutable/not hashable. So in the
docs it will be important to be clear and emphasize that the *objects*
have to be immutable/hashable, not only their types.
So whether or not a tuple containing a mutable object is
immutable depends on whether you define a immutable as "the object
can't change" or "the objects value can't change".
Right - if you have an actual tuple instance like (a,) and a is a list,
you can argue whether it should be considered mutable or not. But you
cannot argue whether it is hashable or not. It's simply not hashable. In
so far, speaking about whether an object is hashable is more precise
(well-defined) than asking whether it is mutable, you're right.
Do you think it's really necessary to specify "has a __hash__ method
that returns a value", as opposed to just "has a __hash__ method"?


I think it is necessary to require that it has a "proper" __hash__
function. As far as I understand you can always add a __hash__ method.
Not only invalid __hash__ methods like in this case:

class mylist0(list):
def __hash__(self): return None

But you can also easily add valid __hash__ methods:

class mylist1(list):
def __hash__(self): return 0815

class mylist2(list):
def __hash__(self): return id(self)

In the case of mylist1, everything is ok including semantics, but
performance suffers dramatically. In mylist2, performance is great, but
semantics suffers greatly.

Which of these user-defined types would you call "hashable"? Technically
speaking, both probably are, so you can indeed use list objects of both
types as keys of a dictionary - but I think there should be a hint in
the docs that you need to have a "proper" hash function if you want your
dictionary to be performant and show the usually expected behavior.

-- Christoph
Nov 26 '05 #73
Christoph Zwerschke <ci**@online.de> writes:
Mike Meyer wrote:
> I think you're trying to tweak the wrong definition. Types are
> immutable if their instances are immutable. I'm not trying to tweak it, but to gain more clarity about what is
actually meant when you talk about "mutable" types and objects and
whether it suffices to use these terms as long as you speak about
built-in datatypes.


I think we both agree that mutability isn't adequate, because it's
really irrelevant to the question. What it seems to provide is a
convenient and usually well-understood way of describing which builtin
types can be used: "Mutable builtin types cannot be used as dictionary
keys." "Immutable builtin types can be used as dictionary keys,
except for certain tuples." That exception creates headaches. Then you
have to ponder things like "Is a file instance immutable?" because
they are hashable. Maybe we should just skip the idea completely.
> Do you think it's really necessary to specify "has a __hash__ method
> that returns a value", as opposed to just "has a __hash__ method"?

I think it is necessary to require that it has a "proper" __hash__
function. As far as I understand you can always add a __hash__
method. Not only invalid __hash__ methods like in this case:
class mylist0(list):
def __hash__(self): return None
But you can also easily add valid __hash__ methods:

class mylist1(list):
def __hash__(self): return 0815

class mylist2(list):
def __hash__(self): return id(self)

In the case of mylist1, everything is ok including semantics, but
performance suffers dramatically. In mylist2, performance is great,
but semantics suffers greatly.
Which of these user-defined types would you call "hashable"?


The latter two, as the given __hash__ meets the specifications for
__hash__. __hash__ itself isn't enough to decide the question of
semantics. You can make mylist2 have the right semantics by adding
def __eq__(self, other): return self is other
to it.
Technically speaking, both probably are, so you can indeed use list
objects of both types as keys of a dictionary - but I think there
should be a hint in the docs that you need to have a "proper" hash
function if you want your dictionary to be performant and show the
usually expected behavior.


I agree - but where does that hint belong? I think it belongs in the
documentation for __hash__, not the documentation for sets and
dictionaries.

How about we document it in the spirit of Python's "try it and see"
philosophy:

Any hashable object can be used as a dictionary key/set element. An
object is hashable if calling hash on that object returns an integer.

That's 100% correct. However, is it to little information to be
useful? If so, then going back to the mutable/immutable thing seems to
be right:

Any hashable object can be used as a dictionary key/set
element. Instances of mutable builtins are not hashable. Instances of
immutable builtins are hashable, except for tuples that contain a
non-hashable value. Instances of classes are usually hashable(*).

Footnote *: An instance of a class is hashable if it has a __hash__
method, or does not have either a __cmp__ or __eq__ method. If you're
not sure, call hash() on the instance, and if it returns an integer,
the instance is hashable.

Either way, over in the documentation for __hash__, add a note that:

For instances of a class to work properly with builtin container
types, two instances that compare as equal must have the same hash
value. For best performance, hash values should be different as often
as possible while honoring the equality requirement. (or is that
wrong?)

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 26 '05 #74
Mike Meyer wrote:
class mylist1(list):
def __hash__(self): return 0815

class mylist2(list):
def __hash__(self): return id(self)

In the case of mylist1, everything is ok including semantics, but
performance suffers dramatically. In mylist2, performance is great,
but semantics suffers greatly.
Which of these user-defined types would you call "hashable"?

The latter two, as the given __hash__ meets the specifications for
__hash__.


This is not true. The second definition of __hash__ does not meet
the specifications:

http://docs.python.org/ref/customization.html

"The only required property is that objects which compare equal have the
same hash value"

However, I can have two instances of mylist2 which compare equal,
yet have different hash values.

Regards,
Martin
Nov 26 '05 #75
>>> class mylist1(list):
def __hash__(self): return 0815

class mylist2(list):
def __hash__(self): return id(self)

Which of these user-defined types would you call "hashable"?
Mike Meyer wrote: The latter two, as the given __hash__ meets the specifications for
__hash__.


Martin v. Lwis wrote:
This is not true. The second definition of __hash__ does not meet
the specifications:
"The only required property is that objects which compare equal have the
same hash value"


As Mike has written in his last posting, you could easily "fix" that by
tweaking the equality relation as well. So technically speaking, Mike is
probably right. It would completely break common-sense semantics,
because for mylist2, if I have a=[1] and b=[1] then I will have a!=b.
This would not be very reasonable, but probably allowed by Python.

-- Christoph
Nov 26 '05 #76
Christoph Zwerschke wrote:
This is not true. The second definition of __hash__ does not meet
the specifications:
"The only required property is that objects which compare equal have the
same hash value"

As Mike has written in his last posting, you could easily "fix" that by
tweaking the equality relation as well. So technically speaking, Mike is
probably right.


No. If you define both __hash__ and __eq__ consistently, then __hash__
would meet the specification. As posted in the example, __hash__ does
not meet the specification, contrary to Mike's claim that it does.
It would completely break common-sense semantics,
because for mylist2, if I have a=[1] and b=[1] then I will have a!=b.
This would not be very reasonable, but probably allowed by Python.


If you have a=[1] and b=[1], then *always* a==b; you cannot
change the semantics of list displays. To achieve !=, you
would need to have a=mylist2([1]), b=mylist([2]). Whether or not
it is common sense that classes named "mylist2" compare equal
if they consist of the same items, I don't know - I personally don't
have any intuition as to what a class named "mylist2" should or
should not do.

Just that it has "list" in its name provides not sufficient clue: for
example, IdentityDictionary has "dictionary" in its name, yet I would
*not* expect that equality is used to compare keys, but instead I
would expect that identity is used.

Regards,
Martin
Nov 26 '05 #77
"Martin v. Lwis" <ma****@v.loewis.de> writes:
Mike Meyer wrote:
class mylist1(list):
def __hash__(self): return 0815

class mylist2(list):
def __hash__(self): return id(self)

In the case of mylist1, everything is ok including semantics, but
performance suffers dramatically. In mylist2, performance is great,
but semantics suffers greatly.
Which of these user-defined types would you call "hashable"? The latter two, as the given __hash__ meets the specifications for
__hash__.

This is not true. The second definition of __hash__ does not meet
the specifications:


The specification isn't on the __hash__ method, but on objects.
Instances of mylist2 that aren't otherwise tweaked won't meet the
specification.
http://docs.python.org/ref/customization.html
"The only required property is that objects which compare equal have the
same hash value"
That's pretty flaky. The same docs say:

Should return a 32-bit integer usable as a hash value for dictionary
operations.

the dictionary implementation requires that a key's hash value is
immutable

Maybe those are only "recommended" properties, but any class that
doesn't meet them is going to have instances with interesting
behaviors in sets and as dictionary keys.

FWIW, the mutable builtins have __hash__ methods that don't meet that
requirement. Instances of them don't have has values at all.
However, I can have two instances of mylist2 which compare equal,
yet have different hash values.


That's a problem with mylist2 instances, and may or may not be a
problem in the mylist2 __hash__ method. You can't tell just by
examining the __hash__ method whether or not it meets this
specification.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 26 '05 #78
Mike Meyer wrote:
This is not true. The second definition of __hash__ does not meet
the specifications:

The specification isn't on the __hash__ method, but on objects.


What does it mean for a specification to be "on" something? The
specification I quoted is listed "under" the __hash__ heading
of the reference manual.
Instances of mylist2 that aren't otherwise tweaked won't meet the
specification.
Right, that's what I meant.
That's pretty flaky. The same docs say:

Should return a 32-bit integer usable as a hash value for dictionary
operations.
What is flaky about this?
the dictionary implementation requires that a key's hash value is
immutable
Indeed, the question here is: what does "an object is immutable"
mean in this context? You appear to read it as "an object whose
state does not change", however, what it really means is "an object
which will always use the same state in __hash__ and __eq__".
Maybe those are only "recommended" properties, but any class that
doesn't meet them is going to have instances with interesting
behaviors in sets and as dictionary keys.
Indeed, you can anything return you want in __hash__, or always
raise an exception. What precisely the result of dictionary
operations is when you do so is undefined in Python.
That's a problem with mylist2 instances, and may or may not be a
problem in the mylist2 __hash__ method. You can't tell just by
examining the __hash__ method whether or not it meets this
specification.


Correct. I would expect that this is the case for any specification
of __hash__ you can come up with: __hash__ would typically refer
to the state of the object, and whether or not the results meets
some specified requirement would also depend on what the state is.
(unless either the specification is trivial (e.g. "may or may
not return a value") or the implementation is trivial (e.g.
"return 0"))

Regards,
Martin
Nov 27 '05 #79
"Martin v. Lwis" <ma****@v.loewis.de> writes:
Mike Meyer wrote:
This is not true. The second definition of __hash__ does not meet
the specifications:

The specification isn't on the __hash__ method, but on objects.

What does it mean for a specification to be "on" something? The
specification I quoted is listed "under" the __hash__ heading
of the reference manual.


"applies to"
That's pretty flaky. The same docs say:
Should return a 32-bit integer usable as a hash value for dictionary
operations.

What is flaky about this?


What's flaky is the claim in the docs that "the *only* requirement is
....". There are clearly other requirements, and they're listed in the
same paragraph as that requirement.
the dictionary implementation requires that a key's hash value is
immutable

Indeed, the question here is: what does "an object is immutable"
mean in this context? You appear to read it as "an object whose
state does not change", however, what it really means is "an object
which will always use the same state in __hash__ and __eq__".


I've given up on trying to say what "immutable" means, because common
usage in the CS and Python communities is ambiguous when it comes to
container objects. You give an unambiguous definition, but it's
specific to python, and clashes with common usage in the CS and Python
communities. Every old-style class that doesn't define __hash__,
__eq__ or __cmp__ is immutable by that definition, even if it has
mutator methods. Similar comments apply to new-style classes, but
require more work to pin down.

That's the reason this thread went the direction it did - because I
started discussing fixing the docs to not use "immutable" as the
defining property.
Maybe those are only "recommended" properties, but any class that
doesn't meet them is going to have instances with interesting
behaviors in sets and as dictionary keys.

Indeed, you can anything return you want in __hash__, or always
raise an exception. What precisely the result of dictionary
operations is when you do so is undefined in Python.


I certainly hope that if __hash__ raises an exception, dictionary
operations pass the exception on to their caller. For the other cases,
undefined behavior is perfectly reasonable.

Back onto topic:

If you want to use your definition of "immutable" in the docs for
dictionary keys and set elements, I think that the docs should
explicitly include that definition. We can probably come up with a
less ambiguous (does "same state" mean "same state variables" or
"variables that don't change value"?) wording as well.

Personally, I think we'd be better off to come up with a term for this
property that doesn't have a commonly understood meaning that has such
broad areas of disagreement with the property. I've been using
"hashable", which I would currently define as "has a __hash__ method
with the properties described in the __hash__ documentation, or does
not have either a __cmp__ or a __eq__ method." Listing which builtin
types are hashable and which aren't (and which vary on an
instance-by-instance basis) would seem to be worthwhile as well.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 27 '05 #80
Mike Meyer wrote:
Personally, I think we'd be better off to come up with a term for this
property that doesn't have a commonly understood meaning that has such
broad areas of disagreement with the property. I've been using
"hashable", which I would currently define as "has a __hash__ method
with the properties described in the __hash__ documentation, or does
not have either a __cmp__ or a __eq__ method."


I would like to use "hashable" as a term as well, but it appears that
many people would understand that to mean "has a __hash__
implementation" (i.e. hash(x) returns a value, instead of raising an
exception).

Regards,
Martin
Nov 27 '05 #81
Martin v. Lwis schrieb:
As Mike has written in his last posting, you could easily "fix" that
by tweaking the equality relation as well. So technically speaking,
Mike is probably right.
No. If you define both __hash__ and __eq__ consistently, then __hash__
would meet the specification. As posted in the example, __hash__ does
not meet the specification, contrary to Mike's claim that it does.


Ok, the example was incomplete, but in principle, this could be fixed by
adding a tweaked __eq__ method. So for completeness sake:

class mylist2(list):
def __hash__(self): return id(self)
def __eq__(self, other): return self is other
def __ne__(self, other): return self is not other

list = mylist2

a = list([1])
b = list([1])

print a, b, a == b, a != b
If you have a=[1] and b=[1], then *always* a==b; you cannot
change the semantics of list displays.


Surely, since you can't change the methods of built-in types. But you
can create your own local list displays with a tweaked semantics as in
the above example.

Anyway, the original question was: Are mylist1 and mylist2 (as above) to
be considered "hashable" types or not? I think "technically" speaking,
they are, and the "technical" definition is the only one that counts.

-- Christoph
Nov 27 '05 #82
Christoph Zwerschke wrote:
Anyway, the original question was: Are mylist1 and mylist2 (as above) to
be considered "hashable" types or not?


I long forgot was the original question was (I thought it was
"Why is dictionary.keys() a list and not a set?" :-); anyway,
the answer to this question is certainly "yes".

Regards,
Martin
Nov 27 '05 #83
Martin v. Lwis wrote:
I long forgot was the original question was (I thought it was
"Why is dictionary.keys() a list and not a set?" :-)


Er, yes. Probably we should have opened a new thread instead:
"Improvement of the std lib doc concerning keys of sets/dicts".

-- Christoph
Nov 27 '05 #84
"Martin v. Lwis" <ma****@v.loewis.de> writes:
Mike Meyer wrote:
Personally, I think we'd be better off to come up with a term for this
property that doesn't have a commonly understood meaning that has such
broad areas of disagreement with the property. I've been using
"hashable", which I would currently define as "has a __hash__ method
with the properties described in the __hash__ documentation, or does
not have either a __cmp__ or a __eq__ method."

I would like to use "hashable" as a term as well, but it appears that
many people would understand that to mean "has a __hash__
implementation" (i.e. hash(x) returns a value, instead of raising an
exception).


Well, the two aren't quite the same thing: hash returns a value on
some things that don't have a __hash__. But that is close to the
meaning I want to give it. What meaning did you have in mind?

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 27 '05 #85
Mike Meyer wrote:
Personally, I think we'd be better off to come up with a term for this
property that doesn't have a commonly understood meaning that has such
broad areas of disagreement with the property. I've been using
"hashable", which I would currently define as "has a __hash__ method
with the properties described in the __hash__ documentation, or does
not have either a __cmp__ or a __eq__ method."


I would like to use "hashable" as a term as well, but it appears that
many people would understand that to mean "has a __hash__
implementation" (i.e. hash(x) returns a value, instead of raising an
exception).

Well, the two aren't quite the same thing: hash returns a value on
some things that don't have a __hash__. But that is close to the
meaning I want to give it. What meaning did you have in mind?


Me, personally, I had your definition in mind: hashable should indicate
"returns a value constant over time and consistent with comparison".

I suggested that most people would consider "hashable" to mean:
hash() returns a value. To those people, it is a minor detail whether
you get the fallback implementation of hash() or whether there is a
default __hash__ implementation for all objects that don't otherwise
define __hash__.

Regards,
Martin
Nov 27 '05 #86
"Martin v. Lwis" <ma****@v.loewis.de> writes:
Me, personally, I had your definition in mind: hashable should indicate
"returns a value constant over time and consistent with comparison".

I suggested that most people would consider "hashable" to mean:
hash() returns a value. To those people, it is a minor detail whether
you get the fallback implementation of hash() or whether there is a
default __hash__ implementation for all objects that don't otherwise
define __hash__.


True. I think we ought to leave the behavioral requirements up to the
__hash__ docs, as it's already there. We can use hashable that way,
and maybe define it by implication:

Any object for which hash() returns an appropriate value(1) can be
used as a dictionary key/set element. Lists, sets and dicts are not
hashable, and can not be used. Tuples can be used if all the things
they contain are hashable. instances of all other builin types can be
used. Instances of most classes written in Python can be used(2).

1) See the __hash__ documentation for details on what an approriate
value is.

2) Instances that have a __hash__ method that returns an appropriate
value can be used. Instances that don't have a __cmp__ or an __eq__
method can be used even if they don't have a __hash__ method.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 28 '05 #87
Mike Meyer wrote:
Any object for which hash() returns an appropriate value(1) can be
used as a dictionary key/set element. Lists, sets and dicts are not
hashable, and can not be used. Tuples can be used if all the things
they contain are hashable. instances of all other builin types can be
used. Instances of most classes written in Python can be used(2).

1) See the __hash__ documentation for details on what an approriate
value is.

2) Instances that have a __hash__ method that returns an appropriate
value can be used. Instances that don't have a __cmp__ or an __eq__
method can be used even if they don't have a __hash__ method.


I think that is not so bad. How about this simplification:

Any hashable object(1) can be used as a dictionary key/set element.
Lists, sets and dicts are not hashable, and can not be used. Tuples can
be used if all the things they contain are hashable. Instances of all
other built-in types and most user-defined classes are hashable.

(1) Objects for which the hash() function returns an appropriate
(proper?) value. See the __hash__ documentation for details.

-- Christoph
Nov 28 '05 #88
Christoph Zwerschke <ci**@online.de> writes:
Mike Meyer wrote:
Any object for which hash() returns an appropriate value(1) can be
used as a dictionary key/set element. Lists, sets and dicts are not
hashable, and can not be used. Tuples can be used if all the things
they contain are hashable. instances of all other builin types can be
used. Instances of most classes written in Python can be used(2).
1) See the __hash__ documentation for details on what an approriate
value is.
2) Instances that have a __hash__ method that returns an appropriate
value can be used. Instances that don't have a __cmp__ or an __eq__
method can be used even if they don't have a __hash__ method.


I think that is not so bad. How about this simplification:

Any hashable object(1) can be used as a dictionary key/set
element. Lists, sets and dicts are not hashable, and can not be
used. Tuples can be used if all the things they contain are
hashable. Instances of all other built-in types and most user-defined
classes are hashable.

(1) Objects for which the hash() function returns an appropriate
(proper?) value. See the __hash__ documentation for details.


I think that will do quite nicely, as the information about __hash__,
__cmp__ and __eq__ are all in the __hash__ documentation. You wrote it
- why don't you submit a PR with it?

<nmike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 28 '05 #89
Mike Meyer wrote:
Christoph Zwerschke wrote:
I think that is not so bad. How about this simplification:

Any hashable object(1) can be used as a dictionary key/set
element. Lists, sets and dicts are not hashable, and can not be
used. Tuples can be used if all the things they contain are
hashable. Instances of all other built-in types and most user-defined
classes are hashable.

(1) Objects for which the hash() function returns an appropriate
(proper?) value. See the __hash__ documentation for details.


I think that will do quite nicely, as the information about __hash__,
__cmp__ and __eq__ are all in the __hash__ documentation. You wrote it
- why don't you submit a PR with it?


Actually you wrote it ;-) Do suggestions like this one go to the normal
Python sf bug tracker?

-- Christoph
Nov 28 '05 #90
Christoph Zwerschke <ci**@online.de> writes:
Mike Meyer wrote:
Christoph Zwerschke wrote:
I think that is not so bad. How about this simplification:

Any hashable object(1) can be used as a dictionary key/set
element. Lists, sets and dicts are not hashable, and can not be
used. Tuples can be used if all the things they contain are
hashable. Instances of all other built-in types and most user-defined
classes are hashable.

(1) Objects for which the hash() function returns an appropriate
(proper?) value. See the __hash__ documentation for details.

I think that will do quite nicely, as the information about __hash__,
__cmp__ and __eq__ are all in the __hash__ documentation. You wrote it
- why don't you submit a PR with it?

Actually you wrote it ;-) Do suggestions like this one go to the
normal Python sf bug tracker?


Yup. I went ahead and did it. ID #1368768.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 29 '05 #91

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

14 posts views Thread by Leif K-Brooks | last post: by
8 posts views Thread by rh0dium | last post: by
4 posts views Thread by =?utf-8?B?TWFjaWVqIEJsaXppxYRza2k=?= | last post: by
5 posts views Thread by robinsiebler | last post: by
reply views Thread by zhoujie | last post: by
reply views Thread by suresh191 | last post: by
reply views Thread by harlem98 | last post: by
reply views Thread by harlem98 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.