473,466 Members | 1,417 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

dictionary keys, __hash__, __cmp__

In the Python Language Reference, I found the following
statements about using objects as dictionary keys:

1. "__hash__() should return a 32-bit integer."

2. "The only required property is that objects which
compare equal have the same hash value."

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."
Can I asume that:

-- it is guaranteed that id(obj) returns a unique 32-bit integer

-- keys are interchangeable (equivalent),
if the following is valid:

hash(key1) == hash(key2) and key1 == key2

-- I can ignore the 2nd statement, if I am aware of
the fact that: if objects are equal it dosn't mean that
they are the same key.

-- I can savely ignore the 3rd statement, because python
falls back to cmp(id(obj1), id(obj2)), if __cmp__()
is not defined.

Jul 18 '05 #1
8 4893
On Tue, 04 Nov 2003 21:52:42 +0100, Jan-Erik Meyer-Lütgens wrote:
In the Python Language Reference, I found the following statements about
using objects as dictionary keys:

1. "__hash__() should return a 32-bit integer."

2. "The only required property is that objects which
compare equal have the same hash value."

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."
Can I asume that:

-- it is guaranteed that id(obj) returns a unique 32-bit integer
Yes it is - id returns current address of object in memory.

**SNIP**
Help on built-in function id:

id(...)
id(object) -> integer

Return the identity of an object. This is guaranteed to be unique
among simultaneously existing objects. (Hint: it's the object's
memory address.)

**SNIP**
-- keys are interchangeable (equivalent),
if the following is valid:

hash(key1) == hash(key2) and key1 == key2
Yes. note that key1 == key2 implies hash(key1) == hash(key2), but if
hash(key1) == hash(key2) there is no guarantee that key1 == key2. If you
do define __hash__() build-in function hash would use it and thus you need
to be sure about quality of your hash-method. However if you do not define
__hash__ hash() will just fall back into id() that is guaranteed to be
unique.

-- I can ignore the 2nd statement, if I am aware of
the fact that: if objects are equal it dosn't mean that they are
the same key.
So you're introducing scenario where different objects are considered
equal in means of __cmp__ while having different hash. I think that's not
normal.

Since if you do not define __cmp__ / __hash__ python will use id and thus
second rule is valid.

If you define cmp but not hash you will most likely get TypeError. If you
define both you should follow second rule as if I'm right most of the
internal data structures will depend on second rule.

-- I can savely ignore the 3rd statement, because python
falls back to cmp(id(obj1), id(obj2)), if __cmp__() is not defined.


Yes. id(obj1) != id(obj2), so obj1 != obj2. Only requirement left is that
__hash__() returns 32 bit integer. Personally i would emphasis word SHOULD
NOT in third rule. I'm sure there is situations where it's perfectly
normal to use id-value's and custom hashes. Anyways you can redefine
__cmp__() simply to (and thus avoiding to break against third rule):

def __cmp__(self, other):
return id(self).__cmp__(id(other))

--
Miika
Jul 18 '05 #2
Miika Keskinen wrote:
On Tue, 04 Nov 2003 21:52:42 +0100, Jan-Erik Meyer-Lütgens wrote:
In the Python Language Reference, I found the following statements about
using objects as dictionary keys:

1. "__hash__() should return a 32-bit integer."

2. "The only required property is that objects which
compare equal have the same hash value."

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."
Can I asume that:

-- keys are interchangeable (equivalent),
if the following is valid:

hash(key1) == hash(key2) and key1 == key2


Yes. note that key1 == key2 implies hash(key1) == hash(key2)


.... only if I define __cmp__() and __hash__() appropriate
(and not breaking the 2nd rule)

-- I can ignore the 2nd statement, if I am aware of
the fact that: if objects are equal it dosn't mean that they are
the same key.

So you're introducing scenario where different objects are considered
equal in means of __cmp__ while having different hash. I think that's not
normal.

That's the point. I know that this in not normal. But is it possible?
-- I can savely ignore the 3rd statement, because python
falls back to cmp(id(obj1), id(obj2)), if __cmp__() is not defined.

Yes. id(obj1) != id(obj2), so obj1 != obj2. Only requirement left is that
__hash__() returns 32 bit integer. Personally i would emphasis word SHOULD
NOT in third rule. I'm sure there is situations where it's perfectly
normal to use id-value's and custom hashes. Anyways you can redefine
__cmp__() simply to (and thus avoiding to break against third rule):

def __cmp__(self, other):
return id(self).__cmp__(id(other))


But I want __cmp__() for another comparison. As an real world example
I have a file object which should be stored in a dictionary. __cmp__()
compare the contents of two files. Thus I must define the __hash__()
method. I use id(obj) as the hash function.

So I've breaking the rule: Objects which compare equal have the same hash.

I use the file objects as dictionary keys and it seems to work.
My assumption is that keys are interchangeable (equivalent), if
hash(key1) == hash(key2) and key1 == key2. In my example keys are
equivalent, when they are identical.
id(file_object1) == id(file_object2) and file1 have the same contents
as file2.

Is this assumption valid? Or is there some sophistication in the
python implemention, that break things sometimes? Resulting in
deletion of important files :-)

--
Jan-Erik

Jul 18 '05 #3
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:
In the Python Language Reference, I found the following
statements about using objects as dictionary keys:

1. "__hash__() should return a 32-bit integer."

2. "The only required property is that objects which
compare equal have the same hash value."

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."
Can I asume that:

-- it is guaranteed that id(obj) returns a unique 32-bit integer
No. For instance it doesn't on a 64-bit platform (and on a ridiculous
64 bit platform *cough*Win64*cough* it might even return a long).
-- keys are interchangeable (equivalent),
if the following is valid:

hash(key1) == hash(key2) and key1 == key2
Not quite sure what you mean by equivalent, but if that is true
a key inserted under key1 can be retrieved with key2.
-- I can ignore the 2nd statement, if I am aware of
the fact that: if objects are equal it dosn't mean that
they are the same key.
Um, I don't understand you, but I think the answer is "no".

If you ever have a situation where hash(a) != hash(b) but a == b then
you are very much breaking the rules.
-- I can savely ignore the 3rd statement, because python
falls back to cmp(id(obj1), id(obj2)), if __cmp__()
is not defined.


Don't understand.

Cheers,
mwh

--
I wouldn't trust the Anglo-Saxons for much anything else. Given
they way English is spelled, who could trust them on _anything_ that
had to do with writing things down, anyway?
-- Erik Naggum, comp.lang.lisp
Jul 18 '05 #4
Michael Hudson <mw*@python.net> writes:
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:

[snippety]
-- I can ignore the 2nd statement, if I am aware of
the fact that: if objects are equal it dosn't mean that
they are the same key.


Um, I don't understand you, but I think the answer is "no".

If you ever have a situation where hash(a) != hash(b) but a == b then
you are very much breaking the rules.


Having thought about it a bit more, I think the way you're doing
things *is* actually safe given a platform where sizeof(int) ==
sizeof(PyObject*) and the current implementation, but sheesh, I
wouldn't want to rely on it.

Cheers,
mwh

--
In short, just business as usual in the wacky world of floating
point <wink>. -- Tim Peters, comp.lang.python
Jul 18 '05 #5
Michael Hudson wrote:
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."
Can I asume that:

-- I can savely ignore the 3rd statement, because python
falls back to cmp(id(obj1), id(obj2)), if __cmp__()
is not defined.

Don't understand.


Ok, let me ask the following question: What is the reason
for that rule?
--
Jan-Erik

Jul 18 '05 #6
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:
Michael Hudson wrote:
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."
Can I asume that:
-- I can savely ignore the 3rd statement, because python
falls back to cmp(id(obj1), id(obj2)), if __cmp__()
is not defined.

Don't understand.


Ok, let me ask the following question: What is the reason
for that rule?


Well, the idea is that dictionary lookup is done by equality. If you
don't define __cmp__ then equality is in fact identity (well, that's
very nearly true...) so the default implementation of __hash__ (based
on the pointer address) is as good as it can get (you have hash
equality iff you have object equality).

I think.

Cheers,
mwh

--
Every now and then, Google doesn't throw up what I need so I start
checking Altavista, Yahoo, etc. In almost every single case, I am
brutally reminded why I use Google in the first place.
-- John Riddoch, asr
Jul 18 '05 #7
Michael Hudson wrote:
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:
Michael Hudson wrote:
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:

3. "If a class does not define a __cmp__() method it
should not define a __hash__() operation either."

Ok, let me ask the following question: What is the reason
for that rule?

Well, the idea is that dictionary lookup is done by equality. If you
don't define __cmp__ then equality is in fact identity (well, that's
very nearly true...) so the default implementation of __hash__ (based


....and the full truth is? :-)
on the pointer address) is as good as it can get (you have hash
equality iff you have object equality).

I think.


So, this rule is a hint, only. It could break performance,
not functionality, if I define my own hash function, right?
To make things totally clear, I repeat my question:

The only thing to be aware of when working with keys (besides
of object immutability) seems the following:

Keys are equivalent (in the sense of: a key inserted under
key1 can be retrieved with key2), if this is valid:

hash(key1) == hash(key2) and key1 == key2

Is this guaranteed? In future implementations?
--
Jan-Erik

Jul 18 '05 #8
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:
Michael Hudson wrote:
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:
Michael Hudson wrote:
Jan-Erik Meyer-Lütgens <py****@meyer-luetgens.de> writes:

> 3. "If a class does not define a __cmp__() method it
> should not define a __hash__() operation either."
>

Ok, let me ask the following question: What is the reason
for that rule? Well, the idea is that dictionary lookup is done by equality. If you
don't define __cmp__ then equality is in fact identity (well, that's
very nearly true...) so the default implementation of __hash__ (based


...and the full truth is? :-)


Oh, something to do with __coerce__ and instances of old-style classes
(well, I didn't mention __eq__ either, but I assume you know about
that).
on the pointer address) is as good as it can get (you have hash
equality iff you have object equality).
I think.


So, this rule is a hint, only. It could break performance,
not functionality, if I define my own hash function, right?
To make things totally clear, I repeat my question:

The only thing to be aware of when working with keys (besides
of object immutability) seems the following:

Keys are equivalent (in the sense of: a key inserted under
key1 can be retrieved with key2), if this is valid:

hash(key1) == hash(key2) and key1 == key2

Is this guaranteed?


Yes.
In future implementations?


One can't predict the entire future of course -- hey, maybe dicts will
be splay trees or something one day -- but *I* would be happy
depending on it.

Cheers,
mwh

--
42. You can measure a programmer's perspective by noting his
attitude on the continuing vitality of FORTRAN.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Jul 18 '05 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Blair Hall | last post by:
I would like to determine whether two dictionaries have the same set of keys. Can anyone tell me if I HAVE to sort the key sequences as in this code snippet: # d1, d2 already created k1 =...
31
by: John Roth | last post by:
I'm adding a thread for comments on Gerrit Holl's pre-pep, which can be found here: http://tinyurl.com/2578q Frankly, I like the idea. It's about time that all of the file and directory stuff...
0
by: Daniele Varrazzo | last post by:
If you need a container to look into, there is the sets module that provides a couple of them. If you need a sorted list, there is the bisect module. But i don't think it fits your need for a...
57
by: Egor Bolonev | last post by:
why functions created with lambda forms cannot contain statements? how to get unnamed function with statements?
90
by: Christoph Zwerschke | last post by:
Ok, the answer is easy: For historical reasons - built-in sets exist only since Python 2.4. Anyway, I was thinking about whether it would be possible and desirable to change the old behavior in...
24
by: kdotsky | last post by:
Hello, I am using some very large dictionaries with keys that are long strings (urls). For a large dictionary these keys start to take up a significant amount of memory. I do not need access to...
7
by: John Henry | last post by:
I believe the standard dictionary should be amened to allow the use of case insensitive keys - as an option. I found some work done by others to do that at: ...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
1
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
0
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.