459,290 Members | 1,253 Online
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 459,290 IT Pros & Developers. It's quick & easy.

hashing an array - howto

 P: n/a Hi, I need to hash arrays of integers (from the hash module). So, I have to derive from array and supply a __hash__ method. But how to hash an array (of fixed length, say 25)? What I need is a function which maps a tuple of 25 integers into 1 integer with good hashing properties. Does anybody know such a thing? Many thanks for a hint, Helmut Jarausch Lehrstuhl fuer Numerische Mathematik RWTH - Aachen University D 52056 Aachen, Germany Sep 5 '08 #1
8 Replies

 P: n/a Helmut Jarausch wrote: I need to hash arrays of integers (from the hash module). So, I have to derive from array and supply a __hash__ method. But how to hash an array (of fixed length, say 25)? What I need is a function which maps a tuple of 25 integers into 1 integer with good hashing properties. Does anybody know such a thing? Have you tried this already? def __hash__(self): return hash(self.tostring()) Peter Sep 5 '08 #2

 P: n/a Helmut Jarausch: I need to hash arrays of integers (from the hash module). One of the possible solutions is to hash the equivalent tuple, but it requires some memory (your sequence must not be tuples already): assert not isinstance(somelist, tuple) hash(tuple(somelist)) This is an alternative solution, it doesn't use much memory, but I am not sure it works correctly: from operator import xor hash(reduce(xor, somelist)) Bye, bearophile Sep 5 '08 #3

 P: n/a On Sep 5, 11:18 am, bearophileH...@lycos.com wrote: Helmut Jarausch: I need to hash arrays of integers (from the hash module). One of the possible solutions is to hash the equivalent tuple, but it requires some memory (your sequence must not be tuples already): why can't it be tuple already? Doesn't matter: >>from numpy import arangea=arange(5)a array([0, 1, 2, 3, 4]) >>hash(a) Traceback (most recent call last): File "", line 1, in ? TypeError: unhashable type >>b=tuple(a)b (0, 1, 2, 3, 4) >>c=tuple(b)c (0, 1, 2, 3, 4) >>hash(c) 1286958229 you can discard the tuple, so the memory requirement is transient. Sep 5 '08 #4

 P: n/a Michael Palmer: why can't it be tuple already? Because if the input list L has tuples and lists, they end having the same hash value: >>L = [[1,2,3], (1,2,3)]hash(tuple(L[0])), hash(tuple(L[1])) (-378539185, -378539185) But it's a not much common situation, and few hash collision pairs can't damage much, so I agree with you that my assert was useless. This may solve that problem anyway: hash(type(L)) ^ hash(tuple(L)) Generally a good hashing functions uses all the input information. If you use tuple() you ignore part of the input information, that is the type of L. So xor-ing hash(type(L)) you use that information too. you can discard the tuple, so the memory requirement is transient. Right, but there's lot of GC action, it may slow down the code. So you can start using hash(tuple(L)), but if later the code performance comes out bad, you may try a different version that creates less intermediate garbage. Bye, bearophile Sep 5 '08 #5

 P: n/a John Machin: Consider this:>>hash(123) == hash(123.0) == hash(123L) True Right... Can you explain me why Python designers have chosen to build a hash() like that? Try "uses all the information that is relevant to the task". My knowledge of hash data structures seems not enough to understand why. Your alternative solution using reduce and xor may have suboptimal characteristics ... Right, sorry. Bye, bearophile Sep 5 '08 #6

 P: n/a On Sep 6, 7:49*am, bearophileH...@lycos.com wrote: John Machin: Consider this:>>hash(123) == hash(123.0) == hash(123L) True Right... Can you explain me why Python designers have chosen to build a hash() like that? I can't channel them; my rationalisation is this: Following the Law of Least Astonishment, >123 == 123.0 == 123L True Consequently if x == y, then adict[x] and adict[y] should give the same result. Cheers, John Sep 5 '08 #7

 P: n/a On Sep 6, 9:30*am, John Machin >hash(123) == hash(123.0) == hash(123L) True Right... Can you explain me why Python designers have chosen to build a hash() like that? I can't channel them; my rationalisation is this: Following the Law of Least Astonishment,>123 == 123.0 == 123L True Consequently if x == y, then adict[x] and adict[y] should give the same result. Another reason for not folding in the type of the object is this: >>type([]) >>hash(type([])) 505252536 >>id(type([])) 505252536 IOW hash(T) == id(T) where T is a type. id(obj) is just a memory address which can vary between executions of the same Python binary on the same machine ... not very reproducible. There is no guarantee in the docs for hash about under what circumstances hash(x) != hash(x) of course; I'm just relying on the least astonishment law again :-) And, again, we don't know what the OP's full requirements are ... Sep 6 '08 #8

 P: n/a On Sep 5, 9:38*am, Helmut Jarausch

This discussion thread is closed

Replies have been disabled for this discussion.