443,973 Members | 1,854 Online Need help? Post your question and get tips & solutions from a community of 443,973 IT Pros & Developers. It's quick & easy.

# python hash function

 P: n/a Ive been doing some investigation of python hashtables and the hash function used in Python. It turns out, that when running pystone.py, as much time is spent in string_hash() as is spent in lookdict_string(). If you look at python's hash function, shown below, you can get an idea why - a multiply is used for every character in the string. static long string_hash1(register char *p, int size) { register int len; register long x; len = size; x = *p << 7; while (--len >= 0) x = (100003*x) ^ *p++; x ^= size; return x; } Looking around the web I found the following hash function, which is much faster, and seems to distribute better than the python hash function. static long string_hash3(register char *p, register int size) { register long x; register char c; x = 0xd2d84a61; while (c = *p++) x ^= ( (x<<7) + c + (x>>5) ); return x; } I also came up with a hash function of my own, which seems to distribute near-perfectly (in the sense of being comparable to assigning, as hashes, random numbers to unique strings) and be faster yet (at least, on my P4 box). static long string_hash6(register unsigned short *p, int size) { register short s; register unsigned long x = 0; if (size == 0) return 0; len = (size+1) >> 1; while (len--) { x += (x>>14) + (*p++ * 0xd2d84a61); } x += (x>>14) + (size*0xd2d84a61); return x; } Ive tested these functions by hashing a large set of strings generated by instrumenting the python interpeter to emit a string every time a string is added to a dictionary. These strings are hashed and thrown into the buckets of various sized tables. I then calculate sigma statistics (sum((observed-expected)^2)/(N-1)) for the number of items in the buckets of those tables. Im not sure what other tests to try out. Im hoping that someone on c.l.py has some experience in testing hash functions, and can suggest some additional tests and/or tweaks. Jul 18 '05 #1 