python hash function

I've been investigating Python's hash tables and the hash function
Python uses for strings.

It turns out that when running pystone.py, as much time is spent in
string_hash() as in lookdict_string(). If you look at Python's hash
function, shown below, you can see why: a multiply is performed for
every character in the string.

static long
string_hash1(register char *p, int size)
{
    register int len;
    register long x;

    len = size;
    x = *p << 7;
    while (--len >= 0)
        x = (100003*x) ^ *p++;
    x ^= size;
    return x;
}

Looking around the web, I found the following hash function, which is
much faster and seems to distribute better than the Python hash:
static long
string_hash3(register char *p, register int size)
{
    register long x;
    register char c;

    x = 0xd2d84a61;
    while (c = *p++)
        x ^= ( (x<<7) + c + (x>>5) );
    return x;
}
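
One caveat with string_hash3 as posted: it ignores its size argument and
stops at the first zero byte, so two Python strings that differ only
after an embedded NUL would get the same hash. It also shifts a signed
long to the right, which is implementation-defined in C once x goes
negative. A minimal sketch of a length-driven variant follows. The name
string_hash3_len and the switch to unsigned arithmetic are assumptions
made for illustration, so it will not necessarily produce the same
values as the function above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-driven variant of string_hash3.
 * Assumption: unsigned arithmetic is acceptable, so the shifts are
 * well defined; results may differ from the signed original. */
unsigned long
string_hash3_len(const unsigned char *p, size_t size)
{
    unsigned long x = 0xd2d84a61UL;

    assert(p != NULL || size == 0);
    /* Consume exactly size bytes, so embedded zero bytes are hashed too. */
    while (size--)
        x ^= (x << 7) + *p++ + (x >> 5);
    return x;
}
```

With this version the strings "a\0b" and "a\0c" (both of length 3) hash
differently, whereas a NUL-terminated loop sees only "a" in both cases.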

I also came up with a hash function of my own, which seems to
distribute near-perfectly (in the sense of being comparable to
assigning random numbers as hashes to unique strings) and is faster
still (at least on my P4 box):

static long
string_hash6(register unsigned short *p, int size)
{
    register int len;
    register unsigned long x = 0;

    if (size == 0) return 0;

    len = (size+1) >> 1;    /* number of 16-bit words, rounded up */
    while (len--) {
        x += (x>>14) + (*p++ * 0xd2d84a61);
    }
    x += (x>>14) + (size*0xd2d84a61);

    return x;
}

I've tested these functions by hashing a large set of strings, generated
by instrumenting the Python interpreter to emit a string every time one
is added to a dictionary. The strings are hashed and placed into the
buckets of tables of various sizes. I then calculate a sigma statistic,
sum((observed-expected)^2)/(N-1), over the number of items in the
buckets of each table.
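
The bucket test described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: bucket_sigma is a
hypothetical helper name, N is taken to be the number of buckets,
"expected" is the mean load (items / buckets), and hashes are mapped to
buckets with a plain modulo rather than whatever masking the real
dictionary code uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: sum((observed - expected)^2) / (N - 1), where
 * observed is the item count of each bucket and N is assumed to be the
 * number of buckets. Returns -1.0 if memory allocation fails. */
double
bucket_sigma(const unsigned long *hashes, size_t nhashes, size_t nbuckets)
{
    size_t *counts;
    size_t i;
    double expected, sum = 0.0;

    assert(nbuckets > 1);
    counts = calloc(nbuckets, sizeof *counts);
    if (counts == NULL)
        return -1.0;

    /* Throw every hash into its bucket (assumption: modulo mapping). */
    for (i = 0; i < nhashes; i++)
        counts[hashes[i] % nbuckets]++;

    /* Compare each bucket's load with the mean load. */
    expected = (double)nhashes / (double)nbuckets;
    for (i = 0; i < nbuckets; i++) {
        double d = (double)counts[i] - expected;
        sum += d * d;
    }
    free(counts);
    return sum / (double)(nbuckets - 1);
}
```

A perfectly even spread gives 0, and the value grows as items pile up in
fewer buckets, so lower is better when comparing hash functions on the
same string set and table size.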

I'm not sure what other tests to try. I'm hoping that someone on
c.l.py has some experience in testing hash functions and can suggest
some additional tests and/or tweaks.
Jul 18 '05 #1
go*************@bitfurnace.com (Damien Morton) writes:
> I've been doing some investigation of python hashtables and the hash
> function used in Python.

Have you had a look at the python-dev archives? Bound to be stuff
there about this general area.
Jul 18 '05 #2
