473,498 Members | 2,058 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

python hash function

Ive been doing some investigation of python hashtables and the hash
function used in Python.

It turns out, that when running pystone.py, as much time is spent in
string_hash() as is spent in lookdict_string(). If you look at
python's hash function, shown below, you can get an idea why - a
multiply is used for every character in the string.

static long
string_hash1(register char *p, int size)
{
register int len;
register long x;

len = size;
x = *p << 7;
while (--len >= 0)
x = (100003*x) ^ *p++;
x ^= size;
return x;
}

Looking around the web I found the following hash function, which is
much faster, and seems to distribute better than the python hash
function.

static long
string_hash3(register char *p, register int size)
{
register long x;
register char c;

x = 0xd2d84a61;
while (c = *p++)
x ^= ( (x<<7) + c + (x>>5) );
return x;
}

I also came up with a hash function of my own, which seems to
distribute near-perfectly (in the sense of being comparable to
assigning, as hashes, random numbers to unique strings) and be faster
yet (at least, on my P4 box).

static long
string_hash6(register unsigned short *p, int size)
{
register short s;
register unsigned long x = 0;

if (size == 0) return 0;

len = (size+1) >> 1;
while (len--) {
x += (x>>14) + (*p++ * 0xd2d84a61);
}
x += (x>>14) + (size*0xd2d84a61);

return x;
}

Ive tested these functions by hashing a large set of strings generated
by instrumenting the python interpeter to emit a string every time a
string is added to a dictionary. These strings are hashed and thrown
into the buckets of various sized tables. I then calculate sigma
statistics (sum((observed-expected)^2)/(N-1)) for the number of items
in the buckets of those tables.

Im not sure what other tests to try out. Im hoping that someone on
c.l.py has some experience in testing hash functions, and can suggest
some additional tests and/or tweaks.
Jul 18 '05 #1
1 14157
go*************@bitfurnace.com (Damien Morton) writes:
Ive been doing some investigation of python hashtables and the hash
function used in Python.

[...]

Have you had a look at the python-dev archives? Bound to be stuff
there about this general area.
John
Jul 18 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
2999
by: Juho Saarikko | last post by:
The program attached to this message makes the Python interpreter segfault randomly. I have tried both Python 2.2 which came with Debian Stable, and self-compiled Python 2.3.3 (newest I could find...
3
2808
by: fdsl ysnh | last post by:
--- python-list-request@python.orgдµÀ: > Send Python-list mailing list submissions to > python-list@python.org > > To subscribe or unsubscribe via the World Wide Web, > visit >...
12
6969
by: Arash Partow | last post by:
Hi all, I've ported various hash functions to python if anyone is interested: def RSHash(key): a = 378551 b = 63689 hash = 0
852
27933
by: Mark Tarver | last post by:
How do you compare Python to Lisp? What specific advantages do you think that one has over the other? Note I'm not a Python person and I have no axes to grind here. This is just a question for...
13
2992
by: Chris Carlen | last post by:
Hi: I have begun learning Python by experimenting with the code snippets here: http://hetland.org/writing/instant-python.html In the section on functions, Magnus Lie Hetland writes: ...
22
2062
by: sapsi | last post by:
Hello, I recently tried using the set function in Python and was surprised to find that a= ] doesn't work with 'set', throwing TyperError (unhashable exception). I found out that this is...
0
7121
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
7197
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
7375
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
1
4899
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
3088
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
3078
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
0
1411
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...
1
650
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
287
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.