473,735 Members | 1,875 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Why MULT 31 (hash function for string)?

Hi there,
There's a classic hash function to hash strings, where MULT is defined
as "31":

//from programming pearls
unsigned int hash(char *ptr)
{ unsigned int h = 0;
unsigned char *p = ptr;
int n;
for (n = k; n 0; p++) {
h = MULT * h + *p;
if (*p == 0)
n--;
}
return h % NHASH;
}

Why MULT defined as 31? ( How about 29? 24? or 26? )
Thanks,
Wenjie

Sep 19 '06 #1
44 6726
go****@yahoo.co m said:
Hi there,
There's a classic hash function to hash strings, where MULT is defined
as "31":
<snip>
>
Why MULT defined as 31? ( How about 29? 24? or 26? )
Why not find out yourself? Generic hashing routines are intended to get you
a decent spread of hashes for arbitrary data. So cook up a few million
records of arbitrary data, and see what kind of spreads you get for various
multipliers.

If you call it research, maybe you can even fool someone into paying you.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Sep 19 '06 #2
go****@yahoo.co m wrote:
Hi there,
There's a classic hash function to hash strings, where MULT is defined
as "31":

//from programming pearls
unsigned int hash(char *ptr)
{ unsigned int h = 0;
unsigned char *p = ptr;
int n;
for (n = k; n 0; p++) {
h = MULT * h + *p;
if (*p == 0)
n--;
}
return h % NHASH;
}
It looks like this code was probably mis-transcribed.
There's this undefined quantity `k' floating around, and
the behavior at end-of-string can only be called broken.
But that doesn't seem central to your question:
Why MULT defined as 31? ( How about 29? 24? or 26? )
It's a mixture of superstition and good sense.

First, the superstition: People who write code having
to do with hash tables apparently recall that prime numbers
are particularly "good" for them. It seems they don't always
remember just what the "goodness" was or in what connection,
but they'll throw prime numbers into the mix whenever they
can. They'll throw in prime numbers even if they're not too
sure what a prime number is! A colleague of mine once ran
across this little coding gem:

#define HASHSIZE 51 /* a smallish prime */

Second, the good sense: Suppose MULT were 26, and consider
hashing a hundred-character string. How much influence does
the string's first character have on the final value of `h',
just before the mod operation? The first character's value
will have been multiplied by MULT 99 times, so if the arithmetic
were done in infinite precision the value would consist of some
jumble of bits followed by 99 low-order zero bits -- each time
you multiply by MULT you introduce another low-order zero, right?
The computer's finite arithmetic just chops away all the excess
high-order bits, so the first character's actual contribution to
`h' is ... precisely zero! The `h' value depends only on the
rightmost 32 string characters (assuming a 32-bit int), and even
then things are not wonderful: the first of those final 32 bytes
influences only the leftmost bit of `h' and has no effect on
the remaining 31. Clearly, an even-valued MULT is a poor idea.

Need MULT be prime? Not as far as I know (I don't know
everything); any odd value ought to suffice. 31 may be attractive
because it is close to a power of two, and it may be easier for
the compiler to replace a possibly slow multiply instruction with
a shift and subtract (31*x == (x << 5) - x) on machines where it
makes a difference. Setting MULT one greater than a power of two
(e.g., 33) would also be easy to optimize, but might produce too
"simple" an arrangement: mostly a juxtaposition of two copies
of the original set of bits, with a little mixing in the middle.
So you want an odd MULT that has plenty of one-bits.

You'd also like a MULT that "smears" its operand bits across
as much of `h' as you can manage, so MULT shouldn't be too small
(consider the degenerate case of MULT==1). Also, MULT shouldn't
be too large -- to put it differently, UINT_MAX-MULT shouldn't be
too small. How small is "too small" depends to some extent on
the expected lengths of the strings; I suspect 31 is "too small"
if the strings are short (the values won't have time to invade
the high-order part of `h'). I think it would be wiser to choose
MULT somewhere between sqrt(UINT_MAX) and UINT_MAX-sqrt(UINT_MAX),
making sure it's odd and has lots of one-bits. Primality doesn't
seem important here -- but perhaps someone else may offer a good
reason to choose a prime. Sometimes superstition has valid roots.

--
Eric Sosman
es*****@acm-dot-org.invalid

Sep 19 '06 #3
Thanks Eric for a good rundown of hash multipliers. A lot of us don't
have a good feel for the tradeoffs in this area.
FWIW: Many of the Borland compilers (wickedly fast if you don't already
know), used a multiplier of "13" for hashing.

Sep 19 '06 #4
Eric Sosman said:
A colleague of mine once ran
across this little coding gem:

#define HASHSIZE 51 /* a smallish prime */
<snorfl>

1 is prime, 3 is prime, 5 is prime, 7 is prime, 9 is prime, 11 is prime...

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Sep 19 '06 #5
Eric Sosman wrote:
It looks like this code was probably mis-transcribed.
There's this undefined quantity `k' floating around, and
the behavior at end-of-string can only be called broken.
But that doesn't seem central to your question:
int k =2; /*defined before the function, from the book: programming
pearl*/
Why MULT defined as 31? ( How about 29? 24? or 26? )

It's a mixture of superstition and good sense.
<snip>

Thanks for your comments Eric. It is very helpful. Complementary to
your comments on prime number hash table size, Indeed I just found a
web page:
http://www.concentric.net/~Ttwang/tech/primehash.htm

Sep 19 '06 #6
Richard Heathfield wrote:
go****@yahoo.co m said:
Hi there,
There's a classic hash function to hash strings, where MULT is defined
as "31":

<snip>

Why MULT defined as 31? ( How about 29? 24? or 26? )

Why not find out yourself? Generic hashing routines are intended to get you
a decent spread of hashes for arbitrary data. So cook up a few million
records of arbitrary data, and see what kind of spreads you get for various
multipliers.
There's an alternative way to discuss it on the usenet.
>
If you call it research, maybe you can even fool someone into paying you.
Sometimes "Usenet is a strange place."

Sep 19 '06 #7
Eric Sosman wrote:
You'd also like a MULT that "smears" its operand bits across
as much of `h' as you can manage, so MULT shouldn't be too small
(consider the degenerate case of MULT==1). Also, MULT shouldn't
be too large -- to put it differently, UINT_MAX-MULT shouldn't be
too small. How small is "too small" depends to some extent on
the expected lengths of the strings; I suspect 31 is "too small"
if the strings are short (the values won't have time to invade
the high-order part of `h').
I have had some success using the multiplier from some "approved"
linear-conguential PRNG before -- even with "difficult" input data such
as words which are all 3-5 letters and all capitals. I think of it as
a "peturbed" random number generator, in that the input values (chars,
or whatever) replace the additive step in the PRNG. I don't know how
much justification there really is for that way of thinking, but I find
it reassuring...

E.g. (digging out some old C code)

static HASH
lc_hash(uchar *ptr)
{
HASH hash;

for (hash = 0; *ptr; ptr++)
{
hash += *ptr;
hash *= 597485621;
}

return hash;
}

(HASH is some kind of unsigned integer)

-- chris
Sep 19 '06 #8
I wrote:
I have had some success using the multiplier from some "approved"
linear-conguential PRNG before -- even with "difficult" input data
such as words which are all 3-5 letters and all capitals.
I forgot to add that by "some success" I meant that I measured the
distributions of final values for a range of table sizes and found (a)
no apparent significant divergence from randomness and (b) no apparent
tendency to "resonate" with any particular sizes. That was using
real-world sized samples of real-world data.

So I used power-of-two sized hashtables, which I'm sure that Eric, as a
founder member and leading light of the Campaign To Stamp Out Primes,
would find pleasing -- were it not that he is also active their sister
organisation: the Campaign To Stamp Out Powers Of Two. ;-)

-- chris
Sep 19 '06 #9

Richard Heathfield wrote:
Eric Sosman said:
A colleague of mine once ran
across this little coding gem:

#define HASHSIZE 51 /* a smallish prime */

<snorfl>

1 is prime, 3 is prime, 5 is prime, 7 is prime, 9 is prime, 11 is prime...
When picking a prime for a middling-size table, I just recall that
Knuth's MIX language (a horribly ancient IBM 650 emu-alike) also
denotes a prime in roman numerals. If that's not big enough, the
International Motherhood of Multiple Marine Mammal Machinists is also a
prime.

Sep 19 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
14188
by: Damien Morton | last post by:
Ive been doing some investigation of python hashtables and the hash function used in Python. It turns out, that when running pystone.py, as much time is spent in string_hash() as is spent in lookdict_string(). If you look at python's hash function, shown below, you can get an idea why - a multiply is used for every character in the string. static long string_hash1(register char *p, int size)
3
3337
by: Markus Dehmann | last post by:
I have a class "Data" and I store Data pointers in an STL set. But I have millions of inserts and many more lookups, and my profiler found that they cost a lot of runtime. Therefore, I want to substitute the set<Data*> with a hash_set<Data*>: typedef hash_set<const Data*, hash<const Data*>, eqData> DataPointerHashSet; // typedef set<Data*> DataPointerHashSet; // (see complete code below) But it doesn't work! Everything is fine...
4
7254
by: flipdog | last post by:
Hello all, I didn't know there is a thread on hash function started in this newsgroup so reposted my posting from another group. Hope I can have some feedbacks. I am new to hash table. I came across a very well presented tutorial web page wrt hash table. In its content, it listed a number of hash functions which the web master(?) quoted from other web sites. So no explainations for these hash functions and I failed to understand a...
6
8181
by: barcaroller | last post by:
I'm looking for a hash function (in C) that will convert a string of arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit short integer. Can anyone point me to such a hash function?
6
17812
by: Frank King | last post by:
Hi, I am looking for a hash function to map string to string in VB. Could somebody give me some informtion? Thank you very much. fk
4
4909
by: yuyang08 | last post by:
Hello, everyone, I am wondering what is the hash function that is used in hash_map/hash_set. Can I replace it with my own hash function? Any comments on this? Thanks! -Andy
21
3900
by: Hallvard B Furuseth | last post by:
Is the code below valid? Generally a value must be accessed through the same type it was stored as, but there is an exception for data stored through a character type. I'm not sure if that applies in this case though: #include <limits.h> unsigned foo(void) { static const union { unsigned char str;
1
4887
by: aphrodite | last post by:
I have to write a program to manage a hash table using collision resolution by chaining, with numeric long integer and string keys and where the hash function should use polynomial accumulation. So...how exactly should I make this function? I understand all the other ways to make a hash function except this one. Laura
24
3306
by: Alexander Mahone | last post by:
Hello, I'm looking for an hash function to be used for an hash table that will contain structs of a certain kind. I've looked into Sourceforge.net, but so far I've found only hash functions for strings (string->index)...Do you know if there exist such a function somewhere? Thanks
0
8786
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9466
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9330
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9255
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9202
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6748
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6050
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
2
2741
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2191
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.