473,394 Members | 1,755 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,394 software developers and data experts.

Hash function (str -> uint16)

I'm looking for a hash function (in C) that will convert a string of
arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit
short integer. Can anyone point me to such a hash function?
Nov 19 '05 #1
6 7996
In article <Ap*******************@news20.bellglobal.com>,
barcaroller <ba*********@music.net> wrote:
:I'm looking for a hash function (in C) that will convert a string of
:arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit
:short integer. Can anyone point me to such a hash function?

If the string is arbitrary (other than, say, being composed of
the 96 printable ASCII characters including newline but excluding DEL),
then you are "reasonably-unique"ly trying to hash about 6743 bits
worth of information into 16 bits, which is more than 421 times
to small to hold that information. The result would be about
7.3 x 10^126 hash collisions per "reasonably-unique" 16-bit integer.

Personally, I never accept more than 10^100 hash collisions per
value as being *reasonably* unique.

--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers
Nov 19 '05 #2
barcaroller wrote:

I'm looking for a hash function (in C) that will convert a string of
arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit
short integer. Can anyone point me to such a hash function?


You might look at
http://www.partow.net/programming/ha...#HashAndPrimes
which has lots of information on hashing.

My approach, assuming I am using a processor with built-in multiply,
would be, for each character, to multiply the previous hash value by a
prime and add in the next character.
Here is an (untested) example:

unsigned short hash16 (char *s) {
unsigned int h = 0;
while(*s) {
h = h*23131 + (unsigned char)*s++;
}
return (h & 0xffff);
}
--
Thad
Nov 19 '05 #3
barcaroller wrote:
I'm looking for a hash function (in C) that will convert a string of
arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit
short integer. Can anyone point me to such a hash function?


You could use the same hash function that's used in Macromedia's flash,
Apple's Safari and Konqueror:

http://www.pobox.com/~qed/hash.html

You can just use the bottom 16 bits (cast it to uint16_t) and it will
retain all the properties of a useful hash.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 19 '05 #4
we******@gmail.com writes:
barcaroller wrote:
I'm looking for a hash function (in C) that will convert a string of
arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit
short integer. Can anyone point me to such a hash function?


You could use the same hash function that's used in Macromedia's flash,
Apple's Safari and Konqueror:

http://www.pobox.com/~qed/hash.html

You can just use the bottom 16 bits (cast it to uint16_t) and it will
retain all the properties of a useful hash.


That's interesting. Can you see any situation where crc32 would be
"better" than this hash, for error detection in communications?

--

John Devereux
Nov 20 '05 #5
On 20 Nov 2005 18:34:29 +0000, John Devereux
<jd******@THISdevereux.me.uk> wrote in comp.lang.c:
we******@gmail.com writes:
barcaroller wrote:
I'm looking for a hash function (in C) that will convert a string of
arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit
short integer. Can anyone point me to such a hash function?


You could use the same hash function that's used in Macromedia's flash,
Apple's Safari and Konqueror:

http://www.pobox.com/~qed/hash.html

You can just use the bottom 16 bits (cast it to uint16_t) and it will
retain all the properties of a useful hash.


That's interesting. Can you see any situation where crc32 would be
"better" than this hash, for error detection in communications?


I only know enough about the theory behind CRC algorithms to know that
devising them should be left to experts.

The best error detection algorithm to use for data takes into account
the most likely error pattern for the path the data takes. For
communications, this tends to be noise bursts which can corrupt a
sequence of adjacent bits.

A hash algorithm does not provide any easy method of adapting to
different error patterns. A CRC does, merely by changing the
polynomial.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 21 '05 #6
John Devereux wrote:
we******@gmail.com writes:
barcaroller wrote:
I'm looking for a hash function (in C) that will convert a string of
arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit
short integer. Can anyone point me to such a hash function?


You could use the same hash function that's used in Macromedia's flash,
Apple's Safari and Konqueror:

http://www.pobox.com/~qed/hash.html

You can just use the bottom 16 bits (cast it to uint16_t) and it will
retain all the properties of a useful hash.


That's interesting. Can you see any situation where crc32 would be
"better" than this hash, for error detection in communications?


This hash is not designed for that. Its designed for performance. If
your error mode is complete differences in the data, or single bit
errors, then indeed my hash will help you detect the problem, but it
offers little more than this. This hash has a "0-stick" problem.
I.e., if you force all the accumulator bits to 0 (which you need to
solve for, but its trivial if you start with a 0 initial state) then
subsequent 0's in blocks of 32 bits will retain the accumulator as 0,
regardless of how long it is. So you know that all data of the form:
"<solved-header>000 .... 00<data>" will hash to the same value
regardless of how many 0's (in blocks of 32 bits at a time) are in that
sequence of 0s in the middle, and regardless of what data you end with.
More serious hash functions like MD5, or SHA do not have this problem.
(But to be honest, since CRC's use a single accumulator, I don't see
how they don't have exactly the same problem as SFH.) I think Bob
Jenkin's original hash function might be more robust to the problem
since he uses a 96 bit internal state, so *solving* for the initial
state may not be practical.

Also if the checksum has to be 16 bits (which the OP asked for) then
you must choose a crc16, *not* a crc32. The reason is that crc32 does
not have the full avalanching property. You might be able to *give* it
the avalanche property with a technique similar to the one I use for
SFH(), however, crc16s are just as easily constructed as crc32s and, as
I recall, execute with the same speed.

Error detection depends a lot on the situation you are in, of course.
For example, the AMD Opteron processor uses mere bit parity to ensure
the integrity of its I-cache (since recovery merely requires a reload
from memory) but uses full ECC on its D-cache (since both read and
write activity happens in the D-cache, which can be wildly out of synch
with the real memory contents).

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Nov 23 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Damien Morton | last post by:
Ive been doing some investigation of python hashtables and the hash function used in Python. It turns out, that when running pystone.py, as much time is spent in string_hash() as is spent in...
3
by: Markus Dehmann | last post by:
I have a class "Data" and I store Data pointers in an STL set. But I have millions of inserts and many more lookups, and my profiler found that they cost a lot of runtime. Therefore, I want to...
4
by: flipdog | last post by:
Hello all, I didn't know there is a thread on hash function started in this newsgroup so reposted my posting from another group. Hope I can have some feedbacks. I am new to hash table. I came...
1
by: sureshjayaram | last post by:
When I searched for a good hash function for strings, I came across DJB hash function and found it serving my purpose. I wanted to use that, but I was unaware of the licensing/copyright issues. I...
22
by: vietor.liu | last post by:
is this? int hashpjw(char* str,int size) { unsigned long h=0,g; for(unsigned char *p=(unsigned char*)str;*p;++p) { h=(h<<4)+*p; if(g=(h&0xf0000000)) {
6
by: barcaroller | last post by:
I couldn't find a message-digest newsgroup, so I posted here. I have a C function that converts a string of arbitrary length to a 32-bit hash value. I realize this is overkill but I used...
12
by: shaanxxx | last post by:
I wanted to write hash function float or double. Any suggestion would be appreciated.
21
by: Hallvard B Furuseth | last post by:
Is the code below valid? Generally a value must be accessed through the same type it was stored as, but there is an exception for data stored through a character type. I'm not sure if that...
2
by: Markus Dehmann | last post by:
Hi, I'd like to hash std::vector<intobjects using hash_set, but I'm not sure what hash function to use. Does anybody have a suggestion? Thank you! Markus
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.