455,431 Members | 1,845 Online
Need help? Post your question and get tips & solutions from a community of 455,431 IT Pros & Developers. It's quick & easy.

# Hash function (str -> uint16)

 P: n/a I'm looking for a hash function (in C) that will convert a string of arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit short integer. Can anyone point me to such a hash function? Nov 19 '05 #1
6 Replies

 P: n/a In article , barcaroller wrote: :I'm looking for a hash function (in C) that will convert a string of :arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit :short integer. Can anyone point me to such a hash function? If the string is arbitrary (other than, say, being composed of the 96 printable ASCII characters including newline but excluding DEL), then you are "reasonably-unique"ly trying to hash about 6743 bits worth of information into 16 bits, which is more than 421 times to small to hold that information. The result would be about 7.3 x 10^126 hash collisions per "reasonably-unique" 16-bit integer. Personally, I never accept more than 10^100 hash collisions per value as being *reasonably* unique. -- "No one has the right to destroy another person's belief by demanding empirical evidence." -- Ann Landers Nov 19 '05 #2

 P: n/a barcaroller wrote: I'm looking for a hash function (in C) that will convert a string of arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit short integer. Can anyone point me to such a hash function? You might look at http://www.partow.net/programming/ha...#HashAndPrimes which has lots of information on hashing. My approach, assuming I am using a processor with built-in multiply, would be, for each character, to multiply the previous hash value by a prime and add in the next character. Here is an (untested) example: unsigned short hash16 (char *s) { unsigned int h = 0; while(*s) { h = h*23131 + (unsigned char)*s++; } return (h & 0xffff); } -- Thad Nov 19 '05 #3

 P: n/a barcaroller wrote: I'm looking for a hash function (in C) that will convert a string of arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit short integer. Can anyone point me to such a hash function? You could use the same hash function that's used in Macromedia's flash, Apple's Safari and Konqueror: http://www.pobox.com/~qed/hash.html You can just use the bottom 16 bits (cast it to uint16_t) and it will retain all the properties of a useful hash. -- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/ Nov 19 '05 #4

 P: n/a we******@gmail.com writes: barcaroller wrote: I'm looking for a hash function (in C) that will convert a string of arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit short integer. Can anyone point me to such a hash function? You could use the same hash function that's used in Macromedia's flash, Apple's Safari and Konqueror: http://www.pobox.com/~qed/hash.html You can just use the bottom 16 bits (cast it to uint16_t) and it will retain all the properties of a useful hash. That's interesting. Can you see any situation where crc32 would be "better" than this hash, for error detection in communications? -- John Devereux Nov 20 '05 #5

 P: n/a On 20 Nov 2005 18:34:29 +0000, John Devereux wrote in comp.lang.c: we******@gmail.com writes: barcaroller wrote: I'm looking for a hash function (in C) that will convert a string of arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit short integer. Can anyone point me to such a hash function? You could use the same hash function that's used in Macromedia's flash, Apple's Safari and Konqueror: http://www.pobox.com/~qed/hash.html You can just use the bottom 16 bits (cast it to uint16_t) and it will retain all the properties of a useful hash. That's interesting. Can you see any situation where crc32 would be "better" than this hash, for error detection in communications? I only know enough about the theory behind CRC algorithms to know that devising them should be left to experts. The best error detection algorithm to use for data takes into account the most likely error pattern for the path the data takes. For communications, this tends to be noise bursts which can corrupt a sequence of adjacent bits. A hash algorithm does not provide any easy method of adapting to different error patterns. A CRC does, merely by changing the polynomial. -- Jack Klein Home: http://JK-Technology.Com FAQs for comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html comp.lang.c++ http://www.parashift.com/c++-faq-lite/ alt.comp.lang.learn.c-c++ http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html Nov 21 '05 #6

 P: n/a John Devereux wrote: we******@gmail.com writes: barcaroller wrote: I'm looking for a hash function (in C) that will convert a string of arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit short integer. Can anyone point me to such a hash function? You could use the same hash function that's used in Macromedia's flash, Apple's Safari and Konqueror: http://www.pobox.com/~qed/hash.html You can just use the bottom 16 bits (cast it to uint16_t) and it will retain all the properties of a useful hash. That's interesting. Can you see any situation where crc32 would be "better" than this hash, for error detection in communications? This hash is not designed for that. Its designed for performance. If your error mode is complete differences in the data, or single bit errors, then indeed my hash will help you detect the problem, but it offers little more than this. This hash has a "0-stick" problem. I.e., if you force all the accumulator bits to 0 (which you need to solve for, but its trivial if you start with a 0 initial state) then subsequent 0's in blocks of 32 bits will retain the accumulator as 0, regardless of how long it is. So you know that all data of the form: "000 .... 00" will hash to the same value regardless of how many 0's (in blocks of 32 bits at a time) are in that sequence of 0s in the middle, and regardless of what data you end with. More serious hash functions like MD5, or SHA do not have this problem. (But to be honest, since CRC's use a single accumulator, I don't see how they don't have exactly the same problem as SFH.) I think Bob Jenkin's original hash function might be more robust to the problem since he uses a 96 bit internal state, so *solving* for the initial state may not be practical. Also if the checksum has to be 16 bits (which the OP asked for) then you must choose a crc16, *not* a crc32. The reason is that crc32 does not have the full avalanching property. You might be able to *give* it the avalanche property with a technique similar to the one I use for SFH(), however, crc16s are just as easily constructed as crc32s and, as I recall, execute with the same speed. Error detection depends a lot on the situation you are in, of course. For example, the AMD Opteron processor uses mere bit parity to ensure the integrity of its I-cache (since recovery merely requires a reload from memory) but uses full ECC on its D-cache (since both read and write activity happens in the D-cache, which can be wildly out of synch with the real memory contents). -- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/ Nov 23 '05 #7

### This discussion thread is closed

Replies have been disabled for this discussion.