John Devereux wrote:

we******@gmail.com writes: barcaroller wrote: I'm looking for a hash function (in C) that will convert a string of

arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit

short integer. Can anyone point me to such a hash function?

You could use the same hash function that's used in Macromedia's flash,

Apple's Safari and Konqueror:

http://www.pobox.com/~qed/hash.html

You can just use the bottom 16 bits (cast it to uint16_t) and it will

retain all the properties of a useful hash.

That's interesting. Can you see any situation where crc32 would be

"better" than this hash, for error detection in communications?

This hash is not designed for that. Its designed for performance. If

your error mode is complete differences in the data, or single bit

errors, then indeed my hash will help you detect the problem, but it

offers little more than this. This hash has a "0-stick" problem.

I.e., if you force all the accumulator bits to 0 (which you need to

solve for, but its trivial if you start with a 0 initial state) then

subsequent 0's in blocks of 32 bits will retain the accumulator as 0,

regardless of how long it is. So you know that all data of the form:

"<solved-header>000 .... 00<data>" will hash to the same value

regardless of how many 0's (in blocks of 32 bits at a time) are in that

sequence of 0s in the middle, and regardless of what data you end with.

More serious hash functions like MD5, or SHA do not have this problem.

(But to be honest, since CRC's use a single accumulator, I don't see

how they don't have exactly the same problem as SFH.) I think Bob

Jenkin's original hash function might be more robust to the problem

since he uses a 96 bit internal state, so *solving* for the initial

state may not be practical.

Also if the checksum has to be 16 bits (which the OP asked for) then

you must choose a crc16, *not* a crc32. The reason is that crc32 does

not have the full avalanching property. You might be able to *give* it

the avalanche property with a technique similar to the one I use for

SFH(), however, crc16s are just as easily constructed as crc32s and, as

I recall, execute with the same speed.

Error detection depends a lot on the situation you are in, of course.

For example, the AMD Opteron processor uses mere bit parity to ensure

the integrity of its I-cache (since recovery merely requires a reload

from memory) but uses full ECC on its D-cache (since both read and

write activity happens in the D-cache, which can be wildly out of synch

with the real memory contents).

--

Paul Hsieh

http://www.pobox.com/~qed/ http://bstring.sf.net/