In article <11**********************@g44g2000cwa.googlegroups .com>,
we******@gmail.com writes:
barcaroller wrote:
2. Are you saying that choosing the last 32 bits or the first 32 bits in the
SHA-1 message digest would (on average) result in the same number of
collisions?
As Paul already noted, this is off-topic for comp.lang.c, and SHA-1
is probably not well-suited to your needs.
Yes -- minimal.
More formally, you would have a collision with probability 1/2 after
hashing 2**16 objects in this fashion, per the "birthday paradox".
That's the best you can do for a pseudorandom hash, or for any hash
over random input. (If you know your input is biased, you can bias
your hash accordingly and reduce chances of collision further.)
Note this assumes that SHA-1 is not broken, or is broken in a fashion
that does not bias it.
SHA-1 is a cryptographic message digest. Any non-broken crypto
digest of length N bits must have the full avalanche property, or
it's advertising more bits of strength than it actually has. That
means each input bits affects each output bit independently with
probability 1/2. And that means that the output bits in the final
result are independent.
It doesn't matter how you reduce those N bits (assuming N > 32) to 32
bits, so long as you allow each of your output bits to be fairly
determined by at least bit of entropy from those N bits. You can
take any 32 bits from the digest; you can take 64 bits as two groups
of 32 and XOR them together. Makes no difference.
Obviously you can misapply the input bits and reduce entropy, eg by
ORing two groups of 32 bits together, which will bias bits toward 1.
But any 32 bits picked at random from the output of a non-broken
digest will be equally well-distributed. They each have value 1 with
probability 1/2.
--
Michael Wojcik
mi************@microfocus.com
Web 2.0 is ... like talking to people - without the pesky annoyance of
other people. -- Martin Wood