473,473 Members | 1,614 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Hash Function (str -> 32 bits)

I couldn't find a message-digest newsgroup, so I posted here. I have a C
function that converts a string of arbitrary length to a 32-bit hash value.
I realize this is overkill but I used OpenSSL's sha1() to convert the string
to a SHA-1 160-bit message digest.

The question is: how do I use these 160 bits to get my final 32 bits?
Should I use the first 32 bits or the last 32 bits or is there yet another
method?

Note: I understand that the 32-bit result is much more likely to cause
collisions than the SHA-1 message digest.

Feb 19 '06 #1
6 3844
barcaroller wrote:
I couldn't find a message-digest newsgroup, so I posted here.
sci.crypt is the group you want. But comp.programming would have
worked just as well.
[...] I have a C
function that converts a string of arbitrary length to a 32-bit hash value.
I realize this is overkill but I used OpenSSL's sha1() to convert the string
to a SHA-1 160-bit message digest.
Yes that is overkill. If all you want is 32 bits, then you can use the
function found here:

http://www.pobox.com/~qed/hash.html

Its a smaller and much faster piece of code.
The question is: how do I use these 160 bits to get my final 32 bits?
Should I use the first 32 bits or the last 32 bits or is there yet another
method?


You may pick *any* of the 32 bits in the digest. That's kind of the
point of it.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Feb 19 '06 #2

<we******@gmail.com> wrote in message
news:11**********************@o13g2000cwo.googlegr oups.com...
Yes that is overkill. If all you want is 32 bits, then you can use the
function found here:

http://www.pobox.com/~qed/hash.html

Its a smaller and much faster piece of code.

You may pick *any* of the 32 bits in the digest. That's kind of the
point of it.

Thanks for your response. Two questions for you:

1. Does the function you mentioned cause less collisions than SHA-1
converted to 32 bits?

2. Are you saying that choosing the last 32 bits or the first 32 bits in the
SHA-1 message digest would (on average) result in the same number of
collisions?

Feb 19 '06 #3
barcaroller wrote:
<we******@gmail.com> wrote in message
news:11**********************@o13g2000cwo.googlegr oups.com...
Yes that is overkill. If all you want is 32 bits, then you can use the
function found here:

http://www.pobox.com/~qed/hash.html

Its a smaller and much faster piece of code.

You may pick *any* of the 32 bits in the digest. That's kind of the
point of it.
Thanks for your response. Two questions for you:

1. Does the function you mentioned cause less collisions than SHA-1
converted to 32 bits?


It depends on the situation. If you are trying to formulaically attack
the hash, then SHA-1 is clearly better, since you require a brute force
attack for *each* collision (my hash will fall to deterministic attacks
which can be generalized to any size). If you are expecting random,
uniform, or otherwise typical data, then they will behave practically
the same.
2. Are you saying that choosing the last 32 bits or the first 32 bits in the
SHA-1 message digest would (on average) result in the same number of
collisions?


Yes -- minimal.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Feb 19 '06 #4
barcaroller wrote:

I couldn't find a message-digest newsgroup, so I posted here. I
have a C function that converts a string of arbitrary length to a
32-bit hash value. I realize this is overkill but I used OpenSSL's
sha1() to convert the string to a SHA-1 160-bit message digest.

The question is: how do I use these 160 bits to get my final 32
bits? Should I use the first 32 bits or the last 32 bits or is
there yet another method?


You can find some simple yet effective string hashing functions in
the source code for hashlib, together with some references to other
possible functions. Hashlib is at:

<http://cbfalconer.home.att.net/download/hashlib.zip>

Since hashlib was originally developed to try out hash functions,
it can still be used for the same purpose. It collects suitable
statistics during a run.

Hashlib is implemented in C, but this general subject is better
suited to comp.programming.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
Feb 19 '06 #5

In article <11**********************@g44g2000cwa.googlegroups .com>, we******@gmail.com writes:
barcaroller wrote:
2. Are you saying that choosing the last 32 bits or the first 32 bits in the
SHA-1 message digest would (on average) result in the same number of
collisions?

As Paul already noted, this is off-topic for comp.lang.c, and SHA-1
is probably not well-suited to your needs.
Yes -- minimal.


More formally, you would have a collision with probability 1/2 after
hashing 2**16 objects in this fashion, per the "birthday paradox".
That's the best you can do for a pseudorandom hash, or for any hash
over random input. (If you know your input is biased, you can bias
your hash accordingly and reduce chances of collision further.)

Note this assumes that SHA-1 is not broken, or is broken in a fashion
that does not bias it.

SHA-1 is a cryptographic message digest. Any non-broken crypto
digest of length N bits must have the full avalanche property, or
it's advertising more bits of strength than it actually has. That
means each input bits affects each output bit independently with
probability 1/2. And that means that the output bits in the final
result are independent.

It doesn't matter how you reduce those N bits (assuming N > 32) to 32
bits, so long as you allow each of your output bits to be fairly
determined by at least bit of entropy from those N bits. You can
take any 32 bits from the digest; you can take 64 bits as two groups
of 32 and XOR them together. Makes no difference.

Obviously you can misapply the input bits and reduce entropy, eg by
ORing two groups of 32 bits together, which will bias bits toward 1.
But any 32 bits picked at random from the output of a non-broken
digest will be equally well-distributed. They each have value 1 with
probability 1/2.
--
Michael Wojcik mi************@microfocus.com

Web 2.0 is ... like talking to people - without the pesky annoyance of
other people. -- Martin Wood
Feb 21 '06 #6
In article <dt*******@news3.newsguy.com>,
Michael Wojcik <mw*****@newsguy.com> wrote:
In article <11**********************@g44g2000cwa.googlegroups .com>, we******@gmail.com writes:
barcaroller wrote:
> 2. Are you saying that choosing the last 32 bits or the first 32 bits in the
> SHA-1 message digest would (on average) result in the same number of
> collisions?
As Paul already noted, this is off-topic for comp.lang.c, and SHA-1
is probably not well-suited to your needs. More formally, you would have a collision with probability 1/2 after
hashing 2**16 objects in this fashion, per the "birthday paradox".


Minor OT correction: you would need 1L<<16 * pow(2*log(2),1/2.)
That's about 1.1774 times larger than your count.
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest
Feb 21 '06 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Damien Morton | last post by:
Ive been doing some investigation of python hashtables and the hash function used in Python. It turns out, that when running pystone.py, as much time is spent in string_hash() as is spent in...
3
by: Markus Dehmann | last post by:
I have a class "Data" and I store Data pointers in an STL set. But I have millions of inserts and many more lookups, and my profiler found that they cost a lot of runtime. Therefore, I want to...
4
by: flipdog | last post by:
Hello all, I didn't know there is a thread on hash function started in this newsgroup so reposted my posting from another group. Hope I can have some feedbacks. I am new to hash table. I came...
6
by: barcaroller | last post by:
I'm looking for a hash function (in C) that will convert a string of arbitrary length (but less than 1024 chars) to a reasonably-unique 16-bit short integer. Can anyone point me to such a hash...
22
by: vietor.liu | last post by:
is this? int hashpjw(char* str,int size) { unsigned long h=0,g; for(unsigned char *p=(unsigned char*)str;*p;++p) { h=(h<<4)+*p; if(g=(h&0xf0000000)) {
12
by: shaanxxx | last post by:
I wanted to write hash function float or double. Any suggestion would be appreciated.
44
by: gokkog | last post by:
Hi there, There's a classic hash function to hash strings, where MULT is defined as "31": //from programming pearls unsigned int hash(char *ptr) { unsigned int h = 0; unsigned char *p =...
21
by: Hallvard B Furuseth | last post by:
Is the code below valid? Generally a value must be accessed through the same type it was stored as, but there is an exception for data stored through a character type. I'm not sure if that...
2
by: Markus Dehmann | last post by:
Hi, I'd like to hash std::vector<intobjects using hash_set, but I'm not sure what hash function to use. Does anybody have a suggestion? Thank you! Markus
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
1
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...
0
muto222
php
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.