472,127 Members | 1,699 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,127 software developers and data experts.

Fast 1 -> 1 non md5 hash algorithm

Hi,

I'm looking for a fast and simple one to one hash function, suitable
for longer strings (up to 2048 in length). I'd like keys to be
relatively short, I doubt I'd be creating more than 256 keys..

I could use md5 but even that is more than I need, and I assume not the
fastest algorithm (?), so I'm reluctant to use it.

I've been looking around a lot, not finding much anything come to mind?

thanks!
Calin

Jan 9 '06 #1
6 3297


th************@gmail.com wrote On 01/09/06 16:17,:
Hi,

I'm looking for a fast and simple one to one hash function, suitable
for longer strings (up to 2048 in length). I'd like keys to be
relatively short, I doubt I'd be creating more than 256 keys..

I could use md5 but even that is more than I need, and I assume not the
fastest algorithm (?), so I'm reluctant to use it.

I've been looking around a lot, not finding much anything come to mind?


The first thing that comes to mind is that you've found
the wrong newsgroup. comp.lang.c is about the C programming
language, not about algorithms.

The second thing that comes to mind is that a one-to-one
hash function can be extremely simple and extremely fast:
just use the hashed object as its own hash code. (Note that
a code with either fewer or more bits than the original cannot
be one-to-one.)

The third thing that comes to mind is that since the
second thing didn't come to your mind, you may not have a
clear idea of what you're trying to do.

--
Er*********@sun.com

Jan 9 '06 #2
In article <11*********************@g49g2000cwa.googlegroups. com>,
<th************@gmail.com> wrote:
I'm looking for a fast and simple one to one hash function, suitable
for longer strings (up to 2048 in length). I'd like keys to be
relatively short, I doubt I'd be creating more than 256 keys.. I could use md5 but even that is more than I need, and I assume not the
fastest algorithm (?), so I'm reluctant to use it.


There are lot of different hash functions. In order to be able to
recommend one, we would have to know:

- do you need a hash function that is *certain* not to produce
the same hash for values you wish to distinguish; or

- are you using a "bucket" hashing system; or

- are you just going to fill the next available bucket when you have
a hash collision; or

- are you going to use a series of rehash functions to attempt to
resolve hash collisions;

- are you truly looking for a hash function, or are you looking
for a digital signature?

- is this a case in which you can take your time hashing during the
data analysis phase, but then expect to retrieve the data over and
over again, so theoretical read performance is much more important than
time to insert into hash structures? If so, then have you considered
"perfect hash" ?

- would it be acceptable to put an arbitrary limit on the number of
keys if by doing so you could create a noticably more efficient hash?
For example, would either of 251 or 257 be acceptable limits?

- if you find out later that you need more than 256 keys, must it be
easy to expand the maximum number of keys, or would it be acceptable for
noticable "work" (by the computer) to be involved? Though in this
particular case it sounds like you won't be needing more than 512 Kb for
your hash, in the general case someone might be hashing over a million
entries each of which might be a couple of hundred megabytes, and
so for them, it could be quite important that as the number of keys
expanded that the previously processed objects continue to hash into
exactly the same disk storage location that they did before.
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers
Jan 9 '06 #3
Thanks for your reply.
- do you need a hash function that is *certain* not to produce
the same hash for values you wish to distinguish; or
Yes
- are you using a "bucket" hashing system; or
No
- are you just going to fill the next available bucket when you have
a hash collision; or
Yes
- are you going to use a series of rehash functions to attempt to
resolve hash collisions;
No
- are you truly looking for a hash function, or are you looking
for a digital signature?
More a digital signature i suppose, not a hash function as is normally
used with chained hash table & buckets where some collisions are
acceptable etc.
- is this a case in which you can take your time hashing during the
data analysis phase, but then expect to retrieve the data over and
over again, so theoretical read performance is much more important than
time to insert into hash structures?
Yes...but time to insert can't be "slow" ...anything that might take
more than 1 sec would be too slow..though obviously that'd vary widely
on the machine

If so, then have you considered "perfect hash" ?
I noticed that but wasn't sure about it, I'll look into it further
thank you.

- would it be acceptable to put an arbitrary limit on the number of
keys if by doing so you could create a noticably more efficient hash?
For example, would either of 251 or 257 be acceptable limits?

definately

- if you find out later that you need more than 256 keys, must it be
easy to expand the maximum number of keys, or would it be acceptable for
noticable "work" (by the computer) to be involved?


Noticable work would be fine, that's an unlikely case and speed is
paramount.
thanks very much!
Calin

Jan 10 '06 #4


In article <11*********************@g14g2000cwa.googlegroups. com>, th************@gmail.com writes:

[Please preserve attributions for quoted material.]
- do you need a hash function that is *certain* not to produce
the same hash for values you wish to distinguish; or


Yes
- are you just going to fill the next available bucket when you have
a hash collision; or


Yes


This is inconsistent with your first answer. If the hash function is
guaranteed to avoid collisions, then the bucket will always be
available.
- are you truly looking for a hash function, or are you looking
for a digital signature?


More a digital signature i suppose, not a hash function as is normally
used with chained hash table & buckets where some collisions are
acceptable etc.


A hash function is not a digital signature. They are completely
different operations. Many digital signature algorithms are built
on top of a cryptographic hash (aka "message digest"), but not all
are (eg encrypt-and-sign modes), and a digital signature built on
a cryptographic hash is much more than just a hash.

I would submit that if you think you may be looking for any sort
of cryptographic primitive, you should know *exactly* what you
need, and where to find it. Otherwise, either you don't need crypto,
or you're going to use it incorrectly. The history of correct
crypto implemented by non-experts is very thin.
If so, then have you considered
"perfect hash" ?


I noticed that but wasn't sure about it, I'll look into it further
thank you.


Perfect hashing may be a solution for you.

In your first message you mentioned cryptographic hashes such as
MD5 and SHA-1. While they're very unlikely to accidentally produce
a collision, in practice it's now quite feasible to deliberately
produce MD5 collisions, and SHA-1 may not be far behind. And they
are not fast - in fact, they're deliberately slow, to make it more
difficult to mount certain kinds of attacks.

--
Michael Wojcik mi************@microfocus.com

This record comes with a coupon that wins you a trip around the world.
-- Pizzicato Five
Jan 12 '06 #5
On 12 Jan 2006 18:13:31 GMT, mw*****@newsguy.com (Michael Wojcik)
wrote:
<snip>
I would submit that if you think you may be looking for any sort
of cryptographic primitive, you should know *exactly* what you
need, and where to find it. Otherwise, either you don't need crypto,
or you're going to use it incorrectly. The history of correct
crypto implemented by non-experts is very thin.
Concur strongly.
In your first message you mentioned cryptographic hashes such as
MD5 and SHA-1. While they're very unlikely to accidentally produce
a collision, in practice it's now quite feasible to deliberately
produce MD5 collisions, and SHA-1 may not be far behind. And they
are not fast - in fact, they're deliberately slow, to make it more
difficult to mount certain kinds of attacks.


Crypto-hashes are not deliberately slow, and in fact modern
(implementations of) them are nearly as fast as CRCs even though they
(must!) "mix" the input data much "better". Where you want deliberate
slowness, like in PKCS5 (and 12) and IIRC in password "stretching"
protocols like SRP et al, you iterate the hash many times.

Some crypto operations like RSA (which is already pretty slow)
sometimes need to be implemented as constant-speed to prevent
"side-channel" (power and timing) attacks, but no hashes AFAIK.
Although, Bernstein has recently claimed a timing attack on a popular
speedup for the new US standard symmetric cipher AES or Rijndael,
although I am among those skeptical his prerequisites hold widely.

- David.Thompson1 at worldnet.att.net
Jan 23 '06 #6

In article <u1********************************@4ax.com>, Dave Thompson <da*************@worldnet.att.net> writes:
On 12 Jan 2006 18:13:31 GMT, mw*****@newsguy.com (Michael Wojcik)
wrote:
In your first message you mentioned cryptographic hashes such as
MD5 and SHA-1. While they're very unlikely to accidentally produce
a collision, in practice it's now quite feasible to deliberately
produce MD5 collisions, and SHA-1 may not be far behind. And they
are not fast - in fact, they're deliberately slow, to make it more
difficult to mount certain kinds of attacks.
Crypto-hashes are not deliberately slow, and in fact modern
(implementations of) them are nearly as fast as CRCs even though they
(must!) "mix" the input data much "better". Where you want deliberate
slowness, like in PKCS5 (and 12) and IIRC in password "stretching"
protocols like SRP et al, you iterate the hash many times.


Right, right. I was actually thinking of older digest algorithms
like Snefru and MD2 that used iteration as a strengthening factor,
and key expansion, and somehow generalized that into "digests are
deliberately slow to increase the work factor of brute-force image
searches", which clearly doesn't make much sense.
Some crypto operations like RSA (which is already pretty slow)
sometimes need to be implemented as constant-speed to prevent
"side-channel" (power and timing) attacks, but no hashes AFAIK.


I imagine that if a digest used operations with variable power or
time costs, side-channel attacks would be an issue; but since the
data-dependent operations of MD5, SHA-*, and RIPE-MD (which represent
most real-world commercial use) are bitwise operations or integer
addition on fixed-sized quantitites, all general-purpose CPUs will
perform them at constant time and dissipation.

Digest functions built from block ciphers or asymmetric ciphers could
be vulnerable to side-channel attacks if the underlying ciphers are,
assuming the channel was modulated by the data being digested.

But a side-channel attack against a hash probably has a smaller
attack surface than one for encryption does; leaking some information
about the preimage seems to be of limited use for most hashing
applications, for example.

Anyway, this has drifted off-topic for the group, though if the OP
is still reading he may by now have gotten the impression that this
is not a trivial do-it-yourself task.

--
Michael Wojcik mi************@microfocus.com

There are some, who do not understand true enjoyment, will tell you that
rules spoil convivial meetings, and that a merry company becomes a dull
committee as soon as it is called a club. Do not believe them: the
precedents are all against them. -- Arthur Ransome
Jan 23 '06 #7

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

18 posts views Thread by Michele Simionato | last post: by
reply views Thread by Dean J Garrett | last post: by
8 posts views Thread by Neil | last post: by
6 posts views Thread by G.Esmeijer | last post: by
10 posts views Thread by javuchi | last post: by
95 posts views Thread by hstagni | last post: by
9 posts views Thread by Salad | last post: by
reply views Thread by leo001 | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.