473,240 Members | 1,713 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,240 software developers and data experts.

Fast 1 -> 1 non md5 hash algorithm

Hi,

I'm looking for a fast and simple one to one hash function, suitable
for longer strings (up to 2048 in length). I'd like keys to be
relatively short, I doubt I'd be creating more than 256 keys..

I could use md5 but even that is more than I need, and I assume not the
fastest algorithm (?), so I'm reluctant to use it.

I've been looking around a lot, not finding much anything come to mind?

thanks!
Calin

Jan 9 '06 #1
6 3384


th************@gmail.com wrote On 01/09/06 16:17,:
Hi,

I'm looking for a fast and simple one to one hash function, suitable
for longer strings (up to 2048 in length). I'd like keys to be
relatively short, I doubt I'd be creating more than 256 keys..

I could use md5 but even that is more than I need, and I assume not the
fastest algorithm (?), so I'm reluctant to use it.

I've been looking around a lot, not finding much anything come to mind?


The first thing that comes to mind is that you've found
the wrong newsgroup. comp.lang.c is about the C programming
language, not about algorithms.

The second thing that comes to mind is that a one-to-one
hash function can be extremely simple and extremely fast:
just use the hashed object as its own hash code. (Note that
a code with either fewer or more bits than the original cannot
be one-to-one.)

The third thing that comes to mind is that since the
second thing didn't come to your mind, you may not have a
clear idea of what you're trying to do.

--
Er*********@sun.com

Jan 9 '06 #2
In article <11*********************@g49g2000cwa.googlegroups. com>,
<th************@gmail.com> wrote:
I'm looking for a fast and simple one to one hash function, suitable
for longer strings (up to 2048 in length). I'd like keys to be
relatively short, I doubt I'd be creating more than 256 keys.. I could use md5 but even that is more than I need, and I assume not the
fastest algorithm (?), so I'm reluctant to use it.


There are lot of different hash functions. In order to be able to
recommend one, we would have to know:

- do you need a hash function that is *certain* not to produce
the same hash for values you wish to distinguish; or

- are you using a "bucket" hashing system; or

- are you just going to fill the next available bucket when you have
a hash collision; or

- are you going to use a series of rehash functions to attempt to
resolve hash collisions;

- are you truly looking for a hash function, or are you looking
for a digital signature?

- is this a case in which you can take your time hashing during the
data analysis phase, but then expect to retrieve the data over and
over again, so theoretical read performance is much more important than
time to insert into hash structures? If so, then have you considered
"perfect hash" ?

- would it be acceptable to put an arbitrary limit on the number of
keys if by doing so you could create a noticably more efficient hash?
For example, would either of 251 or 257 be acceptable limits?

- if you find out later that you need more than 256 keys, must it be
easy to expand the maximum number of keys, or would it be acceptable for
noticable "work" (by the computer) to be involved? Though in this
particular case it sounds like you won't be needing more than 512 Kb for
your hash, in the general case someone might be hashing over a million
entries each of which might be a couple of hundred megabytes, and
so for them, it could be quite important that as the number of keys
expanded that the previously processed objects continue to hash into
exactly the same disk storage location that they did before.
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers
Jan 9 '06 #3
Thanks for your reply.
- do you need a hash function that is *certain* not to produce
the same hash for values you wish to distinguish; or
Yes
- are you using a "bucket" hashing system; or
No
- are you just going to fill the next available bucket when you have
a hash collision; or
Yes
- are you going to use a series of rehash functions to attempt to
resolve hash collisions;
No
- are you truly looking for a hash function, or are you looking
for a digital signature?
More a digital signature i suppose, not a hash function as is normally
used with chained hash table & buckets where some collisions are
acceptable etc.
- is this a case in which you can take your time hashing during the
data analysis phase, but then expect to retrieve the data over and
over again, so theoretical read performance is much more important than
time to insert into hash structures?
Yes...but time to insert can't be "slow" ...anything that might take
more than 1 sec would be too slow..though obviously that'd vary widely
on the machine

If so, then have you considered "perfect hash" ?
I noticed that but wasn't sure about it, I'll look into it further
thank you.

- would it be acceptable to put an arbitrary limit on the number of
keys if by doing so you could create a noticably more efficient hash?
For example, would either of 251 or 257 be acceptable limits?

definately

- if you find out later that you need more than 256 keys, must it be
easy to expand the maximum number of keys, or would it be acceptable for
noticable "work" (by the computer) to be involved?


Noticable work would be fine, that's an unlikely case and speed is
paramount.
thanks very much!
Calin

Jan 10 '06 #4


In article <11*********************@g14g2000cwa.googlegroups. com>, th************@gmail.com writes:

[Please preserve attributions for quoted material.]
- do you need a hash function that is *certain* not to produce
the same hash for values you wish to distinguish; or


Yes
- are you just going to fill the next available bucket when you have
a hash collision; or


Yes


This is inconsistent with your first answer. If the hash function is
guaranteed to avoid collisions, then the bucket will always be
available.
- are you truly looking for a hash function, or are you looking
for a digital signature?


More a digital signature i suppose, not a hash function as is normally
used with chained hash table & buckets where some collisions are
acceptable etc.


A hash function is not a digital signature. They are completely
different operations. Many digital signature algorithms are built
on top of a cryptographic hash (aka "message digest"), but not all
are (eg encrypt-and-sign modes), and a digital signature built on
a cryptographic hash is much more than just a hash.

I would submit that if you think you may be looking for any sort
of cryptographic primitive, you should know *exactly* what you
need, and where to find it. Otherwise, either you don't need crypto,
or you're going to use it incorrectly. The history of correct
crypto implemented by non-experts is very thin.
If so, then have you considered
"perfect hash" ?


I noticed that but wasn't sure about it, I'll look into it further
thank you.


Perfect hashing may be a solution for you.

In your first message you mentioned cryptographic hashes such as
MD5 and SHA-1. While they're very unlikely to accidentally produce
a collision, in practice it's now quite feasible to deliberately
produce MD5 collisions, and SHA-1 may not be far behind. And they
are not fast - in fact, they're deliberately slow, to make it more
difficult to mount certain kinds of attacks.

--
Michael Wojcik mi************@microfocus.com

This record comes with a coupon that wins you a trip around the world.
-- Pizzicato Five
Jan 12 '06 #5
On 12 Jan 2006 18:13:31 GMT, mw*****@newsguy.com (Michael Wojcik)
wrote:
<snip>
I would submit that if you think you may be looking for any sort
of cryptographic primitive, you should know *exactly* what you
need, and where to find it. Otherwise, either you don't need crypto,
or you're going to use it incorrectly. The history of correct
crypto implemented by non-experts is very thin.
Concur strongly.
In your first message you mentioned cryptographic hashes such as
MD5 and SHA-1. While they're very unlikely to accidentally produce
a collision, in practice it's now quite feasible to deliberately
produce MD5 collisions, and SHA-1 may not be far behind. And they
are not fast - in fact, they're deliberately slow, to make it more
difficult to mount certain kinds of attacks.


Crypto-hashes are not deliberately slow, and in fact modern
(implementations of) them are nearly as fast as CRCs even though they
(must!) "mix" the input data much "better". Where you want deliberate
slowness, like in PKCS5 (and 12) and IIRC in password "stretching"
protocols like SRP et al, you iterate the hash many times.

Some crypto operations like RSA (which is already pretty slow)
sometimes need to be implemented as constant-speed to prevent
"side-channel" (power and timing) attacks, but no hashes AFAIK.
Although, Bernstein has recently claimed a timing attack on a popular
speedup for the new US standard symmetric cipher AES or Rijndael,
although I am among those skeptical his prerequisites hold widely.

- David.Thompson1 at worldnet.att.net
Jan 23 '06 #6

In article <u1********************************@4ax.com>, Dave Thompson <da*************@worldnet.att.net> writes:
On 12 Jan 2006 18:13:31 GMT, mw*****@newsguy.com (Michael Wojcik)
wrote:
In your first message you mentioned cryptographic hashes such as
MD5 and SHA-1. While they're very unlikely to accidentally produce
a collision, in practice it's now quite feasible to deliberately
produce MD5 collisions, and SHA-1 may not be far behind. And they
are not fast - in fact, they're deliberately slow, to make it more
difficult to mount certain kinds of attacks.
Crypto-hashes are not deliberately slow, and in fact modern
(implementations of) them are nearly as fast as CRCs even though they
(must!) "mix" the input data much "better". Where you want deliberate
slowness, like in PKCS5 (and 12) and IIRC in password "stretching"
protocols like SRP et al, you iterate the hash many times.


Right, right. I was actually thinking of older digest algorithms
like Snefru and MD2 that used iteration as a strengthening factor,
and key expansion, and somehow generalized that into "digests are
deliberately slow to increase the work factor of brute-force image
searches", which clearly doesn't make much sense.
Some crypto operations like RSA (which is already pretty slow)
sometimes need to be implemented as constant-speed to prevent
"side-channel" (power and timing) attacks, but no hashes AFAIK.


I imagine that if a digest used operations with variable power or
time costs, side-channel attacks would be an issue; but since the
data-dependent operations of MD5, SHA-*, and RIPE-MD (which represent
most real-world commercial use) are bitwise operations or integer
addition on fixed-sized quantitites, all general-purpose CPUs will
perform them at constant time and dissipation.

Digest functions built from block ciphers or asymmetric ciphers could
be vulnerable to side-channel attacks if the underlying ciphers are,
assuming the channel was modulated by the data being digested.

But a side-channel attack against a hash probably has a smaller
attack surface than one for encryption does; leaking some information
about the preimage seems to be of limited use for most hashing
applications, for example.

Anyway, this has drifted off-topic for the group, though if the OP
is still reading he may by now have gotten the impression that this
is not a trivial do-it-yourself task.

--
Michael Wojcik mi************@microfocus.com

There are some, who do not understand true enjoyment, will tell you that
rules spoil convivial meetings, and that a merry company becomes a dull
committee as soon as it is called a club. Do not believe them: the
precedents are all against them. -- Arthur Ransome
Jan 23 '06 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

18
by: Michele Simionato | last post by:
I posted this few weeks ago (remember the C Sharp thread?) but it went unnoticed on the large mass of posts, so let me retry. Here I get Python+ Psyco twice as fast as optimized C, so I would like...
0
by: Dean J Garrett | last post by:
Does anyone know about "fast web view" for PDF files? We have a .NET application that opens PDF files as the user's request. The problem is that some of these are very large, 20MB, and it takes...
8
by: Neil | last post by:
I have a very puzzling situation with a database. It's an Access 2000 mdb with a SQL 7 back end, with forms bound using ODBC linked tables. At our remote location (accessed via a T1 line) the time...
22
by: Marc Mones | last post by:
Hello, I'working with IBM DB2 V8.1 and CLI/ODBC. I've got a problem with the following statement: ******************************************************************************** SELECT...
20
by: GS | last post by:
The stdint.h header definition mentions five integer categories, 1) exact width, eg., int32_t 2) at least as wide as, eg., int_least32_t 3) as fast as possible but at least as wide as, eg.,...
6
by: G.Esmeijer | last post by:
Friends, I would like to read a text file (fixed length formaated) really fast and store the data into an Access database (2003). Using the streamreader and reading line by line, separating the...
10
by: javuchi | last post by:
I just want to share some code with you, and have some comments and improvements if you want. This header file allocates and add and delete items of any kind of data from a very fast array: ...
95
by: hstagni | last post by:
Where can I find a library to created text-based windows applications? Im looking for a library that can make windows and buttons inside console.. Many old apps were make like this, i guess ...
0
by: Vinod Sadanandan | last post by:
Fast-Start Failover An Overview In Dataguard Environment ============================================================================= This article describes the automatic fast start failover...
9
by: Salad | last post by:
I have access, for testing at my client's site, a Win2000 computer running A2003 retail. He recently upgraded all of his other machines to DualCore Pentiums with 2 gig ram and run A2003 runtime. ...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...
0
by: fareedcanada | last post by:
Hello I am trying to split number on their count. suppose i have 121314151617 (12cnt) then number should be split like 12,13,14,15,16,17 and if 11314151617 (11cnt) then should be split like...
0
by: stefan129 | last post by:
Hey forum members, I'm exploring options for SSL certificates for multiple domains. Has anyone had experience with multi-domain SSL certificates? Any recommendations on reliable providers or specific...
0
Git
by: egorbl4 | last post by:
Скачал я git, хотел начать настройку, а там вылезло вот это Что это? Что мне с этим делать? ...
0
by: MeoLessi9 | last post by:
I have VirtualBox installed on Windows 11 and now I would like to install Kali on a virtual machine. However, on the official website, I see two options: "Installer images" and "Virtual machines"....
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: Aftab Ahmad | last post by:
Hello Experts! I have written a code in MS Access for a cmd called "WhatsApp Message" to open WhatsApp using that very code but the problem is that it gives a popup message everytime I clicked on...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.