473,785 Members | 2,480 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Fast 1 -> 1 non md5 hash algorithm

Hi,

I'm looking for a fast and simple one to one hash function, suitable
for longer strings (up to 2048 in length). I'd like keys to be
relatively short, I doubt I'd be creating more than 256 keys..

I could use md5 but even that is more than I need, and I assume not the
fastest algorithm (?), so I'm reluctant to use it.

I've been looking around a lot, not finding much anything come to mind?

thanks!
Calin

Jan 9 '06 #1
6 3411


th************@ gmail.com wrote On 01/09/06 16:17,:
Hi,

I'm looking for a fast and simple one to one hash function, suitable
for longer strings (up to 2048 in length). I'd like keys to be
relatively short, I doubt I'd be creating more than 256 keys..

I could use md5 but even that is more than I need, and I assume not the
fastest algorithm (?), so I'm reluctant to use it.

I've been looking around a lot, not finding much anything come to mind?


The first thing that comes to mind is that you've found
the wrong newsgroup. comp.lang.c is about the C programming
language, not about algorithms.

The second thing that comes to mind is that a one-to-one
hash function can be extremely simple and extremely fast:
just use the hashed object as its own hash code. (Note that
a code with either fewer or more bits than the original cannot
be one-to-one.)

The third thing that comes to mind is that since the
second thing didn't come to your mind, you may not have a
clear idea of what you're trying to do.

--
Er*********@sun .com

Jan 9 '06 #2
In article <11************ *********@g49g2 000cwa.googlegr oups.com>,
<th************ @gmail.com> wrote:
I'm looking for a fast and simple one to one hash function, suitable
for longer strings (up to 2048 in length). I'd like keys to be
relatively short, I doubt I'd be creating more than 256 keys.. I could use md5 but even that is more than I need, and I assume not the
fastest algorithm (?), so I'm reluctant to use it.


There are lot of different hash functions. In order to be able to
recommend one, we would have to know:

- do you need a hash function that is *certain* not to produce
the same hash for values you wish to distinguish; or

- are you using a "bucket" hashing system; or

- are you just going to fill the next available bucket when you have
a hash collision; or

- are you going to use a series of rehash functions to attempt to
resolve hash collisions;

- are you truly looking for a hash function, or are you looking
for a digital signature?

- is this a case in which you can take your time hashing during the
data analysis phase, but then expect to retrieve the data over and
over again, so theoretical read performance is much more important than
time to insert into hash structures? If so, then have you considered
"perfect hash" ?

- would it be acceptable to put an arbitrary limit on the number of
keys if by doing so you could create a noticably more efficient hash?
For example, would either of 251 or 257 be acceptable limits?

- if you find out later that you need more than 256 keys, must it be
easy to expand the maximum number of keys, or would it be acceptable for
noticable "work" (by the computer) to be involved? Though in this
particular case it sounds like you won't be needing more than 512 Kb for
your hash, in the general case someone might be hashing over a million
entries each of which might be a couple of hundred megabytes, and
so for them, it could be quite important that as the number of keys
expanded that the previously processed objects continue to hash into
exactly the same disk storage location that they did before.
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers
Jan 9 '06 #3
Thanks for your reply.
- do you need a hash function that is *certain* not to produce
the same hash for values you wish to distinguish; or
Yes
- are you using a "bucket" hashing system; or
No
- are you just going to fill the next available bucket when you have
a hash collision; or
Yes
- are you going to use a series of rehash functions to attempt to
resolve hash collisions;
No
- are you truly looking for a hash function, or are you looking
for a digital signature?
More a digital signature i suppose, not a hash function as is normally
used with chained hash table & buckets where some collisions are
acceptable etc.
- is this a case in which you can take your time hashing during the
data analysis phase, but then expect to retrieve the data over and
over again, so theoretical read performance is much more important than
time to insert into hash structures?
Yes...but time to insert can't be "slow" ...anything that might take
more than 1 sec would be too slow..though obviously that'd vary widely
on the machine

If so, then have you considered "perfect hash" ?
I noticed that but wasn't sure about it, I'll look into it further
thank you.

- would it be acceptable to put an arbitrary limit on the number of
keys if by doing so you could create a noticably more efficient hash?
For example, would either of 251 or 257 be acceptable limits?

definately

- if you find out later that you need more than 256 keys, must it be
easy to expand the maximum number of keys, or would it be acceptable for
noticable "work" (by the computer) to be involved?


Noticable work would be fine, that's an unlikely case and speed is
paramount.
thanks very much!
Calin

Jan 10 '06 #4


In article <11************ *********@g14g2 000cwa.googlegr oups.com>, th************@ gmail.com writes:

[Please preserve attributions for quoted material.]
- do you need a hash function that is *certain* not to produce
the same hash for values you wish to distinguish; or


Yes
- are you just going to fill the next available bucket when you have
a hash collision; or


Yes


This is inconsistent with your first answer. If the hash function is
guaranteed to avoid collisions, then the bucket will always be
available.
- are you truly looking for a hash function, or are you looking
for a digital signature?


More a digital signature i suppose, not a hash function as is normally
used with chained hash table & buckets where some collisions are
acceptable etc.


A hash function is not a digital signature. They are completely
different operations. Many digital signature algorithms are built
on top of a cryptographic hash (aka "message digest"), but not all
are (eg encrypt-and-sign modes), and a digital signature built on
a cryptographic hash is much more than just a hash.

I would submit that if you think you may be looking for any sort
of cryptographic primitive, you should know *exactly* what you
need, and where to find it. Otherwise, either you don't need crypto,
or you're going to use it incorrectly. The history of correct
crypto implemented by non-experts is very thin.
If so, then have you considered
"perfect hash" ?


I noticed that but wasn't sure about it, I'll look into it further
thank you.


Perfect hashing may be a solution for you.

In your first message you mentioned cryptographic hashes such as
MD5 and SHA-1. While they're very unlikely to accidentally produce
a collision, in practice it's now quite feasible to deliberately
produce MD5 collisions, and SHA-1 may not be far behind. And they
are not fast - in fact, they're deliberately slow, to make it more
difficult to mount certain kinds of attacks.

--
Michael Wojcik mi************@ microfocus.com

This record comes with a coupon that wins you a trip around the world.
-- Pizzicato Five
Jan 12 '06 #5
On 12 Jan 2006 18:13:31 GMT, mw*****@newsguy .com (Michael Wojcik)
wrote:
<snip>
I would submit that if you think you may be looking for any sort
of cryptographic primitive, you should know *exactly* what you
need, and where to find it. Otherwise, either you don't need crypto,
or you're going to use it incorrectly. The history of correct
crypto implemented by non-experts is very thin.
Concur strongly.
In your first message you mentioned cryptographic hashes such as
MD5 and SHA-1. While they're very unlikely to accidentally produce
a collision, in practice it's now quite feasible to deliberately
produce MD5 collisions, and SHA-1 may not be far behind. And they
are not fast - in fact, they're deliberately slow, to make it more
difficult to mount certain kinds of attacks.


Crypto-hashes are not deliberately slow, and in fact modern
(implementation s of) them are nearly as fast as CRCs even though they
(must!) "mix" the input data much "better". Where you want deliberate
slowness, like in PKCS5 (and 12) and IIRC in password "stretching "
protocols like SRP et al, you iterate the hash many times.

Some crypto operations like RSA (which is already pretty slow)
sometimes need to be implemented as constant-speed to prevent
"side-channel" (power and timing) attacks, but no hashes AFAIK.
Although, Bernstein has recently claimed a timing attack on a popular
speedup for the new US standard symmetric cipher AES or Rijndael,
although I am among those skeptical his prerequisites hold widely.

- David.Thompson1 at worldnet.att.ne t
Jan 23 '06 #6

In article <u1************ *************** *****@4ax.com>, Dave Thompson <da************ *@worldnet.att. net> writes:
On 12 Jan 2006 18:13:31 GMT, mw*****@newsguy .com (Michael Wojcik)
wrote:
In your first message you mentioned cryptographic hashes such as
MD5 and SHA-1. While they're very unlikely to accidentally produce
a collision, in practice it's now quite feasible to deliberately
produce MD5 collisions, and SHA-1 may not be far behind. And they
are not fast - in fact, they're deliberately slow, to make it more
difficult to mount certain kinds of attacks.
Crypto-hashes are not deliberately slow, and in fact modern
(implementation s of) them are nearly as fast as CRCs even though they
(must!) "mix" the input data much "better". Where you want deliberate
slowness, like in PKCS5 (and 12) and IIRC in password "stretching "
protocols like SRP et al, you iterate the hash many times.


Right, right. I was actually thinking of older digest algorithms
like Snefru and MD2 that used iteration as a strengthening factor,
and key expansion, and somehow generalized that into "digests are
deliberately slow to increase the work factor of brute-force image
searches", which clearly doesn't make much sense.
Some crypto operations like RSA (which is already pretty slow)
sometimes need to be implemented as constant-speed to prevent
"side-channel" (power and timing) attacks, but no hashes AFAIK.


I imagine that if a digest used operations with variable power or
time costs, side-channel attacks would be an issue; but since the
data-dependent operations of MD5, SHA-*, and RIPE-MD (which represent
most real-world commercial use) are bitwise operations or integer
addition on fixed-sized quantitites, all general-purpose CPUs will
perform them at constant time and dissipation.

Digest functions built from block ciphers or asymmetric ciphers could
be vulnerable to side-channel attacks if the underlying ciphers are,
assuming the channel was modulated by the data being digested.

But a side-channel attack against a hash probably has a smaller
attack surface than one for encryption does; leaking some information
about the preimage seems to be of limited use for most hashing
applications, for example.

Anyway, this has drifted off-topic for the group, though if the OP
is still reading he may by now have gotten the impression that this
is not a trivial do-it-yourself task.

--
Michael Wojcik mi************@ microfocus.com

There are some, who do not understand true enjoyment, will tell you that
rules spoil convivial meetings, and that a merry company becomes a dull
committee as soon as it is called a club. Do not believe them: the
precedents are all against them. -- Arthur Ransome
Jan 23 '06 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

18
4881
by: Michele Simionato | last post by:
I posted this few weeks ago (remember the C Sharp thread?) but it went unnoticed on the large mass of posts, so let me retry. Here I get Python+ Psyco twice as fast as optimized C, so I would like to now if something is wrong on my old laptop and if anybody can reproduce my results. Here are I my numbers for calling the error function a million times (Python 2.3, Psyco 1.0, Red Hat Linux 7.3, Pentium II 366 MHz): $ time p23 erf.py real ...
0
1602
by: Dean J Garrett | last post by:
Does anyone know about "fast web view" for PDF files? We have a .NET application that opens PDF files as the user's request. The problem is that some of these are very large, 20MB, and it takes forever to download the whole doc before anything is displayed. I was told that the PDF files were created with "fast web view" and that the first page can be immediately displayed before the entire document is downloaded. Has anyone done this? How...
8
2892
by: Neil | last post by:
I have a very puzzling situation with a database. It's an Access 2000 mdb with a SQL 7 back end, with forms bound using ODBC linked tables. At our remote location (accessed via a T1 line) the time it took to go to a record was very slow. The go to mechanism was a box that the user typed the index value into a combo box, with very simple code attached: with me.RecordsetClone .FindFirst " = " & me.cboGoTo If Not .NoMatch Then Me.Bookmark...
22
3315
by: Marc Mones | last post by:
Hello, I'working with IBM DB2 V8.1 and CLI/ODBC. I've got a problem with the following statement: ******************************************************************************** SELECT S_ART, S_SPRACHE, S_MANDANT, S_NR, S_SUB, S_OWNER, S_SATZ FROM SY0001_00005 WHERE S_ART = ? AND S_SPRACHE = ? AND S_MANDANT = ? AND S_NR = ? AND
20
9176
by: GS | last post by:
The stdint.h header definition mentions five integer categories, 1) exact width, eg., int32_t 2) at least as wide as, eg., int_least32_t 3) as fast as possible but at least as wide as, eg., int_fast32_t 4) integer capable of holding a pointer, intptr_t 5) widest integer in the implementation, intmax_t Is there a valid motivation for having both int_least and int_fast?
6
23775
by: G.Esmeijer | last post by:
Friends, I would like to read a text file (fixed length formaated) really fast and store the data into an Access database (2003). Using the streamreader and reading line by line, separating the line into string just takes to long. When I Import the file with access manually goes fast. But how to use this fromout a C# programme who has done this before and who can give met some answers
10
3015
by: javuchi | last post by:
I just want to share some code with you, and have some comments and improvements if you want. This header file allocates and add and delete items of any kind of data from a very fast array: #include <stdlib.h> #ifndef __LIST_H__ #define __LIST_H__
95
5428
by: hstagni | last post by:
Where can I find a library to created text-based windows applications? Im looking for a library that can make windows and buttons inside console.. Many old apps were make like this, i guess ____________________________________ | | | ------------------ | | | BUTTON | | | ...
0
8012
by: Vinod Sadanandan | last post by:
Fast-Start Failover An Overview In Dataguard Environment ============================================================================= This article describes the automatic fast start failover configuration and the conditions for trigerring a fast start failover in dataguard environment . In Faststart failover dataguard configuration if the primary database becomes unavailable, the...
9
2146
by: Salad | last post by:
I have access, for testing at my client's site, a Win2000 computer running A2003 retail. He recently upgraded all of his other machines to DualCore Pentiums with 2 gig ram and run A2003 runtime. I believe all current SPs for Windows and Office are installed on the fast machines. I have 1 process/subroutine that has worked for a couple of years without a problem. It works fine on the testing (slow) machine. The process checks a folder...
0
9480
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10325
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10148
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
9950
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8972
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7499
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6740
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
2
3646
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2879
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.