473,327 Members | 1,952 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,327 software developers and data experts.

Which hashing function to use?

Hello,

I need to use a hashing function for relatively short strings (roughly 16
characters or less). The data that needs to be accessed via hash is just a
simple int. I just need to look up values in two dimensional matrices each
time that I encounter a pair of these strings.

I would like to keep the code as simple and standard as
possible, but at the same time to have a reasonable performance.

While looking through the docs on my system I found a few different
implementations:
- hsearch from <search.h>
- tsearch from <search.h>
- bsearch from <stdlib.h>

Which one would you recommend? Or maybe something completly different?
Right now I just use Perl to do the hashing :-) this is nice, but I would
love to have a small, efficient, ANSI C program.

Regards,

January
Nov 16 '07 #1
11 2123
On Nov 16, 3:42 pm, January Weiner <january.wei...@gmail.comwrote:
Hello,

I need to use a hashing function for relatively short strings (roughly 16
characters or less). The data that needs to be accessed via hash is just a
simple int. I just need to look up values in two dimensional matrices each
time that I encounter a pair of these strings.

I would like to keep the code as simple and standard as
possible, but at the same time to have a reasonable performance.

While looking through the docs on my system I found a few different
implementations:
- hsearch from <search.h>
- tsearch from <search.h>
- bsearch from <stdlib.h>

Which one would you recommend? Or maybe something completly different?
Right now I just use Perl to do the hashing :-) this is nice, but I would
love to have a small, efficient, ANSI C program.
Your post is more topical in news:comp.programming {follow-ups set}.

I do not think that your problem has an answer yet. It will depend on
lots of things.
For instance, are the strings a static set that never changes? If so,
then a perfect hash is your choice.
There are string searching algorithms faster than a hash table (See
Sedgewick).

Do you need to preserve the data (e.g in a key/value pair set on
disk)?

"I need a hash, which should I use?" is such an open question that any
simple answer will probably be lacking.
Nov 17 '07 #2
user923005 <dc*****@connx.comwrote:
Your post is more topical in news:comp.programming {follow-ups set}.
Well, it is not. Not 100%. The reason being that I am looking for
something that is part of the C standard and that is simple to use in C.
What do you think happens when I ask a question like that in
comp.programming? Some wise guy will set the Followup-To to comp.lang.c
:-P, that's what would happen. Finally, the reason is that I am not a
computer scientist and need, hm, very practical advice. Practical advice
is something that I always got plenty on comp.lang.c :-) Note that you will
find a discussion on different hashing algorithms in the FAQ of this group
-- however, I do not fully understand it (otherwise I would not be asking
the questions).
I do not think that your problem has an answer yet. It will depend on
lots of things.
For instance, are the strings a static set that never changes? If so,
then a perfect hash is your choice.
There are string searching algorithms faster than a hash table (See
Sedgewick).
OK, as I am not a computer scientist, I will try to look it up somewhere.
Meanwhile, which of that is a part of standard C libraries?
Do you need to preserve the data (e.g in a key/value pair set on
disk)?
Nope, I would go for DBM or SQL if I needed that.
"I need a hash, which should I use?" is such an open question that any
simple answer will probably be lacking.
"I need a hash, which of the ones in the standard C libraries should I use"
is a more specific question, is it not?

Kind regards,

January
Nov 17 '07 #3
January Weiner wrote:
user923005 <dc*****@connx.comwrote:
>Your post is more topical in news:comp.programming {follow-ups set}.
[...]
>"I need a hash, which should I use?" is such an open question that any
simple answer will probably be lacking.

"I need a hash, which of the ones in the standard C libraries should I use"
is a more specific question, is it not?
Yes, it is. And the answer is, there are no hash functions in the
standard C library.

--
Keith Thompson (The_Other_Keith) <ks***@mib.org>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Nov 17 '07 #4
Keith Thompson said:
January Weiner wrote:
>user923005 <dc*****@connx.comwrote:
>>Your post is more topical in news:comp.programming {follow-ups set}.
[...]
>>"I need a hash, which should I use?" is such an open question that any
simple answer will probably be lacking.

"I need a hash, which of the ones in the standard C libraries should I
use" is a more specific question, is it not?

Yes, it is. And the answer is, there are no hash functions in the
standard C library.
Cases could be made for some of the math functions, and perhaps rand. Not
particularly good cases, admittedly, but theoretically...

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Nov 17 '07 #5
"January Weiner" <ja************@gmail.comwrote in message
Hello,

I need to use a hashing function for relatively short strings (roughly 16
characters or less). The data that needs to be accessed via hash is just
a
simple int. I just need to look up values in two dimensional matrices each
time that I encounter a pair of these strings.

I would like to keep the code as simple and standard as
possible, but at the same time to have a reasonable performance.

While looking through the docs on my system I found a few different
implementations:
- hsearch from <search.h>
- tsearch from <search.h>
- bsearch from <stdlib.h>

Which one would you recommend? Or maybe something completly different?
Right now I just use Perl to do the hashing :-) this is nice, but I would
love to have a small, efficient, ANSI C program.
There's a general-purpose hash function on my website - under Basic
Algorithms, sample chapter on hash tables.
Whilst it is not specially designed for strings, it's pretty good. Use it
for now, and replace it only if performance is unacceptable.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Nov 17 '07 #6
January Weiner wrote:
>
I need to use a hashing function for relatively short strings
(roughly 16 characters or less). The data that needs to be
accessed via hash is just a simple int. I just need to look up
values in two dimensional matrices each time that I encounter a
pair of these strings.

I would like to keep the code as simple and standard as
possible, but at the same time to have a reasonable performance.

While looking through the docs on my system I found a few
different implementations:
- hsearch from <search.h>
- tsearch from <search.h>
- bsearch from <stdlib.h>

Which one would you recommend? Or maybe something completly
different? Right now I just use Perl to do the hashing :-) this
is nice, but I would love to have a small, efficient, ANSI C
program.
There are a couple of very simple, but efficient, string hashing
routines available (in source form) in hashlib.zip. Although
hashlib is copyright and released under GPL, those functions are
pretty common knowledge and there will be no objections to
including them in anything else. You can find hashlib.zip at:

<http://cbfalconer.home.att.net/download/>

--
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>
Try the download section.

--
Posted via a free Usenet account from http://www.teranews.com

Nov 17 '07 #7
CBFalconer <cb********@yahoo.comwrote:
There are a couple of very simple, but efficient, string hashing
routines available (in source form) in hashlib.zip. Although
hashlib is copyright and released under GPL, those functions are
pretty common knowledge and there will be no objections to
including them in anything else. You can find hashlib.zip at:
Thank you very much, that is precisely the answer that I needed.

January

Nov 18 '07 #8
Malcolm McLean <re*******@btinternet.comwrote:
There's a general-purpose hash function on my website - under Basic
Algorithms, sample chapter on hash tables.
Whilst it is not specially designed for strings, it's pretty good. Use it
for now, and replace it only if performance is unacceptable.
Thanks!

January

Nov 18 '07 #9
Keith Thompson <ks...@mib.orgwrote:
January Weiner wrote:
"I need a hash, which of the ones in the standard C
libraries should I use" is a more specific question,
is it not?

Yes, it is. And the answer is, there are no hash functions
in the standard C library.
There are at least two that fit the raw definition: strlen()
and strtoul(..., ..., 36). Not necessarily useful, but forms
of hashing nonetheless.

--
Peter
Nov 18 '07 #10
On Nov 17, 12:09 am, January Weiner <january.wei...@gmail.comwrote:
user923005 <dcor...@connx.comwrote:
Your post is more topical in news:comp.programming {follow-ups set}.

Well, it is not. Not 100%. The reason being that I am looking for
something that is part of the C standard and that is simple to use in C.
The C standard does not contain a hash function. Quite a pity, too.
What hash function you should use will be a function of the
requirements.

The SDBM and Bernstein are good, simple, fast hashes used to operate
on strings when you need something like a join or lookup and a few
rare collisions are OK.
If you need a much lower chance of collision, then a 64 bit hash like
UMAC is probably what is desired.
If you need 0% chance of collisions, then generate a perfect hash
(obviously, this only works if the inputs are distinct).
What do you think happens when I ask a question like that in
comp.programming? Some wise guy will set the Followup-To to comp.lang.c
:-P, that's what would happen.
Well, he would be a wise guy. True, the group news:comp.programming
is not specifically for algorithm discussion, but it's the closest
thing we've got.
Finally, the reason is that I am not a
computer scientist and need, hm, very practical advice. Practical advice
is something that I always got plenty on comp.lang.c :-) Note that you will
find a discussion on different hashing algorithms in the FAQ of this group
-- however, I do not fully understand it (otherwise I would not be asking
the questions).
The practical advice you get here will be of the same quality as
news:comp.lang.c. Indeed, many of the c.l.c regulars read and post in
both places.
I do not think that your problem has an answer yet. It will depend on
lots of things.
For instance, are the strings a static set that never changes? If so,
then a perfect hash is your choice.
There are string searching algorithms faster than a hash table (See
Sedgewick).

OK, as I am not a computer scientist, I will try to look it up somewhere.
Meanwhile, which of that is a part of standard C libraries?
None of them.
Do you need to preserve the data (e.g in a key/value pair set on
disk)?

Nope, I would go for DBM or SQL if I needed that.
"I need a hash, which should I use?" is such an open question that any
simple answer will probably be lacking.

"I need a hash, which of the ones in the standard C libraries should I use"
is a more specific question, is it not?
Yes. Unfortunately, the answer is not the one you would like to hear.
I guess that this site might be helpful:
http://www.partow.net/programming/hashfunctions/

I guess further that if you state your requirements precisely you will
get better answers.
Nov 19 '07 #11
On Nov 16, 3:42 pm, January Weiner <january.wei...@gmail.comwrote:
Hello,

I need to use a hashing function for relatively short strings (roughly
16 characters or less). The data that needs to be accessed via hash is
just a simple int. I just need to look up values in two dimensional
matrices each time that I encounter a pair of these strings.
There is a google group dedicated to this sort of thing:

http://groups.google.com/group/hash-functions

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
Nov 20 '07 #12

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Pat | last post by:
I want to look for some one-to-one hashing function. In C++, any one-to-one hashing function?
2
by: Matt Bull | last post by:
Hi, I apologise in advance if this is off topic, but would appreciate any pointers you chaps might be able to provide. I'm relatively novice in the art of C so am after any suggestions about...
11
by: Wm. Scott Miller | last post by:
Hello all! We are building applications here and have hashing algorithms to secure secrets (e.g passwords) by producing one way hashes. Now, I've read alot and I've followed most of the advice...
10
by: Dino M. Buljubasic | last post by:
Hi, I am using MD5 to hash my passwords and add them to database as hashed. I have noticed though that some passwords don't get recognized and I suppose that it happen because hashing might...
4
by: wkatz | last post by:
Hi, Gurus. What hashing algorithm outputs hash value as numbers only? For example, if you pass a “John Q. Public” it will output 23324. If there is no such hashing, how hard is it to hire somebody to...
1
by: snipes2002 | last post by:
hi all i want to make a java code to build a 16 element array and do the following things ( insert , search , delete ) using a hashing function (j mod (arraysize)) and if the index is occupied so...
0
by: s.phonologies | last post by:
Hi friends, I am Sumit problem in C Language I have created a program to store int data by the HASH() my hash function for generate hashkey: int HASH(data) { return data%10;
15
by: Vinodh | last post by:
I am reading about hashing techniques. The map data structure available in C++ STL uses hashing techniques?
1
by: May Amor | last post by:
Helu gurus!!! I have a code below about hashing method with collision resolution...My problem is how to use the collsion resolution again if the hash index though has already a value. Please kinda...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.