STL map kinda functionality in C

comp_novice

Hi,

I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Thanks in advance
DKRS.

Jul 10 '06 #1

Subscribe Reply

1538

Richard Heathfield

comp_novice said:

Hi,

I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. It is trivial to write a bst that can handle any
data type. If you wish to separate the key from the rest of the value,
that's up to you, but C doesn't really mind either way.

For information on binary search trees in C, check out http://adtinfo.org
which documents Ben Pfaff's GNU libavl library. IIRC Ben normally builds
his trees out of ints rather than the void *s you'll be needing, but the
techniques are basically the same.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 10 '06 #2

Ben Pfaff

Richard Heathfield <in*****@invalid.invalidwrites:

For information on binary search trees in C, check out http://adtinfo.org
which documents Ben Pfaff's GNU libavl library. IIRC Ben normally builds
his trees out of ints rather than the void *s you'll be needing, but the
techniques are basically the same.

I think that the C Unleashed book implementation might have used
ints, but GNU libavl uses void *s.
--
"You call this a *C* question? What the hell are you smoking?" --Kaz

Jul 10 '06 #3

Richard Heathfield

Ben Pfaff said:

Richard Heathfield <in*****@invalid.invalidwrites:

>For information on binary search trees in C, check out http://adtinfo.org
which documents Ben Pfaff's GNU libavl library. IIRC Ben normally builds
his trees out of ints rather than the void *s you'll be needing, but the
techniques are basically the same.

I think that the C Unleashed book implementation might have used
ints,

Indeed.

but GNU libavl uses void *s.

Oh, that's good - I wish I'd known; I could have saved myself some work. :-)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 10 '06 #4

websnarf

Richard Heathfield wrote:

comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys? Why
don't we go for a hash table instead? Or at least ask if the keys are
lexical or not.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jul 10 '06 #5

Marc Boyer

Le 10-07-2006, comp_novice <di*******@yahoo.coma écrit*:

I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

You could have a look at BPL, it offers a STL-like map
http://www.enseeiht.fr/~boyer/Tools.html

Any feedback welcome,
Marc Boyer

Jul 10 '06 #6

Richard Heathfield

we******@gmail.com said:

Richard Heathfield wrote:
>comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrugAny key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.

Why
don't we go for a hash table instead?

I like trees.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 10 '06 #7

websnarf

Richard Heathfield wrote:

we******@gmail.com said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]
Huh? Doesn't this assume you have a lexical ordering on keys?

<shrugAny key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.

Except if the key is a pointer to the *real* contents of the key. I.e.
"Hello" and strdup("Hello") might compare to be equal even though their
pointers do not compare as equal. (Of course string are lexical, but
obviously this idea extends to more abstract keys.)

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jul 10 '06 #8

Richard Heathfield

we******@gmail.com said:

Richard Heathfield wrote:
>we******@gmail.com said:

<snip>

>>
<shrugAny key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.

Except if the key is a pointer to the *real* contents of the key.

<shrugThe key that is a pointer to the real key is not itself the real key
(unless it points to itself, which isn't terribly useful).

Comparison functions deal with this in a trivially simple manner.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 10 '06 #9

Dann Corbit

"comp_novice" <di*******@yahoo.comwrote in message
news:11**********************@b28g2000cwb.googlegr oups.com...

Hi,

I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Hash table.
http://en.wikipedia.org/wiki/Hash_table

Sourceforge will have lots of implementations.

Thanks in advance
DKRS.

Jul 10 '06 #10

Dann Corbit

"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:a8********************@bt.com...

we******@gmail.com said:

>Richard Heathfield wrote:
>>comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrugAny key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.

>Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

The only reason not to use a hash table is if you need to do range searches.
If all searches are equality searches, then the right answer is a hash
table.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 10 '06 #11

Ben Pfaff

"Dann Corbit" <dc*****@connx.comwrites:

A hash table is much faster than a tree for all operations.

The only reason not to use a hash table is if you need to do range searches.
If all searches are equality searches, then the right answer is a hash
table.

I wish someone had pointed this out to me before I spent so much
time on carefully implementing binary trees. I probably would
never have done so if I'd realized that, actually, for all the
problems I was interested in solving at the time, hash tables
were a much better solution. (However, I was only 14 or so at
the time.)

These days, I'm better at thinking before I code.
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa6 7f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

Jul 10 '06 #12

Dann Corbit

"Ben Pfaff" <bl*@cs.stanford.eduwrote in message
news:87************@benpfaff.org...

"Dann Corbit" <dc*****@connx.comwrites:

>A hash table is much faster than a tree for all operations.

The only reason not to use a hash table is if you need to do range
searches.
If all searches are equality searches, then the right answer is a hash
table.

I wish someone had pointed this out to me before I spent so much
time on carefully implementing binary trees. I probably would
never have done so if I'd realized that, actually, for all the
problems I was interested in solving at the time, hash tables
were a much better solution. (However, I was only 14 or so at
the time.)

That would have been a terrible tragedy. Your AVL tree project is the
finest discussion on the topic that I have ever seen. And sometimes, we
want to search for ranges of things instead of exact matches. In such a
case, a tree is the right answer and a hash table is the wrong answer.

These days, I'm better at thinking before I code.
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long
b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa6 7f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return
0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case
1:putchar(a[i&15]);break;}}}

Jul 10 '06 #13

Eric Sosman

Dann Corbit wrote On 07/10/06 15:12,:

>
A hash table is much faster than a tree for all operations.

Usually, yes -- but as Pitti-Sing says, "Bless you,
it all depends!"

The only reason not to use a hash table is if you need to do range searches.
If all searches are equality searches, then the right answer is a hash
table.

I'd extend this just a tiny bit: If range *operations*
are important, ordered data structures (trees, skip lists,
what-have-you) might be the answer. Even if all the searches
are for exact equality, it is sometimes important to be able
to access the "neighbors" of a located item, or to move
"forward" and/or "backward" from it.

But if all that's needed is a plain vanilla key-and-value
mapping, I'd agree with Dann: the hash table is the weapon of
choice.

--
Er*********@sun.com

Jul 10 '06 #14

Morris Dovey

Eric Sosman (in 1152562315.492306@news1nwk) said:

| Dann Corbit wrote On 07/10/06 15:12,:
||
|| A hash table is much faster than a tree for all operations.
|
| Usually, yes -- but as Pitti-Sing says, "Bless you,
| it all depends!"

I'm inclined to agree with Eric on this one - that it all depends on
the context. For relatively small key sizes (e.g. stock symbols), I've
been able to achieve my best lookup speeds using trie indexes.

--
Morris Dovey
DeSoto Solar
DeSoto, Iowa USA
http://www.iedu.com/DeSoto

Jul 10 '06 #15

websnarf

Dann Corbit wrote:

"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:a8********************@bt.com...
we******@gmail.com said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?
<shrugAny key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the worst
case you can compare them using memcmp.

Why
don't we go for a hash table instead?
I like trees.

A hash table is much faster than a tree for all operations.

This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

Furthermore, if your hash table has been improperly sized, then you
will pay unusually high reprobe penalities, or may thrash your memory
when you don't need to. A hash table usually never has 100% density
efficiency -- on average its half filled. If you make an
auto-resizable hash table, and assume that the number of entries is
generally monotonically increasing, then it means that periodically,
operations will cost O(tree-size) which can be extremely expensive, and
will not fly in real-time systems.

That all being said -- *usually* hash tables are faster. It assumes
you can compute the hash function very quickly, and that your table is
not encumbered by extra overhead for each operation. James Dow Allen
has written a hash table that satisfies these contraints (though I
personally dislike his programming style.)

The only reason not to use a hash table is if you need to do range searches.

Well, another case, not described above, is if you are implementing
something like a file system.

If all searches are equality searches, then the right answer is a hash
table.

A hash table is more general in the case where you *have* to use
equality searches. Otherwise you just have to understand your case to
know which is the best strategy.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jul 10 '06 #16

Dann Corbit

<we******@gmail.comwrote in message
news:11**********************@s13g2000cwa.googlegr oups.com...

Dann Corbit wrote:
>"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:a8********************@bt.com...
we******@gmail.com said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrugAny key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the
worst
case you can compare them using memcmp.

Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

A tree structure requires more memory than a hash table.

But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

I guess that the hash computation will be as fast as comparison. Since it
happens N times instead of N*log(N) times, the hash computation will be
faster:
http://www.cse.yorku.ca/~oz/hash.html

Furthermore, if your hash table has been improperly sized, then you
will pay unusually high reprobe penalities, or may thrash your memory
when you don't need to. A hash table usually never has 100% density
efficiency -- on average its half filled. If you make an
auto-resizable hash table, and assume that the number of entries is
generally monotonically increasing, then it means that periodically,
operations will cost O(tree-size) which can be extremely expensive, and
will not fly in real-time systems.

It's pretty unusual to have no idea how large your data set is.

That all being said -- *usually* hash tables are faster. It assumes
you can compute the hash function very quickly, and that your table is
not encumbered by extra overhead for each operation. James Dow Allen
has written a hash table that satisfies these contraints (though I
personally dislike his programming style.)

There are heaps and loads of hash functions with liberal licenses for use.

>The only reason not to use a hash table is if you need to do range
searches.

Well, another case, not described above, is if you are implementing
something like a file system.

Here, you may be onto something. I have seen real-world studies where disk
storage is required where b+trees are faster than hash tables.

>If all searches are equality searches, then the right answer is a hash
table.

A hash table is more general in the case where you *have* to use
equality searches. Otherwise you just have to understand your case to
know which is the best strategy.

If you don't have equality searches, then hash tables are totally useless.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jul 10 '06 #17

Richard Heathfield

Dann Corbit said:

"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:a8********************@bt.com...
>we******@gmail.com said:

<snip>

>>
>>Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

Consider writing out the keys and values of all nodes, in key order (i.e.
print a dictionary, phone list, or whatever). Can you really do that faster
with a hash table than with a tree? I'm surprised.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 10 '06 #18

Dann Corbit

"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:Zo********************@bt.com...

Dann Corbit said:

>"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:a8********************@bt.com...
>>we******@gmail.com said:

<snip>

>>>
Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

Consider writing out the keys and values of all nodes, in key order (i.e.
print a dictionary, phone list, or whatever). Can you really do that
faster
with a hash table than with a tree? I'm surprised.

No. Hash tables are not useful for ordered lists.
By all operations I meant "insert, update, delete, find." which are all
O(1).
Elsethread, I pointed out that hash tables are not useful for range
comparisons.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 10 '06 #19

Richard Heathfield

Dann Corbit said:

<snip>

By all operations I meant "insert, update, delete, find." which are all
O(1).

Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 10 '06 #20

Dann Corbit

"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:14******************************@bt.com...

Dann Corbit said:

<snip>

>By all operations I meant "insert, update, delete, find." which are all
O(1).

Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.

http://citeseer.ist.psu.edu/fotakis03space.html

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 10 '06 #21

Tak-Shing Chan

On Mon, 10 Jul 2006, Richard Heathfield wrote:

Dann Corbit said:

<snip>

>By all operations I meant "insert, update, delete, find." which are all
O(1).

Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.

I think Dann has perfect hashing in mind (though it is
unclear whether this is suitable for the OP or not).

Tak-Shing

Jul 10 '06 #22

Dann Corbit

"Tak-Shing Chan" <t.****@gold.ac.ukwrote in message
news:Pi******************************@scorpio.gold .ac.uk...

On Mon, 10 Jul 2006, Richard Heathfield wrote:

>Dann Corbit said:

<snip>

>>By all operations I meant "insert, update, delete, find." which are all
O(1).

Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.

I think Dann has perfect hashing in mind (though it is
unclear whether this is suitable for the OP or not).

The OP said:
"I would like to store a key-value pair and then retrieve the value based on
a given key value."

If that is what they really meant, then a hash table is the answer to the
question.

If they need additional functionality, like range searches, then a tree
structure of some sort is probably superior.

You can get O(n) performance in the general case. See my other post on the
topic.

At any rate, for the OP's stated requirement, a hash table is better than a
tree.

Probably, news:comp.programming is the right place for this thread, since
(other than the fact that the target wanted is C) it is not about the C
language, but about implementation of an algorithm.

Tak-Shing

Jul 10 '06 #23

Richard Heathfield

Dann Corbit said:

"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:14******************************@bt.com...
>Dann Corbit said:

<snip>

>>By all operations I meant "insert, update, delete, find." which are all
O(1).

Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't be
done in constant time.

http://citeseer.ist.psu.edu/fotakis03space.html

"The server encountered an internal error and was unable to complete your
request. Error message:
The server encountered an internal error and was unable to complete your
request. Either the server is overloaded or there was an error in a CGI
script."

[sic]

So - that was an interesting counter-argument. :-)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 11 '06 #24

Dann Corbit

"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:bo******************************@bt.com...

Dann Corbit said:

>"Richard Heathfield" <in*****@invalid.invalidwrote in message
news:14******************************@bt.com...
>>Dann Corbit said:

<snip>

By all operations I meant "insert, update, delete, find." which are all
O(1).

Surely that depends on the collision rate. If the hash table has a great
many buckets and a superb hash function, then obviously that improves
matters, but collisions will still need to be resolved, and that can't
be
done in constant time.

http://citeseer.ist.psu.edu/fotakis03space.html

"The server encountered an internal error and was unable to complete your
request. Error message:
The server encountered an internal error and was unable to complete your
request. Either the server is overloaded or there was an error in a CGI
script."

[sic]

So - that was an interesting counter-argument. :-)

Article:

Space Efficient Hash Tables With Worst Case Constant Access Time (2003)

Dimitris Fotakis, Rasmus Pagh, Peter Sanders, Paul Spirakis

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Jul 11 '06 #25

websnarf

Dann Corbit wrote:

<we******@gmail.comwrote:
Dann Corbit wrote:
"Richard Heathfield" <in*****@invalid.invalidwrote:
we******@gmail.com said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrugAny key can be expressed as a bit pattern. Bit patterns can be
compared in a consistent manner. As long as a key is unique, in the
worst
case you can compare them using memcmp.

Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.
This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

A tree structure requires more memory than a hash table.

But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

I guess that the hash computation will be as fast as comparison. Since it
happens N times instead of N*log(N) times, the hash computation will be
faster:

You mean on initial inserts. That's fine, but a common scenario is to
be performance limited by searching.

http://www.cse.yorku.ca/~oz/hash.html

Lol! Did you just seriously quote that web page to me in an effort to
educate me about hash functions? Try this page:
http://burtleburtle.net/bob/hash/doobs.html .

Furthermore, if your hash table has been improperly sized, then you
will pay unusually high reprobe penalities, or may thrash your memory
when you don't need to. A hash table usually never has 100% density
efficiency -- on average its half filled. If you make an
auto-resizable hash table, and assume that the number of entries is
generally monotonically increasing, then it means that periodically,
operations will cost O(tree-size) which can be extremely expensive, and
will not fly in real-time systems.

It's pretty unusual to have no idea how large your data set is.

It is? Why do you think this? What if the programmer is not in
control of the data set?

That all being said -- *usually* hash tables are faster. It assumes
you can compute the hash function very quickly, and that your table is
not encumbered by extra overhead for each operation. James Dow Allen
has written a hash table that satisfies these contraints (though I
personally dislike his programming style.)

There are heaps and loads of hash functions with liberal licenses for use.

I know -- but many of them are a complete waste of time because they
are actually unusually *SLOW* because of the way they have been
implemented (which defeats the purpose, of course -- hashes are
supposed to be faster than trees in the best case scenario). Of the
ones I looked at, James' at least does not make this mistake.

The only reason not to use a hash table is if you need to do range
searches.
Well, another case, not described above, is if you are implementing
something like a file system.

Here, you may be onto something. I have seen real-world studies where disk
storage is required where b+trees are faster than hash tables.

Yeah, this is because B+ trees recover a lot of locality, and the
log(n) is divided by a reasonably high constant. B+ trees also reduce
the memory overhead to a point where you might expect it to take less
memory than a hash table.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jul 11 '06 #26

Dann Corbit

<we******@gmail.comwrote in message
news:11*********************@p79g2000cwp.googlegro ups.com...

Dann Corbit wrote:
><we******@gmail.comwrote:
Dann Corbit wrote:
"Richard Heathfield" <in*****@invalid.invalidwrote:
we******@gmail.com said:
Richard Heathfield wrote:
comp_novice said:
I would like to store a key-value pair and then retrieve the
value
based on a given key value.

I am quite conversent in C++ but this needs be implemented in C.

Would appriciate any idea how to achieve this in C?

Use a binary search tree. [...]

Huh? Doesn't this assume you have a lexical ordering on keys?

<shrugAny key can be expressed as a bit pattern. Bit patterns can
be
compared in a consistent manner. As long as a key is unique, in the
worst
case you can compare them using memcmp.

Why
don't we go for a hash table instead?

I like trees.

A hash table is much faster than a tree for all operations.

This is not true. A hash table is faster for *most* operations, under
the most typical real world scenarios. A hash table assumes you have a
lot of memory, and basically leverages the large addressing
capabilities of general hardware in lieu of log(n) binary choices to
find individual entries.

A tree structure requires more memory than a hash table.

But with a hash table you still have to compute a hash function
whatever it is you are looking for, before you search for, insert or
delete. In ideal cases of a binary tree with large string keys, it may
be faster just to burn the cost of O(log(tree-size)) comparisons to
find out that your entry is not in the tree, rather than compute the
hash of it in O(key-length) time.

I guess that the hash computation will be as fast as comparison. Since
it
happens N times instead of N*log(N) times, the hash computation will be
faster:

You mean on initial inserts. That's fine, but a common scenario is to
be performance limited by searching.

Searching a well designed hash table is O(1). See the article referenced
elsethread.

>http://www.cse.yorku.ca/~oz/hash.html

Lol! Did you just seriously quote that web page to me in an effort to
educate me about hash functions? Try this page:
http://burtleburtle.net/bob/hash/doobs.html .

It was not an attempt to educate you about hash functions. It was an
attempt to show you hash functions which are very fast to compute and which
have excellent dispersion. I am aware of the hash functions on that page,
and use them in a commercial product I wrote, along with FNV hash, UMAC and
several others.

Furthermore, if your hash table has been improperly sized, then you
will pay unusually high reprobe penalities, or may thrash your memory
when you don't need to. A hash table usually never has 100% density
efficiency -- on average its half filled. If you make an
auto-resizable hash table, and assume that the number of entries is
generally monotonically increasing, then it means that periodically,
operations will cost O(tree-size) which can be extremely expensive, and
will not fly in real-time systems.

It's pretty unusual to have no idea how large your data set is.

It is?

Yes.

Why do you think this?

Experience.

What if the programmer is not in
control of the data set?

You mean like a database? The programmer will still have access to some
kind of estimate of the set size (cardinality). In the rare case of
streaming data with unknown start and end a hash table will still be better,
given the right design.

If you are paranoid about it, you can make a hash table of skiplists.

That all being said -- *usually* hash tables are faster. It assumes
you can compute the hash function very quickly, and that your table is
not encumbered by extra overhead for each operation. James Dow Allen
has written a hash table that satisfies these contraints (though I
personally dislike his programming style.)

There are heaps and loads of hash functions with liberal licenses for
use.

I know -- but many of them are a complete waste of time because they
are actually unusually *SLOW* because of the way they have been
implemented (which defeats the purpose, of course -- hashes are
supposed to be faster than trees in the best case scenario). Of the
ones I looked at, James' at least does not make this mistake.

If it is memory based, hash tables are going to be faster unless they are
badly designed. If the data is permanent, then it is a lot harder to get
hashes to be faster, but the advanced database systems do manage it.

>The only reason not to use a hash table is if you need to do range
searches.

Well, another case, not described above, is if you are implementing
something like a file system.

Here, you may be onto something. I have seen real-world studies where
disk
storage is required where b+trees are faster than hash tables.

Yeah, this is because B+ trees recover a lot of locality, and the
log(n) is divided by a reasonably high constant. B+ trees also reduce
the memory overhead to a point where you might expect it to take less
memory than a hash table.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jul 11 '06 #27

Richard Bos

Eric Sosman <Er*********@sun.comwrote:

Dann Corbit wrote On 07/10/06 15:12,:

A hash table is much faster than a tree for all operations.

Usually, yes -- but as Pitti-Sing says, "Bless you,
it all depends!"

The only reason not to use a hash table is if you need to do range searches.
If all searches are equality searches, then the right answer is a hash
table.

I'd extend this just a tiny bit: If range *operations*
are important, ordered data structures (trees, skip lists,
what-have-you) might be the answer. Even if all the searches
are for exact equality, it is sometimes important to be able
to access the "neighbors" of a located item, or to move
"forward" and/or "backward" from it.

Item: database programming. Indexing, in particular.

Richard

Jul 11 '06 #28

STL map kinda functionality in C

Similar topics