How to implement a Hash Table in C

ravi

Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Aug 11 '07 #1

Subscribe Post Reply

139

14018

Malcolm McLean

"ravi" <dc**********@gmail.comwrote in message
news:11**********************@i13g2000prf.googlegr oups.com...

Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Which ds?
You can read all about hash tables in my book, Basic Algorithms. The hash
tables chapter is free.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 11 '07 #2

santosh

ravi wrote:

Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

The hash table is itself a data structure. It works in conjunction with hash
function.

<http://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html>
<http://www.cl.cam.ac.uk/~cwc22/hashtable/>
<http://www.monad.me.uk/algorithms/hash.html>
<http://cbfalconer.home.att.net/download/hashlib.zip>
<http://en.wikipedia.org/wiki/Hash_table>

Aug 11 '07 #3

Ben Bacarisse

"Malcolm McLean" <re*******@btinternet.comwrites:

"ravi" <dc**********@gmail.comwrote in message
news:11**********************@i13g2000prf.googlegr oups.com...
>Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

I think it should be pointed out that your hash table is not like
most. Most hash tables are dynamic structures that can grow as data
are added. Yours is fixed. The interface should have a clear warning
that the table fails (silently) if an attempt is made to add an item
beyond the initial capacity.

The point I made before (that the function loop indefinitely when the
table becomes full) is still true, but I now see that the add function
will silently fail long before that point is reached. If this limit
is by design, I think the text should make that limitation very clear.

It is odd (and probably very confusing to a beginner) that the add
function seems to return a success/fail indication, but that it does
not use it to signal this disastrous case.

[Aside: shouldn't a book on basic algorithms illustrate (or at least
describe) how a hash table can grow to accommodate more data?]

To the OP:
1) A proper hash table can grow as data are added.

2) The add function should return an error if the data can't be added
-- for example, some do if the key already exists but it certainly
should if the operation would overflow allocated storage.

3) Most designers would make a hash table, in C at least, that did not
duplicate the key and the data. The idea being to leave the actual
data allocation up to the table user. The table itself would just
store pointers.

--
Ben.

Aug 11 '07 #4

CBFalconer

ravi wrote:

>
can anybody tell me that which ds will be best suited to implement
a hash table in C/C++

I am not sure what your question is. However, a complete, and
portable, hash table implementation under GPL is available as
hashlib at:

<http://cbfalconer.home.att.net/download/>

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 11 '07 #5

Malcolm McLean

"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...

"Malcolm McLean" <re*******@btinternet.comwrites:

>"ravi" <dc**********@gmail.comwrote in message
news:11**********************@i13g2000prf.googleg roups.com...
>>Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

I think it should be pointed out that your hash table is not like
most. Most hash tables are dynamic structures that can grow as data
are added. Yours is fixed. The interface should have a clear warning
that the table fails (silently) if an attempt is made to add an item
beyond the initial capacity.

The point I made before (that the function loop indefinitely when the
table becomes full) is still true, but I now see that the add function
will silently fail long before that point is reached. If this limit
is by design, I think the text should make that limitation very clear.

I think your understanding of hash tables might need to be raised a notch.
If a hash table beomes over full then the algorithm becomes pathological.
Just as if there are only a few empty spaces in a car park, you spend
virtulaly all your time crawling around looking for a slot.
So the table length is twice the capcity, passed into the constructor.
However if the table items are large, that wastes space. So the able itself
id a list of pointers. However calling malloc() for each item would be
inefficient unless the items are extremely large. So we use the fixed
allocator (in the chapter memory games, also free).
This will be exhaused once the table capacity is reached, so the algorithm
will never reach the pathological condition. I admit that this code might be
a bit hairy. It is probably best to add an explicit test against table
capacity to make clear that the table cannot be over-filled.
The initial code is a simlle hash table tyhat uses linear probing, and has a
fixed capacity. The second implenetation introduces quadratic probing and
dynamic tables. These do raise a lot of memory management and efficiency
issues, which are evaded by making th second table store only pointers

>
It is odd (and probably very confusing to a beginner) that the add
function seems to return a success/fail indication, but that it does
not use it to signal this disastrous case.

The "disastrous case" can't happen. The fixed allocator will run out of
memory and return null when capity is reached. Please avoid this sort of
hyperbole.

>
[Aside: shouldn't a book on basic algorithms illustrate (or at least
describe) how a hash table can grow to accommodate more data?]

It does. That's in the second half of the chapter. I introduce the central
idea first, and then discuss some refinements and complications

>
To the OP:
1) A proper hash table can grow as data are added.

Done

>
2) The add function should return an error if the data can't be added
-- for example, some do if the key already exists but it certainly
should if the operation would overflow allocated storage.

Done.

>
3) Most designers would make a hash table, in C at least, that did not
duplicate the key and the data. The idea being to leave the actual
data allocation up to the table user. The table itself would just
store pointers.

The second table does store pointers. The best representation of keys is a
whole separate issue, which I don't go into in any detail.

>

The other basic point you've missed is that the code is designed to
introduce hash tables gradually. So a few details are missing from the first
examples, so that the central concept of hashing can be brought across.
If you want to go from a state of not understanding hash tables to
understanding how to code them, go to my book. If you want a drop-in hash
table to put into production C code, Chuck's table is better. There is no
point in us duplicating our efforts.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 11 '07 #6

Ark Khasin

Ben Bacarisse wrote:

"Malcolm McLean" <re*******@btinternet.comwrites:

>"ravi" <dc**********@gmail.comwrote in message
news:11**********************@i13g2000prf.googleg roups.com...
>>Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

I think it should be pointed out that your hash table is not like
most. Most hash tables are dynamic structures that can grow as data
are added. Yours is fixed. The interface should have a clear warning
that the table fails (silently) if an attempt is made to add an item
beyond the initial capacity.

The point I made before (that the function loop indefinitely when the
table becomes full) is still true, but I now see that the add function
will silently fail long before that point is reached. If this limit
is by design, I think the text should make that limitation very clear.

It is odd (and probably very confusing to a beginner) that the add
function seems to return a success/fail indication, but that it does
not use it to signal this disastrous case.

[Aside: shouldn't a book on basic algorithms illustrate (or at least
describe) how a hash table can grow to accommodate more data?]

To the OP:
1) A proper hash table can grow as data are added.

2) The add function should return an error if the data can't be added
-- for example, some do if the key already exists but it certainly
should if the operation would overflow allocated storage.

3) Most designers would make a hash table, in C at least, that did not
duplicate the key and the data. The idea being to leave the actual
data allocation up to the table user. The table itself would just
store pointers.

Mr McLean appears to be a man of strong faith. As such, he is not
persuaded/dissuaded by rational argument of which elsethread is plenty.

Aug 11 '07 #7

CBFalconer

Malcolm McLean wrote:

>

.... snip ...

>
I think your understanding of hash tables might need to be raised
a notch. If a hash table beomes over full then the algorithm
becomes pathological. Just as if there are only a few empty spaces
in a car park, you spend virtulaly all your time crawling around
looking for a slot.

Not necessarily so. For example, the code in hashlib automatically
expands the table, so that the system is normally between roughly
44 and 88% full. This expansion can continue to ridiculous limits.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 11 '07 #8

Keith Thompson

ravi <dc**********@gmail.comwrites:

can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

Until I read some of the responses, I had no idea what you meant by
"ds" (apparently it's an abbreviation for "data structure").

Please don't use abbreviations like that unless you're sure everyone
is going to understand what they mean. It doesn't take very long to
type out the words "data structure".

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Aug 11 '07 #9

Ben Bacarisse

"Malcolm McLean" <re*******@btinternet.comwrites:

"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>"Malcolm McLean" <re*******@btinternet.comwrites:

>>"ravi" <dc**********@gmail.comwrote in message
news:11**********************@i13g2000prf.google groups.com...
Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

I think it should be pointed out that your hash table is not like
most. Most hash tables are dynamic structures that can grow as data
are added. Yours is fixed. The interface should have a clear warning
that the table fails (silently) if an attempt is made to add an item
beyond the initial capacity.

The point I made before (that the function loop indefinitely when the
table becomes full) is still true, but I now see that the add function
will silently fail long before that point is reached. If this limit
is by design, I think the text should make that limitation very clear.

I think your understanding of hash tables might need to be raised a
notch.

If you are going to be like that...

<snip>

>It is odd (and probably very confusing to a beginner) that the add
function seems to return a success/fail indication, but that it does
not use it to signal this disastrous case.

The "disastrous case" can't happen. The fixed allocator will run out
of memory and return null when capity is reached. Please avoid this
sort of hyperbole.

OK. Less hyperbole. Your code exhibits undefined behaviour. You
'dereference' a null pointer. On my system I get a segmentation
violation before hash_add gets any say in the matter. If or when you
find the error, you will say it is a detail, a typo, whatever. I
don't think you worry about details like this.

Do you care, for example, that your statement:

The test for primality is interesting. The only absolutely sure way
of determining whether a number is prime, as far as is known, is to
attempt to divide by all prime numbers below or equal to its square
root.

is not true? Probably not. I will try, in future, to leave you alone.

--
Ben.

Aug 12 '07 #10

Malcolm McLean

"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...

"Malcolm McLean" <re*******@btinternet.comwrites:

>"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>>"Malcolm McLean" <re*******@btinternet.comwrites:

"ravi" <dc**********@gmail.comwrote in message
news:11**********************@i13g2000prf.googl egroups.com...
Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++
>
thanx. in advanced
>
Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

I think it should be pointed out that your hash table is not like
most. Most hash tables are dynamic structures that can grow as data
are added. Yours is fixed. The interface should have a clear warning
that the table fails (silently) if an attempt is made to add an item
beyond the initial capacity.

The point I made before (that the function loop indefinitely when the
table becomes full) is still true, but I now see that the add function
will silently fail long before that point is reached. If this limit
is by design, I think the text should make that limitation very clear.

I think your understanding of hash tables might need to be raised a
notch.

If you are going to be like that...

<snip>

>>It is odd (and probably very confusing to a beginner) that the add
function seems to return a success/fail indication, but that it does
not use it to signal this disastrous case.

The "disastrous case" can't happen. The fixed allocator will run out
of memory and return null when capity is reached. Please avoid this
sort of hyperbole.

OK. Less hyperbole. Your code exhibits undefined behaviour. You
'dereference' a null pointer. On my system I get a segmentation
violation before hash_add gets any say in the matter. If or when you
find the error, you will say it is a detail, a typo, whatever. I
don't think you worry about details like this.

Obviously if you say things like "the function will not return at all when
the table is full" you cannot understand that a hash table (to be pedantic,
one without "perfect hashing") cannot be more than about 88% full before the
algorithm becomes pathological. In fact I pass in a "capacity" argument
which is half table size.
I think you need to read the chapter again, as a pupil rather than as a
critic.

As for the derefencing null pointer, that will happen if the constructor
fails, and return NULL. That is standard behaviour throughout the book. If
objects cannot be created, the constructing function returns a null pointer
to show they have failed.
I couldn't see another. The code has been tested, but only on two systems (a
UNIX mainframe and Windows) so that doesn't mean no errors remain, and of
course typographical errors do creep in in the process of reformatting for
print - I wish I had a tool to format source code automatically but I don't.
The normal thing when you an error such as dereference of a null is to say
"your code derererences a null" rather than to talk airly about "catastropic
behaviour" in an arrogant manner.

>
Do you care, for example, that your statement:

The test for primality is interesting. The only absolutely sure way
of determining whether a number is prime, as far as is known, is to
attempt to divide by all prime numbers below or equal to its square
root.

is not true? Probably not. I will try, in future, to leave you alone.

You mean randomised primality testing? It's not absolutely sure.
I am no mathematician, maybe you could tell us your method?

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 12 '07 #11

James Dow Allen

On Aug 11, 7:02 pm, "Malcolm McLean" <regniz...@btinternet.comwrote:

"Ben Bacarisse" <ben.use...@bsb.me.ukwrote ...

I think your understanding of hash tables might need to be raised a
notch.

Hash tables are frequently discussed in this and "nearby"
ngs, but it is clear that most programmers are unaware of
innovations like Cuckoo Hash or "Compact Hash", etc.
These aren't just theoretical curiosities but have important
advantages over hash tables described in old textbooks.

you cannot understand that a hash table (to be pedantic,
one without "perfect hashing") cannot be more than
about 88% full before the algorithm becomes pathological.

Cuckoo hash, just to give one example, can operate
well above 88%. ("Pathology" is a matter of semantics
since many methods can encounter O(n) behavior even
at low utilization, with non-zero probability,
due to very bad "luck".) Cuckoo hash, BTW, has
another very important space-savings advantage over
ordinary open-address hashing. I'll leave the
identification of this advantage as an exercise.... :-)

And popular "chained hashing" methods simply do
not suffer from the pathology you imply.

The test for primality is interesting.

Very interesting, though I know little about it.

The only absolutely sure way
of determining whether a number is prime, as far as is known, is to

The largest *certainly* known non-Marsenne prime is
19249*2^13018586 + 1
a number with almost 4,000,000 digits.
I've no idea how its primality was proved,
but if they...

attempt to divide by all prime numbers below or equal to its square
root.

.... they would have had to perform about 10^2000
long divisions (very long!). Assuming they'd
recruited a googol monkeys, and each monkey
could do a googol divisions per year, they'd
need 10^1800 years to prove primality your way!

I'm no primality expert, but it took me only a
few seconds at Wikipedia to discover:

The first deterministic primality test significantly
faster than the naïve methods was the cyclotomy test;
its runtime can be proven to be
O((log n) ^ clog log log n),
where n is the number to test for primality and
c is a constant independent of n.

Wikipedia (and Google) is your friend! The bad rep
Wikipedia has in recent media reports does not apply,
I think, to most science pages.

I didn't invent Cuckoo Hash (wish I had), learned
of it after Googling in a dialog with another computer
scientist (who hadn't heard of Cuckoo either, even
though he wrote papers on Hash Tables!)

I don't think I'm the only one whose posting
quality would go up if time wasted on Usenet
invective were spent instead on research.

James Dow Allen

Aug 12 '07 #12

Richard Tobin

In article <AP*********************@bt.com>,
Malcolm McLean <re*******@btinternet.comwrote:

>You mean randomised primality testing? It's not absolutely sure.
I am no mathematician, maybe you could tell us your method?

Google for "primality testing". There are deterministic tests faster
than testing all the primes < sqrt(n), and the probabilistic tests can
be used to reduce the probability as much as you like (for example, to
less than the chance of your computer being struck by lightning).

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

Aug 12 '07 #13

Malcolm McLean

"James Dow Allen" <jd*********@yahoo.comwrote in message
news:11**********************@x40g2000prg.googlegr oups.com...
On Aug 11, 7:02 pm, "Malcolm McLean" <regniz...@btinternet.comwrote:

"Ben Bacarisse" <ben.use...@bsb.me.ukwrote ...

I think your understanding of hash tables might need to be raised a
notch.

>Hash tables are frequently discussed in this and "nearby"
ngs, but it is clear that most programmers are unaware of
innovations like Cuckoo Hash or "Compact Hash", etc.
These aren't just theoretical curiosities but have important
advantages over hash tables described in old textbooks.

> you cannot understand that a hash table (to be pedantic,
one without "perfect hashing") cannot be more than
about 88% full before the algorithm becomes pathological.

>Cuckoo hash, just to give one example, can operate
well above 88%. ("Pathology" is a matter of semantics
since many methods can encounter O(n) behavior even
at low utilization, with non-zero probability,
due to very bad "luck".) Cuckoo hash, BTW, has
another very important space-savings advantage over
ordinary open-address hashing. I'll leave the
identification of this advantage as an exercise.... :-)

And popular "chained hashing" methods simply do
not suffer from the pathology you imply.

The book is Basic Algorithms. I am aware that there are more sophisticated
methods of hashing, though I am not an expert in them.
However someone who doesn't understand that a simple hash table breaks down
when it becomes too full obviously needs their understanding even of the
basic algorithm raised a notch, as I said.

>
> The test for primality is interesting.

Very interesting, though I know little about it.

There are certain advantages in making the hash table length prime. However
it is a chapter on hash tables, not prime numbers. So I present a naive test
but mention that there are better ones.
In fact I was wrong about the naive test being the only deterministic one.
Sorry about that. There's no chapter on number theory, in case you are
wondering.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 12 '07 #14

websnarf

On Aug 11, 4:29 am, santosh <santosh....@gmail.comwrote:

ravi wrote:
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

The hash table is itself a data structure. It works in conjunction with
hash function.

It sounds like he is asking what *other* data structures could be used
to implement it. Reallocatable arrays, pointers and structs seems
like the best choices to me. :) Yeah, I know, that's basically most
of what C has, but it seems they are all necessary to solve the
problem anyways. He also asked about C++, in which case you might
like to use a vector instead of a reallocatable array, but that's
apparently off topic here anyways.

The real magic of a hash table, of course, is the internal mapping
structure, the hash function and the operating methods that you decide
to allow for the hash table. But he didn't ask about that.

--
Paul Hsieh
http://www.pobox.com/~qed/hash.html
http://bstring.sf.net/

Aug 12 '07 #15

Eric Sosman

Malcolm McLean wrote:

>
Obviously if you say things like "the function will not return at all
when the table is full" you cannot understand that a hash table (to be
pedantic, one without "perfect hashing") cannot be more than about 88%
full before the algorithm becomes pathological.

Nonsense. Utter uninformed nonsense. "Hash table" covers
a wide variety of data structures and associated algorithms,
with different characteristics. There's an excellent chapter
in TAOCP, and from the above quote there's no evidence that
you've understood any of it.

I think you need to read the chapter again, as a pupil rather than as a
critic.

I have subjected myself to about one and a half chapters
of your awful book already, and will not willingly read more.

The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant manner.

Is dereferencing a null pointer provably non-catastrophic?
Is it even "proBably" non-catastrophic? What guarantees apply
to undefined behavior that might limit its damage? What part
of "undefined" are you having trouble with? What part of "un-"
are you failing to grasp?

I am no mathematician, [...]

I'd guessed as much.

--
Eric Sosman
es*****@ieee-dot-org.invalid

Aug 12 '07 #16

Ben Bacarisse

"Malcolm McLean" <re*******@btinternet.comwrites:

As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

I think you should fix that and the other clear errors[1] that have been
pointed out before you call other people arrogant. You say in the
introduction that bugs can be "corrected very quickly", but you don't
seem to have corrected anything.

One reason I have got so worked up about "details" is that your post
was in c.l.c. The things that I object to most strongly in the book
have nothing to do with C and so a lot of my objections get magnified
through those that *are* topical here. I will try to leave the
subject alone. I think we both know how the other feels about this
subject.

>Do you care, for example, that your statement:

The test for primality is interesting. The only absolutely sure way
of determining whether a number is prime, as far as is known, is to
attempt to divide by all prime numbers below or equal to its square
root.

is not true? Probably not. I will try, in future, to leave you alone.

You mean randomised primality testing? It's not absolutely sure.

No. I mean simply that it is not a true statement. There are other
non-probabilistic methods for testing for primality.

I am no mathematician, maybe you could tell us your method?

None is "my" method. They are all easy to find via a quick web
search. The two most famous are the recent Agrawal, Kayal and Saxena
algorithm (that finally proved that primality testing was in P as had
been suspected) and the much older (and currently the fastest) method
based on the work of Adleman, Pomerance and Rumely, improved by Cohen
and Lenstra (this later is sometimes called the APR-CL test).

[1] By "clear error" I mean things that are not a matter of style or
pedagogy. My list is:

(1) redundant (i.e. provably true) if (ptr str) in strtrim

(2) array[y*10=x];

(3) if (8str == '}') in checkbalanced

(4) incorrect quotes on some char constants in the above function
(only visible in the book version)

(5) int expy;; in floating add (not an error, just an obvious typo).

(6) if(x & 0x7FFFFFFFF < y & 0x7FFFFFFF) in same (two errors here).

(7) if(tail == 100] in add (for a queue)

(8) no test for queue being full or queue being empty

(9) if(ptr == room->baddies) room->baddies = ptr; in removebaddy

(10) missing ";" after assert( x 0.0) squareroot

(11) null pointer dereference in fixedallocate

(12) The statement above about prime testing.

--
Ben.

Aug 12 '07 #17

Mark McIntyre

On Sun, 12 Aug 2007 09:09:46 -0400, in comp.lang.c , Eric Sosman
<es*****@ieee-dot-org.invalidwrote:

>Malcolm McLean wrote:
>>
The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant manner.

Is dereferencing a null pointer provably non-catastrophic?

I guess the point is, when you're commenting on something its
generally considered more productive to use non-pejorative language if
you want to cause the writer to change. Obviously if your only
intention is to highlight the error and complain about the author, the
other method works fine...
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Aug 12 '07 #18

Malcolm McLean

"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...

"Malcolm McLean" <re*******@btinternet.comwrites:

>As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

Yes, that is another bug.
fixedallocate should have an if(answer) before setting top to answer->next.
Why not say that instead of all that vague talk about catastrophic behaviour
when the hash table gets full?
(The other bug complaint about the code getting stuck in a loop when the
table gets full is false. Which I don't mind, we can all make mistakes, but
at least admit that you are wrong. Otherwise people wonder if you really
understand hash tables at all.)

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 12 '07 #19

Flash Gordon

Malcolm McLean wrote, On 12/08/07 19:22:

>
"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>"Malcolm McLean" <re*******@btinternet.comwrites:

>>As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

Yes, that is another bug.
fixedallocate should have an if(answer) before setting top to answer->next.

So fix it and all the other reported bugs. You should also add a change
record to the book so that people know what you have fixed.

Why not say that instead of all that vague talk about catastrophic
behaviour when the hash table gets full?

Perhaps he thought you should actually test the code in your book when
you are charging for it? Proper testing would have included filling the
hash table.

(The other bug complaint about the code getting stuck in a loop when the
table gets full is false. Which I don't mind, we can all make mistakes,
but at least admit that you are wrong. Otherwise people wonder if you
really understand hash tables at all.)

Actually, I would say it shows that your implementation makes it harder
for people to see how the algorithm works. A decent teaching
implementation of a hash would, IMHO, include the check rather than
relying on how another non-standard (although provided) function behaves.
--
Flash Gordon

Aug 12 '07 #20

Malcolm McLean

"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
news:36************@news.flash-gordon.me.uk...

Malcolm McLean wrote, On 12/08/07 19:22:
>>
"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>>"Malcolm McLean" <re*******@btinternet.comwrites:

As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

Yes, that is another bug.
fixedallocate should have an if(answer) before setting top to
answer->next.

So fix it and all the other reported bugs. You should also add a change
record to the book so that people know what you have fixed.

>Why not say that instead of all that vague talk about catastrophic
behaviour when the hash table gets full?

Perhaps he thought you should actually test the code in your book when you
are charging for it? Proper testing would have included filling the hash
table.

>(The other bug complaint about the code getting stuck in a loop when the
table gets full is false. Which I don't mind, we can all make mistakes,
but at least admit that you are wrong. Otherwise people wonder if you
really understand hash tables at all.)

Actually, I would say it shows that your implementation makes it harder
for people to see how the algorithm works. A decent teaching
implementation of a hash would, IMHO, include the check rather than
relying on how another non-standard (although provided) function behaves.

I'm inclined to agree. You pass in a capacity, but it would be a lot clearer
if the function explicitly rejected attempts to fill the table beyond that
capacity.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 12 '07 #21

Ben Bacarisse

"Malcolm McLean" <re*******@btinternet.comwrites:

"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>"Malcolm McLean" <re*******@btinternet.comwrites:

>>As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

Yes, that is another bug.
fixedallocate should have an if(answer) before setting top to answer->next.
Why not say that instead of all that vague talk about catastrophic
behaviour when the hash table gets full?

I am sorry that my report was not clear. I am not reacting to this
topic rationally. I find the idea if charging money for a book of
this quality hard to accept, and my anger at that has affected the
clarity of my postings.

(The other bug complaint about the code getting stuck in a loop when
the table gets full is false. Which I don't mind, we can all make
mistakes, but at least admit that you are wrong. Otherwise people
wonder if you really understand hash tables at all.)

Yes, that was my mistake. I don't think it shows your design in a
good light, however. It helps to follow a design if one can think of
the pieces separately, and the correctness of your hash functions all
rely on the allocator failing at the right point. A change to the
code to replace the allocator would break it in a mysterious way.
This kind linkage between components is not desirable and it is
particularly odd to have used it in a teaching book.

Will you accept that readers should get a refund when they see the
code you present for implementing a queue as a circular buffer?

--
Ben.

Aug 13 '07 #22

Richard Heathfield

Malcolm McLean said:

>
"ravi" <dc**********@gmail.comwrote in message
news:11**********************@i13g2000prf.googlegr oups.com...
>Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

Please don't do a navia on us, Malcolm. If your book is good enough,
other people here will recommend it when appropriate. And if it isn't
good enough, nobody here should be recommending it, least of all you.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Aug 13 '07 #23

Richard Heathfield

CBFalconer said:

ravi wrote:
>>
can anybody tell me that which ds will be best suited to implement
a hash table in C/C++

I am not sure what your question is.

It seems pretty clear to me. But he needs to decide what language he's
writing in before he starts worrying about difficult stuff.

However, a complete, and
portable, hash table implementation under GPL

....does not answer the question, any more than "what's the best kind of
honey?" is answered by "here, have one of my honey sandwiches".

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Aug 13 '07 #24

Richard Heathfield

Malcolm McLean said:

<snip>

There are certain advantages in making the hash table length prime.

If you use quadratic probing as your collision avoidance mechanism, a
prime hash table length would seem to me to be essential, not just a
"certain advantage". If, on the other hand, you treat the hash table as
an array of collections of some kind, then I see no advantage at all to
the hash table length being prime.

A book that covers hash tables ought, at the very least, to mention
quadratic probing, and demonstrate how it can get into a cycle with
composite hash table sizes.

For example, consider a hash table with ten elements, of which elements
0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2, 6 and 7
empty, so the table is only 60% full).

Now try to add an element at index 4, using quadratic probing, and you
will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5,
8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.

Using a prime number guarantees that you won't get into this cycle
unless the table is actually full (which you can check separately, in
advance).

Primes aren't just an advantage for quadratic probing - they're
essential.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Aug 13 '07 #25

pete

Richard Heathfield wrote:

>
CBFalconer said:

ravi wrote:
>
can anybody tell me that which ds will be best suited to implement
a hash table in C/C++
I am not sure what your question is.

It seems pretty clear to me. But he needs to decide what language he's
writing in before he starts worrying about difficult stuff.

Does "ds" really mean "difficult stuff"?

--
pete

Aug 13 '07 #26

Richard Heathfield

pete said:

Richard Heathfield wrote:
>>
CBFalconer said:

ravi wrote:

can anybody tell me that which ds will be best suited to implement
a hash table in C/C++

I am not sure what your question is.

It seems pretty clear to me. But he needs to decide what language
he's writing in before he starts worrying about difficult stuff.

Does "ds" really mean "difficult stuff"?

<grinPure coincidence, I'm afraid.

If he decides on C++, his answer is probably to be found in the STL. If
he chooses C, he'll just want to slap an array of objects together if
he's using linear or quadratic probing, or an array of collections
otherwise (where "collection" means something like a linked list, a
binary search tree, or maybe even another hash table).

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Aug 13 '07 #27

Frodo Baggins

On Aug 12, 6:19 pm, Ben Bacarisse <ben.use...@bsb.me.ukwrote:

"Malcolm McLean" <regniz...@btinternet.comwrites:
As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

I think you should fix that and the other clear errors[1] that have been
pointed out before you call other people arrogant. You say in the
introduction that bugs can be "corrected very quickly", but you don't
seem to have corrected anything.

I would suggest using splint (http://www.splint.org/) if you haven't
done so already.

Regards,
Frodo B

Aug 13 '07 #28

Ivar Rosquist

On Sat, 11 Aug 2007 09:29:09 +0100, Malcolm McLean wrote:

"ravi" <dc**********@gmail.comwrote in message
news:11**********************@i13g2000prf.googlegr oups.com...
>Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

Don't bother - that book probably is the buggiest book of the
year.

Aug 13 '07 #29

Malcolm McLean

"Richard Heathfield" <rj*@see.sig.invalidwrote in message
news:dY******************************@bt.com...

Malcolm McLean said:

<snip>

>There are certain advantages in making the hash table length prime.

If you use quadratic probing as your collision avoidance mechanism, a
prime hash table length would seem to me to be essential, not just a
"certain advantage". If, on the other hand, you treat the hash table as
an array of collections of some kind, then I see no advantage at all to
the hash table length being prime.

A book that covers hash tables ought, at the very least, to mention
quadratic probing, and demonstrate how it can get into a cycle with
composite hash table sizes.

For example, consider a hash table with ten elements, of which elements
0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2, 6 and 7
empty, so the table is only 60% full).

Now try to add an element at index 4, using quadratic probing, and you
will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5,
8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.

Using a prime number guarantees that you won't get into this cycle
unless the table is actually full (which you can check separately, in
advance).

Primes aren't just an advantage for quadratic probing - they're
essential.

The hash table chapter includes an example of quadratic probing, which does
as you say.
Maybe the primality test should be more rigorous? The worst code is that
which seems to work, then fails when someone includes it in a life-support
system.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 13 '07 #30

Mark McIntyre

On Mon, 13 Aug 2007 21:40:52 +0100, in comp.lang.c , "Malcolm McLean"
<re*******@btinternet.comwrote:

>"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>>
I am sorry that my report was not clear. I am not reacting to this
topic rationally.

It's surprising how often people react like that. I get similar accusations
all the time about 12 Common Atheist Arguments (refuted).

<OT>
What, you mean as opposed to the irrationality of the book itself?
</ot>
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Aug 13 '07 #31

Ben Bacarisse

"Malcolm McLean" <re*******@btinternet.comwrites:

"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>>
>>Yes, that is another bug.
fixedallocate should have an if(answer) before setting top to answer-
next.
Why not say that instead of all that vague talk about catastrophic
behaviour when the hash table gets full?

I am sorry that my report was not clear. I am not reacting to this
topic rationally. I find the idea if charging money for a book of
this quality hard to accept, and my anger at that has affected the
clarity of my postings.

It's surprising how often people react like that. I get similar
accusations all the time about 12 Common Atheist Arguments
(refuted).

Do you push that book in groups where only a critique of the grammar
is on topic? If so, then I can see the similarity. If not, then you
missed my point. I am trying very hard to be topical for c.l.c. That
explains, I think, what some have seen as nitpicking. Post in
comp.programming or comp.theory if you want to see some comments on
your book as a whole.

There of course there can't be a technical justification.

You have a strange sense of humour (I assume that was a joke?).

<snip>

>Will you accept that readers should get a refund when they see the
code you present for implementing a queue as a circular buffer?

Maybe it's just me incapable of seeing the problems with my own code,
but what is wrong with a circular buffer? Sure, it doesn't grown
dynamically, but generally you want to ration queue length anyway. If
it consistently has more coming in than you're taking out, there's
probably something wrong with your logic, or the machine is
underpowered.

<stay clam>...

No. You are thinking of one kind of queue (where you do want to
balance input rate and output rate) and you've generalised from that.
There are uses of queues where your last statement is simply wrong.
For example, in a breadth first search of a tree using a queue,
continued growth of the queue simply indicates something about the
shape of the tree, not an underpowered machine or faulty logic.

Of course, limiting the queue size is often done, and a circular
buffer is an excellent way to implement a bounded queue. My objection
was that the code you present does not implement a bounded queue.
What you wrote

int queue[100];
int head;
int tail;

void add(int x)
{
if(tail == 100] /* sic */
tail = 0;
queue[tail++] = x;
}

int remove(void)
{
int answer = queue[head];
head++;
if(head == 100)
head = 0;
return answer;
}

is just a circular buffer. A bounded queue would have 'full' and
'empty' tests. The add operation would fail if the queue were full,
and the remove operation (badly named, as has been pointed out) would
fail if it were empty.

What algorithms would your queue implementation be suitable for?

--
Ben.

Aug 14 '07 #32

Richard Heathfield

Malcolm McLean said:

"Richard Heathfield" wrote...

<snip>

>>
Primes aren't just an advantage for quadratic probing - they're
essential.

The hash table chapter includes an example of quadratic probing, which
does as you say.

Well, I didn't buy the book, so I wouldn't know.

Maybe the primality test should be more rigorous? The worst code is
that which seems to work, then fails when someone includes it in a
life-support system.

The primality test should be *definitive*. There's no excuse for it not
to be. It took me less than 30 seconds - on a relatively ancient
machine - to generate all of the 5761455 primes in the range 1 to
100,000,000. And less than two seconds to generate all the 664579
primes in the range 1 to 10,000,000.

Just how big a hash table did your users want? If it has more than a
hundred million entries, I suspect that the cost of definitive
primality testing will be the least of your problems.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Aug 14 '07 #33

James Dow Allen

On Aug 12, 6:11 pm, Richard Heathfield <r...@see.sig.invalidwrote:

Malcolm McLean said:
There are certain advantages in making the hash table length prime.

If you use quadratic probing as your collision avoidance mechanism, a
prime hash table length would seem to me to be essential, not just a
"certain advantage".

A robust quadratic prober will switch to
linear probing after "a while" (since it
would otherwise be able to access only half
the indexes). For this reason, Richard's
comment would be more robust if "essential"
is replaced with "very highly desireable."

If, on the other hand, you treat the hash table as
an array of collections of some kind, then I see no advantage at all to
the hash table length being prime.

I think there is a good practical reason to
use prime size, unrelated to quadratic probing.

A hash function needs to accomplish two
things: scramble, and reduce to range {0,1,...,N-1}
where N is table size. Two birds can be
(partially) killed with one stone when the last
step is taking remainder (index %= N), but this
scrambles best when N is prime.

(Flamers will be happy to note that we "should"
have scrambled well first but, unless we're
extremely confident in our hash function, why
not get the scrambling benefit of prime-number
remaindering?)

Another interesting issue, related to hashing
though perhaps unrelated to table size, concerns
reversible hash functions ("scrambling bijections").
These are easy to implement on {0,1,2,...,N-1}
*if* N is prime.

James Dow Allen

Aug 14 '07 #34

Richard Heathfield

James Dow Allen said:

<snip>

A robust quadratic prober will switch to
linear probing after "a while" (since it
would otherwise be able to access only half
the indexes). For this reason, Richard's
comment would be more robust if "essential"
is replaced with "very highly desireable."

I'll buy that. Here's your dollar.

<snip>

I think there is a good practical reason to
use prime size, unrelated to quadratic probing.

A hash function needs to accomplish two
things: scramble, and reduce to range {0,1,...,N-1}
where N is table size. Two birds can be
(partially) killed with one stone when the last
step is taking remainder (index %= N), but this
scrambles best when N is prime.

Two bucks in one day? Hmmm. I'm not talking to you any more without my
lawyer.

But seriously, there are also advantages in making the table size an
integer power of 2 (which, in all but one case, fails the primality
test with a vengeance), since the hash value can then be reduced to the
table size more rapidly.

Measure, measure, measure!

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Aug 14 '07 #35

Army1987

On Mon, 13 Aug 2007 17:41:33 +0000, Frodo Baggins wrote:

On Aug 12, 6:19 pm, Ben Bacarisse <ben.use...@bsb.me.ukwrote:
>"Malcolm McLean" <regniz...@btinternet.comwrites:
As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

I think you should fix that and the other clear errors[1] that have been
pointed out before you call other people arrogant. You say in the
introduction that bugs can be "corrected very quickly", but you don't
seem to have corrected anything.

I would suggest using splint (http://www.splint.org/) if you haven't
done so already.

I wouldn't. Many of the warning it finds are bogus. It claims that
in long k = 1; printf("%g\n", k / 2.0) the second argument to
printf() has a wrong type. It complains that the controlling
expression of if(!ferror(stdout)) isn't Boolean, and suggests me
a way to disable warnings when pointers are used as controlling
expressions. It complains about "assigning a character to an int"
when I write i = '\n', but then it remembers "anyway it's safe,
since actually '\n' has type int". If I turn up the warning level
enough, it will also complain about the signed operands in (1<<8)
in the macro expansion of isspace(x) found in ctype.h.
Or better, use it, but do not take what it says too seriously.
--
Army1987 (Replace "NOSPAM" with "email")
No-one ever won a game by resigning. -- S. Tartakower

Aug 14 '07 #36

CBFalconer

Malcolm McLean wrote:

"Ben Bacarisse" <be********@bsb.me.ukwrote in message

.... snip ...

>
>I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

Yes, that is another bug. fixedallocate should have an if(answer)
before setting top to answer->next. Why not say that instead of
all that vague talk about catastrophic behaviour when the hash
table gets full? (The other bug complaint about the code getting
stuck in a loop when the table gets full is false. Which I don't
mind, we can all make mistakes, but at least admit that you are
wrong. Otherwise people wonder if you really understand hash
tables at all.)

I haven't looked at your code, but it appears to be bug infested
from the comments here. Take a look at my hashlib package, which
has been out for some years, and I have had zero bug reports since
the first revision (and that was for a minor memory loss on
closing). Very stable, and purely standard C content. See:

<http://cbfalconer.home.att.net/download>

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 14 '07 #37

CBFalconer

Richard Heathfield wrote:

CBFalconer said:
>ravi wrote:
>>>
can anybody tell me that which ds will be best suited to implement
a hash table in C/C++

I am not sure what your question is.

It seems pretty clear to me. But he needs to decide what language
he's writing in before he starts worrying about difficult stuff.

>However, a complete, and
portable, hash table implementation under GPL

...does not answer the question, any more than "what's the best kind
of honey?" is answered by "here, have one of my honey sandwiches".

What question? What's a 'ds'? What's "C/C++"? In c.l.c I think
we can assume the language is intended to be C.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 14 '07 #38

CBFalconer

Richard Heathfield wrote:

>

.... snip ...

>
For example, consider a hash table with ten elements, of which
elements 0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2,
6 and 7 empty, so the table is only 60% full).

Now try to add an element at index 4, using quadratic probing, and
you will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8,
5, 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.

Using a prime number guarantees that you won't get into this cycle
unless the table is actually full (which you can check separately,
in advance).

Primes aren't just an advantage for quadratic probing - they're
essential.

That's a nice, compact, and clear explanation. I tried explaining
this to someone some time ago (1 year or more) without success. He
continued to insist that non-prime sizes were suitable.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 14 '07 #39

Richard Heathfield

CBFalconer said:

Richard Heathfield wrote:
>CBFalconer said:

<snip>

>>However, a complete, and
portable, hash table implementation under GPL

...does not answer the question, any more than "what's the best kind
of honey?" is answered by "here, have one of my honey sandwiches".

What question?

It seems that he wishes to know what primitive data structures would be
most suitable for implementing a hash table. Elsethread, I have
answered this question already, so I won't repeat the answer here.

What's a 'ds'?

The consensus appears to be that he means "data structure".

What's "C/C++"? In c.l.c I think
we can assume the language is intended to be C.

You have answered your own question. :-)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Aug 14 '07 #40

user923005

On Aug 13, 11:24 am, CBFalconer <cbfalco...@yahoo.comwrote:

Richard Heathfield wrote:

... snip ...

For example, consider a hash table with ten elements, of which
elements 0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2,
6 and 7 empty, so the table is only 60% full).

Now try to add an element at index 4, using quadratic probing, and
you will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8,
5, 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.

Using a prime number guarantees that you won't get into this cycle
unless the table is actually full (which you can check separately,
in advance).

Primes aren't just an advantage for quadratic probing - they're
essential.

That's a nice, compact, and clear explanation. I tried explaining
this to someone some time ago (1 year or more) without success. He
continued to insist that non-prime sizes were suitable.

This assumes that quadratic probing is the best overflow handler.

I have found that hash tables which are an integral power of 2 in size
have definite advantages in some situations.
To calculate the appropriate address you can simply do this
(simplified example):

struct hashtab hash_table[1u << 24];
unsigned long long mask = (1u << 24)-1;

....

struct hashtab *entry = &hash_table[current_hash & mask];

It comes up a lot in game programming because we don't worry about
overflow. If a new entry looks better than an old one, we simply
overwrite it.

And when we need to store collision entries we can either roll the egg
out of the nest (cuckoo), or use hash tables of skiplists.

Of course, every different scheme has some advantages and
disadvantages. I did not read the book in question, but it sounds
like it could have used better editing.

Aug 14 '07 #41

Eric Sosman

user923005 wrote On 08/14/07 15:01,:

[...]

I have found that hash tables which are an integral power of 2 in size
have definite advantages in some situations.
To calculate the appropriate address you can simply do this
(simplified example):

struct hashtab hash_table[1u << 24];
unsigned long long mask = (1u << 24)-1;

Just a couple side-issues, not really related to hash
tables as such:

1) On a system whose ints have 16,17,...,24 bits, the
expression `1u << 24' yields undefined behavior,
not the intended 16777216. Suggestion: Change
`1u' to `1uL' in both places, or write 0x1000000
instead.

2) The range of `unsigned long long' vastly exceeds
the desired mask value of 16777215, and its use
may impose performance penalties. (Even on systems
with 64-bit hardware, you may wind up using two
CPU registers instead of just one.) Suggestion:
Change `unsigned long long' to `unsigned long', or
even to `unsigned int' if UINT_MAX suffices.

(I'm sure user923005 is aware of these matters, but
less-experienced readers of his "simplified example" may
not be.)

--
Er*********@sun.com

Aug 14 '07 #42

Malcolm McLean

"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...

Malcolm McLean wrote:
I haven't looked at your code, but it appears to be bug infested
from the comments here. Take a look at my hashlib package, which
has been out for some years, and I have had zero bug reports since
the first revision (and that was for a minor memory loss on
closing). Very stable, and purely standard C content. See:

There are a few glitches which will come out in edition 2.

There are also some design issues, like the avoidance of size_t.

The code is meant to illustrate the algorithm, so that people who don't know
what a hash table is can see how one works. It is not meant to be an
industrial-strength application. If you want a drop-in hashtable, I'd
recommend your package over mine. They are not meant to do the same job.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 14 '07 #43

Malcolm McLean

"Mark McIntyre" <ma**********@spamcop.netwrote in message
news:k1********************************@4ax.com...

On Mon, 13 Aug 2007 21:40:52 +0100, in comp.lang.c , "Malcolm McLean"
<re*******@btinternet.comwrote:

>>"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>>>
I am sorry that my report was not clear. I am not reacting to this
topic rationally.

It's surprising how often people react like that. I get similar
accusations
all the time about 12 Common Atheist Arguments (refuted).

<OT>
What, you mean as opposed to the irrationality of the book itself?
</ot>

As you might expect a lot of atheists are very hostile to my religious
books. What is interesting is how similar the response is to Basic
Algorithms. They are quite different subjects, and there is no reason why
someone who agrees with me on Christiamity should see eye to eye on
programming matters. But the things said are almost identical - I regularly
get demands to withdraw the book because it doesn't contain a definitive
proof of God's existence, for instance (it doesn't claim to, it refutes 12
Common Atheist Arguments, not the same thing as proving Christianity to be
true). In case of Basic Algorithms the pretext is technical, of course, but
I think the basic motive is the same. People see a book as something
socially unacceptable.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 14 '07 #44

user923005

On Aug 14, 12:36 pm, Eric Sosman <Eric.Sos...@sun.comwrote:

user923005 wrote On 08/14/07 15:01,:

[...]

I have found that hash tables which are an integral power of 2 in size
have definite advantages in some situations.
To calculate the appropriate address you can simply do this
(simplified example):

struct hashtab hash_table[1u << 24];
unsigned long long mask = (1u << 24)-1;

Just a couple side-issues, not really related to hash
tables as such:

1) On a system whose ints have 16,17,...,24 bits, the
expression `1u << 24' yields undefined behavior,
not the intended 16777216. Suggestion: Change
`1u' to `1uL' in both places, or write 0x1000000
instead.

2) The range of `unsigned long long' vastly exceeds
the desired mask value of 16777215, and its use
may impose performance penalties. (Even on systems
with 64-bit hardware, you may wind up using two
CPU registers instead of just one.) Suggestion:
Change `unsigned long long' to `unsigned long', or
even to `unsigned int' if UINT_MAX suffices.

(I'm sure user923005 is aware of these matters, but
less-experienced readers of his "simplified example" may
not be.)

Excellent observations.
Aside:
In the case of chess programs, 32 bit hash values are not large
enough.
If we compute 1 million nodes per second (not at all unusual) and
analyze for correspondence chess (e.g. 24 hour search = 86400 seconds)
there will be a huge number of collisions. With a 64 bit hash, we can
easily detect them.
We use the bottom n-bits to compute the hash, and then the top n-bits
are stored in the table as a key-check. Only if both the address and
key-check bits match do we conclude that we have found a match.
Because of the 64 squares on a chessboard, bitboard representations
are quite common and so most of the math is using 64 bit operations
anyway. Hence many modern chess program benefit mightily from 64 bit
operating systems and compilers.

Aug 14 '07 #45

Keith Thompson

"Malcolm McLean" <re*******@btinternet.comwrites:
[...]

As you might expect a lot of atheists are very hostile to my religious
books. What is interesting is how similar the response is to Basic
Algorithms. They are quite different subjects, and there is no reason
why someone who agrees with me on Christiamity should see eye to eye
on programming matters. But the things said are almost identical - I
regularly get demands to withdraw the book because it doesn't contain
a definitive proof of God's existence, for instance (it doesn't claim
to, it refutes 12 Common Atheist Arguments, not the same thing as
proving Christianity to be true). In case of Basic Algorithms the
pretext is technical, of course, but I think the basic motive is the
same. People see a book as something socially unacceptable.

I think that's an extreme case of wishful thinking on your part.

I won't discuss your "12 Common Atheist Arguments" book here, both
because I haven't read it and because it's about as far off-topic as
anything I can imagine.

I don't believe that anybody has any objection to the idea of a book
on basic algorithms. There is absolutely nothing "socially
unacceptable" about such a book. People are objecting to the errors
in your book. If you had written a *good* book on basic algorithms,
nobody would complain.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Aug 14 '07 #46

Kelsey Bjarnason

[snips]

On Tue, 14 Aug 2007 20:57:50 +0100, Malcolm McLean wrote:

As you might expect a lot of atheists are very hostile to my religious
books.

If they're on a par with your programming books, no wonder.

What is interesting is how similar the response is to Basic
Algorithms. They are quite different subjects, and there is no reason why
someone who agrees with me on Christiamity should see eye to eye on
programming matters.

Er... the two have bugger all to do with each other. I don't care if the
coder next to me is Christian, Wiccan, Atheist or what; I care whether I
can work with him and whether his code is any good.

Christianity to be true). In case of Basic Algorithms the pretext is
technical, of course, but I think the basic motive is the same.

Rejecting bogus tripe? You're probably right.

Aug 14 '07 #47

Ben Pfaff

user923005 <dc*****@connx.comwrites:

I am thinking about writing a book about hybrid algorithms.
If I can ever find the time.

I think that I could write an entire book about implementing
linked lists in C, along the lines of GNU libavl. Not sure that
anyone would read it though.
--
Ben Pfaff
http://benpfaff.org

Aug 14 '07 #48

user923005

On Aug 14, 4:04 pm, Ben Pfaff <b...@cs.stanford.eduwrote:

user923005 <dcor...@connx.comwrites:
I am thinking about writing a book about hybrid algorithms.
If I can ever find the time.

I think that I could write an entire book about implementing
linked lists in C, along the lines of GNU libavl. Not sure that
anyone would read it though.

I promise to buy a copy.

Aug 14 '07 #49

CBFalconer

Eric Sosman wrote:

Malcolm McLean wrote:

.... snip ...

>>
The objection is to a regular publishing such a book.
I have written a good book.

You have done no such thing. I have read good books, and I know.

... snip all further elucidation ...

I suspect you need go no further to make an enemy.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 15 '07 #50

How to implement a Hash Table in C

Similar topics