473,699 Members | 2,912 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How to implement a Hash Table in C

Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Aug 11 '07
139 14177

"Ben Bacarisse" <be********@bsb .me.ukwrote in message
news:87******** ****@bsb.me.uk. ..
"Malcolm McLean" <re*******@btin ternet.comwrite s:
>"Ben Bacarisse" <be********@bsb .me.ukwrote in message
news:87******* *****@bsb.me.uk ...
>>"Malcolm McLean" <re*******@btin ternet.comwrite s:

"ravi" <dc**********@g mail.comwrote in message
news:11***** *************** **@i13g2000prf. googlegroups.co m...
Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++
>
thanx. in advanced
>
Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

I think it should be pointed out that your hash table is not like
most. Most hash tables are dynamic structures that can grow as data
are added. Yours is fixed. The interface should have a clear warning
that the table fails (silently) if an attempt is made to add an item
beyond the initial capacity.

The point I made before (that the function loop indefinitely when the
table becomes full) is still true, but I now see that the add function
will silently fail long before that point is reached. If this limit
is by design, I think the text should make that limitation very clear.
I think your understanding of hash tables might need to be raised a
notch.

If you are going to be like that...

<snip>
>>It is odd (and probably very confusing to a beginner) that the add
function seems to return a success/fail indication, but that it does
not use it to signal this disastrous case.
The "disastrous case" can't happen. The fixed allocator will run out
of memory and return null when capity is reached. Please avoid this
sort of hyperbole.

OK. Less hyperbole. Your code exhibits undefined behaviour. You
'dereference' a null pointer. On my system I get a segmentation
violation before hash_add gets any say in the matter. If or when you
find the error, you will say it is a detail, a typo, whatever. I
don't think you worry about details like this.
Obviously if you say things like "the function will not return at all when
the table is full" you cannot understand that a hash table (to be pedantic,
one without "perfect hashing") cannot be more than about 88% full before the
algorithm becomes pathological. In fact I pass in a "capacity" argument
which is half table size.
I think you need to read the chapter again, as a pupil rather than as a
critic.

As for the derefencing null pointer, that will happen if the constructor
fails, and return NULL. That is standard behaviour throughout the book. If
objects cannot be created, the constructing function returns a null pointer
to show they have failed.
I couldn't see another. The code has been tested, but only on two systems (a
UNIX mainframe and Windows) so that doesn't mean no errors remain, and of
course typographical errors do creep in in the process of reformatting for
print - I wish I had a tool to format source code automatically but I don't.
The normal thing when you an error such as dereference of a null is to say
"your code derererences a null" rather than to talk airly about "catastropi c
behaviour" in an arrogant manner.
>
Do you care, for example, that your statement:

The test for primality is interesting. The only absolutely sure way
of determining whether a number is prime, as far as is known, is to
attempt to divide by all prime numbers below or equal to its square
root.

is not true? Probably not. I will try, in future, to leave you alone.
You mean randomised primality testing? It's not absolutely sure.
I am no mathematician, maybe you could tell us your method?

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 12 '07 #11
On Aug 11, 7:02 pm, "Malcolm McLean" <regniz...@btin ternet.comwrote :
"Ben Bacarisse" <ben.use...@bsb .me.ukwrote ...
I think your understanding of hash tables might need to be raised a
notch.
Hash tables are frequently discussed in this and "nearby"
ngs, but it is clear that most programmers are unaware of
innovations like Cuckoo Hash or "Compact Hash", etc.
These aren't just theoretical curiosities but have important
advantages over hash tables described in old textbooks.
you cannot understand that a hash table (to be pedantic,
one without "perfect hashing") cannot be more than
about 88% full before the algorithm becomes pathological.
Cuckoo hash, just to give one example, can operate
well above 88%. ("Pathology" is a matter of semantics
since many methods can encounter O(n) behavior even
at low utilization, with non-zero probability,
due to very bad "luck".) Cuckoo hash, BTW, has
another very important space-savings advantage over
ordinary open-address hashing. I'll leave the
identification of this advantage as an exercise.... :-)

And popular "chained hashing" methods simply do
not suffer from the pathology you imply.
The test for primality is interesting.
Very interesting, though I know little about it.
The only absolutely sure way
of determining whether a number is prime, as far as is known, is to
The largest *certainly* known non-Marsenne prime is
19249*2^1301858 6 + 1
a number with almost 4,000,000 digits.
I've no idea how its primality was proved,
but if they...
attempt to divide by all prime numbers below or equal to its square
root.
.... they would have had to perform about 10^2000
long divisions (very long!). Assuming they'd
recruited a googol monkeys, and each monkey
could do a googol divisions per year, they'd
need 10^1800 years to prove primality your way!

I'm no primality expert, but it took me only a
few seconds at Wikipedia to discover:
The first deterministic primality test significantly
faster than the naÔve methods was the cyclotomy test;
its runtime can be proven to be
O((log n) ^ clog log log n),
where n is the number to test for primality and
c is a constant independent of n.
Wikipedia (and Google) is your friend! The bad rep
Wikipedia has in recent media reports does not apply,
I think, to most science pages.

I didn't invent Cuckoo Hash (wish I had), learned
of it after Googling in a dialog with another computer
scientist (who hadn't heard of Cuckoo either, even
though he wrote papers on Hash Tables!)

I don't think I'm the only one whose posting
quality would go up if time wasted on Usenet
invective were spent instead on research.

James Dow Allen



Aug 12 '07 #12
In article <AP************ *********@bt.co m>,
Malcolm McLean <re*******@btin ternet.comwrote :
>You mean randomised primality testing? It's not absolutely sure.
I am no mathematician, maybe you could tell us your method?
Google for "primality testing". There are deterministic tests faster
than testing all the primes < sqrt(n), and the probabilistic tests can
be used to reduce the probability as much as you like (for example, to
less than the chance of your computer being struck by lightning).

-- Richard
--
"Considerat ion shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Aug 12 '07 #13
"James Dow Allen" <jd*********@ya hoo.comwrote in message
news:11******** **************@ x40g2000prg.goo glegroups.com.. .
On Aug 11, 7:02 pm, "Malcolm McLean" <regniz...@btin ternet.comwrote :
"Ben Bacarisse" <ben.use...@bsb .me.ukwrote ...
I think your understanding of hash tables might need to be raised a
notch.
>Hash tables are frequently discussed in this and "nearby"
ngs, but it is clear that most programmers are unaware of
innovations like Cuckoo Hash or "Compact Hash", etc.
These aren't just theoretical curiosities but have important
advantages over hash tables described in old textbooks.
> you cannot understand that a hash table (to be pedantic,
one without "perfect hashing") cannot be more than
about 88% full before the algorithm becomes pathological.
>Cuckoo hash, just to give one example, can operate
well above 88%. ("Pathology" is a matter of semantics
since many methods can encounter O(n) behavior even
at low utilization, with non-zero probability,
due to very bad "luck".) Cuckoo hash, BTW, has
another very important space-savings advantage over
ordinary open-address hashing. I'll leave the
identification of this advantage as an exercise.... :-)

And popular "chained hashing" methods simply do
not suffer from the pathology you imply.
The book is Basic Algorithms. I am aware that there are more sophisticated
methods of hashing, though I am not an expert in them.
However someone who doesn't understand that a simple hash table breaks down
when it becomes too full obviously needs their understanding even of the
basic algorithm raised a notch, as I said.
>
> The test for primality is interesting.

Very interesting, though I know little about it.
There are certain advantages in making the hash table length prime. However
it is a chapter on hash tables, not prime numbers. So I present a naive test
but mention that there are better ones.
In fact I was wrong about the naive test being the only deterministic one.
Sorry about that. There's no chapter on number theory, in case you are
wondering.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 12 '07 #14
On Aug 11, 4:29 am, santosh <santosh....@gm ail.comwrote:
ravi wrote:
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

The hash table is itself a data structure. It works in conjunction with
hash function.
It sounds like he is asking what *other* data structures could be used
to implement it. Reallocatable arrays, pointers and structs seems
like the best choices to me. :) Yeah, I know, that's basically most
of what C has, but it seems they are all necessary to solve the
problem anyways. He also asked about C++, in which case you might
like to use a vector instead of a reallocatable array, but that's
apparently off topic here anyways.

The real magic of a hash table, of course, is the internal mapping
structure, the hash function and the operating methods that you decide
to allow for the hash table. But he didn't ask about that.

--
Paul Hsieh
http://www.pobox.com/~qed/hash.html
http://bstring.sf.net/

Aug 12 '07 #15
Malcolm McLean wrote:
>
Obviously if you say things like "the function will not return at all
when the table is full" you cannot understand that a hash table (to be
pedantic, one without "perfect hashing") cannot be more than about 88%
full before the algorithm becomes pathological.
Nonsense. Utter uninformed nonsense. "Hash table" covers
a wide variety of data structures and associated algorithms,
with different characteristics . There's an excellent chapter
in TAOCP, and from the above quote there's no evidence that
you've understood any of it.
I think you need to read the chapter again, as a pupil rather than as a
critic.
I have subjected myself to about one and a half chapters
of your awful book already, and will not willingly read more.
The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropi c behaviour" in an arrogant manner.
Is dereferencing a null pointer provably non-catastrophic?
Is it even "proBably" non-catastrophic? What guarantees apply
to undefined behavior that might limit its damage? What part
of "undefined" are you having trouble with? What part of "un-"
are you failing to grasp?
I am no mathematician, [...]
I'd guessed as much.

--
Eric Sosman
es*****@ieee-dot-org.invalid
Aug 12 '07 #16
"Malcolm McLean" <re*******@btin ternet.comwrite s:
As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropi c behaviour" in an arrogant
manner.
I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

I think you should fix that and the other clear errors[1] that have been
pointed out before you call other people arrogant. You say in the
introduction that bugs can be "corrected very quickly", but you don't
seem to have corrected anything.

One reason I have got so worked up about "details" is that your post
was in c.l.c. The things that I object to most strongly in the book
have nothing to do with C and so a lot of my objections get magnified
through those that *are* topical here. I will try to leave the
subject alone. I think we both know how the other feels about this
subject.
>Do you care, for example, that your statement:

The test for primality is interesting. The only absolutely sure way
of determining whether a number is prime, as far as is known, is to
attempt to divide by all prime numbers below or equal to its square
root.

is not true? Probably not. I will try, in future, to leave you alone.
You mean randomised primality testing? It's not absolutely sure.
No. I mean simply that it is not a true statement. There are other
non-probabilistic methods for testing for primality.
I am no mathematician, maybe you could tell us your method?
None is "my" method. They are all easy to find via a quick web
search. The two most famous are the recent Agrawal, Kayal and Saxena
algorithm (that finally proved that primality testing was in P as had
been suspected) and the much older (and currently the fastest) method
based on the work of Adleman, Pomerance and Rumely, improved by Cohen
and Lenstra (this later is sometimes called the APR-CL test).

[1] By "clear error" I mean things that are not a matter of style or
pedagogy. My list is:

(1) redundant (i.e. provably true) if (ptr str) in strtrim

(2) array[y*10=x];

(3) if (8str == '}') in checkbalanced

(4) incorrect quotes on some char constants in the above function
(only visible in the book version)

(5) int expy;; in floating add (not an error, just an obvious typo).

(6) if(x & 0x7FFFFFFFF < y & 0x7FFFFFFF) in same (two errors here).

(7) if(tail == 100] in add (for a queue)

(8) no test for queue being full or queue being empty

(9) if(ptr == room->baddies) room->baddies = ptr; in removebaddy

(10) missing ";" after assert( x 0.0) squareroot

(11) null pointer dereference in fixedallocate

(12) The statement above about prime testing.

--
Ben.
Aug 12 '07 #17
On Sun, 12 Aug 2007 09:09:46 -0400, in comp.lang.c , Eric Sosman
<es*****@ieee-dot-org.invalidwrot e:
>Malcolm McLean wrote:
>>
The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropi c behaviour" in an arrogant manner.

Is dereferencing a null pointer provably non-catastrophic?
I guess the point is, when you're commenting on something its
generally considered more productive to use non-pejorative language if
you want to cause the writer to change. Obviously if your only
intention is to highlight the error and complain about the author, the
other method works fine...
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
Aug 12 '07 #18

"Ben Bacarisse" <be********@bsb .me.ukwrote in message
news:87******** ****@bsb.me.uk. ..
"Malcolm McLean" <re*******@btin ternet.comwrite s:
>As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automaticall y but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropi c behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.
Yes, that is another bug.
fixedallocate should have an if(answer) before setting top to answer->next.
Why not say that instead of all that vague talk about catastrophic behaviour
when the hash table gets full?
(The other bug complaint about the code getting stuck in a loop when the
table gets full is false. Which I don't mind, we can all make mistakes, but
at least admit that you are wrong. Otherwise people wonder if you really
understand hash tables at all.)

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 12 '07 #19
Malcolm McLean wrote, On 12/08/07 19:22:
>
"Ben Bacarisse" <be********@bsb .me.ukwrote in message
news:87******** ****@bsb.me.uk. ..
>"Malcolm McLean" <re*******@btin ternet.comwrite s:
>>As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatical ly but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropi c behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.
Yes, that is another bug.
fixedallocate should have an if(answer) before setting top to answer->next.
So fix it and all the other reported bugs. You should also add a change
record to the book so that people know what you have fixed.
Why not say that instead of all that vague talk about catastrophic
behaviour when the hash table gets full?
Perhaps he thought you should actually test the code in your book when
you are charging for it? Proper testing would have included filling the
hash table.
(The other bug complaint about the code getting stuck in a loop when the
table gets full is false. Which I don't mind, we can all make mistakes,
but at least admit that you are wrong. Otherwise people wonder if you
really understand hash tables at all.)
Actually, I would say it shows that your implementation makes it harder
for people to see how the algorithm works. A decent teaching
implementation of a hash would, IMHO, include the check rather than
relying on how another non-standard (although provided) function behaves.
--
Flash Gordon
Aug 12 '07 #20

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
1370
by: Yat | last post by:
I want to use hash table in C. Have any build-in library in C so that I can use it without implement hash table myself? Thx
24
4302
by: kdotsky | last post by:
Hello, I am using some very large dictionaries with keys that are long strings (urls). For a large dictionary these keys start to take up a significant amount of memory. I do not need access to these keys -- I only need to be able to retrieve the value associated with a certain key, so I do not want to have the keys stored in memory. Could I just hash() the url strings first and use the resulting integer as the key? I think what I'm...
0
8705
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, weíll explore What is ONU, What Is Router, ONU & Routerís main usage, and What is the difference between ONU and Router. Letís take a closer look ! Part I. Meaning of...
0
9054
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8941
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8896
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6546
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Duprť who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5879
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4637
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
2362
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2015
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.