How to implement a Hash Table in C - Page 12

ravi

Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Aug 11 '07

Subscribe Reply

139

14232

« First
<
2
10
11
12
13
14
>

websnarf

On Aug 18, 11:37 am, CBFalconer <cbfalco...@yah oo.comwrote:

websn...@gmail. com wrote:
CBFalconer <cbfalco...@yah oo.comwrote:
Richard Harter wrote:

... snip ...

>[...] that always using the same probe increment in a
power of two table size also has bad properties. His
solution is to use the unused bits of the full hash to
select an increment size from a table of primes.

That is dangerous. The purpose of the multiple hash is to
have different search patterns for different objects that
have the same primary hash pattern.

No, its to have a different search pattern for different object
that have the same primary hash *INDEX* (i.e., post masking).
This is obvious since, for any good hash function, index
collisions will be many many times more common than full hash
function collisions.

Mathematically speaking, on average a single 32-bit (good) hash
function collision will occur roughly when 65536 entries appear,
while a hash index collision will occur roughly when the square
root of the table size number of entries occur (so for a 9
million entry table, that's just 3000 entries).

No, you are ignoring the size of the hash table itself. For
example, the starting size for hashlib is 17, so no more than 17
hash values (modulo 17) are useful. This changes as the table
enlarges itself.

Huh? Well of course I am ignoring small hash tables -- *ALL*
solutions are basically equivalent for small enough hash tables. The
problem is only interesting when there is a performance impact, which
will happen as the hash table holds thousands of entries.

I think you are misreading my language (I am exercising serious
restraint here) a hash function collision is not the same as a hash
index collision. This distinction matters for completely optimized
hash table implementations .

This avoids clustering, and tends to keep
the search chain short. The very iffy thing in hashlib is
the reduction of the range of the secondary hash, which
encourages light clustering in nearly empty tables.

You are ignoring the fact that skip values of 2, 4 and 6, for
example, for indexes starting on even offsets near each other
will tend to collide with each other as the table fills up.

Same skip-value, and commensurate offset collisions must be
minimized by increasing the chance they miss each other. This
happens most easily if the skip values are predominantly
co-prime, which is easiest to construct when they are simply
taken from a list of prime numbers.

Again, you are ignoring the effect of varying table size.

I am not. I am concerned with asymptotic size of hash tables which
are large enough to matter (a few thousand entries) otherwise the
problem is too small to be concerned with the details about (you just
need to make sure it works).

Perhaps you don't understand the basic mathematics. If you increase
the size of the table, the probability that two randomly chosen skip
values will not be coprime does *NOT* tend towards zero. The number
of pathological cases increases with the increase in table size. Thus
a strategy that just picks random skip values via another hash or
whatever, does not directly address the inherent inter-collision
problem.

You need to pick coprime (between each other) skip values if you use
linear probing. The most sensible way of doing this is to choose from
a set of primes.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 19 '07 #111

Chris Dollin

Richard Heathfield wrote:

Malcolm McLean said:

>>
"CBFalconer " <cb********@yah oo.comwrote in message
news:46******* ********@yahoo. com...
>>>

<snip>

>>"ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.

It's hard to read.

Does anyone else here apart from Malcolm have difficulty reading it?
Hands up please.

I'll put my hand up, just to say "Not me sir! Easy peasy sir!
Fits neatly on the eyeball and can be checked without context!
Win win, sir!"

Of course anyone who's using a remotely syntax-colouring editor
won't think that `sizeof` is an ordinary left operand of a
multiplication; nor will anyone who already knows what `sizeof`
means.

It would be really nice to have empirical evidence either way,
of course, since we're well into anecdotage here.

--
Aged Aged Hedgehog
"Who do you serve, and who do you trust?" /Crusade/

Aug 19 '07 #112

Malcolm McLean

"Flash Gordon" <sp**@flash-gordon.me.ukwro te in message
news:62******** ****@news.flash-gordon.me.uk...

>>Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.

It's hard to read.

A lot of newbies here don't find it hard once it is pointed out to them.

>Purely syntactical.

Only if you consider "ptr = malloc(sizeof(t ype))" to be purely
syntactical.

The objection to

ptr = malloc(sizeof *ptr);

is purely syntactical. As Keith said, if sizeof were spelt $ and given
normal operator rules for whitespace

ptr = malloc($*ptr);

would be OK. The * wouldn't look like multiplication of an identifier.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 19 '07 #113

Malcolm McLean

"Flash Gordon" <sp**@flash-gordon.me.ukwro te in message
news:md******** ****@news.flash-gordon.me.uk...

You should also get your book reviewed somewhere like comp-programming
where the algorithms side will get properly reviewed, instead of the C
issues people concentrate on here.

That will happen.
The practical problem is that people focus on little things like ] instead
of ) for a close, or a missing check for null, because they are unambiguous.
However they are relatively easy to fix. The more serious criticisms come
out later - the car park maybe isn't the best metaphor for a hash table.
So it's best to get C language issues and size_t wars out of the way first.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 19 '07 #114

Richard Harter

On Fri, 17 Aug 2007 13:32:51 -0400, CBFalconer
<cb********@yah oo.comwrote:

>Richard Harter wrote:
>Eric Sosman <es*****@ieee-dot-org.invalidwrot e:
>>James Dow Allen wrote:

[...]
Speaking of Falconer, his method (linear probing with second
hash value used for probe increment) *requires* (even more so
than quadratic probing, see upthread) prime table size.

Well, not exactly: It requires only that the increment
between probes be relatively prime to the table size. A prime
table size makes this criterion easy to satisfy, but it's not
the only way. A table size equal to a power of two, for example,
works just fine if the probe increment is odd. (Degenerate
special case: pure linear probing with an increment of unity
works with any table size at all, prime or composite.)

In a thread in comp.programmin g (IIRCC) Paul Hsieh pointed out
that always using the same probe increment in a power of two
table size also has bad properties. His solution is to use the
unused bits of the full hash to select an increment size from a
table of primes.

That is dangerous. The purpose of the multiple hash is to have
different search patterns for different objects that have the same
primary hash pattern. This avoids clustering, and tends to keep
the search chain short. The very iffy thing in hashlib is the
reduction of the range of the secondary hash, which encourages
light clustering in nearly empty tables. The system attempts to
maintain tables at between 44 and 88% of capacity.

Also, I want the table action to be independent of the hash
functions (apart from speed), so the functions are computed every
time needed. The system functions with all the hash functions
replaced by the constant 1 (unity) (tested in the test suite) at a
considerable speed penalty. So the action is hash function
independent, but the speed is highly dependent.

Here is a little clue: Discussions of hash tables are not
confined to your particular implementation.

>
The system is intended to isolate all these decisions in the hash
table code, so the user need not worry about them. Similarly the
algorithms for table expansion, etc. The result is a storage and
retrieval black box.

Aug 19 '07 #115

Kelsey Bjarnason

On Sun, 19 Aug 2007 09:06:23 +0100, Malcolm McLean wrote:

"Flash Gordon" <sp**@flash-gordon.me.ukwro te in message
news:62******** ****@news.flash-gordon.me.uk...

>>>Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.

It's hard to read.

A lot of newbies here don't find it hard once it is pointed out to them.

>>Purely syntactical.

Only if you consider "ptr = malloc(sizeof(t ype))" to be purely
syntactical.

The objection to

ptr = malloc(sizeof *ptr);

is purely syntactical.

It is? Why? Is there a syntax error in the code?

Aug 19 '07 #116

Kelsey Bjarnason

On Sat, 18 Aug 2007 21:14:44 +0000, Richard Heathfield wrote:

Malcolm McLean said:

>>
"CBFalconer " <cb********@yah oo.comwrote in message
news:46******* ********@yahoo. com...
>>>

<snip>

>>"ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.

It's hard to read.

Does anyone else here apart from Malcolm have difficulty reading it?

This is one of those things which, like migrating to x++ from x = x + 1,
simply takes getting used to. It is idiomatic in what one might term
"good C code", not to mention around here, but a rank newbie - i.e.
someone who should not be reading MM's book in the first place - might not
grok it immediately.

Once grokked, however, it is a perfectly sensible way to do the job, for
the usual reasons.

Aug 19 '07 #117

Kelsey Bjarnason

[snips]

On Sat, 18 Aug 2007 22:08:12 +0100, Malcolm McLean wrote:

>For contrast, look at my hashlib. It was published complete with
the test suite and test programs that evaluated it during and after
development. I have had two and a half bug reports since the
initial publication, and no more for years.

And there was one I found in Dann's recommended text, admittedly in a code
fragment, and one in Ben Pfaff's circular buffer. Two in my two hash
tables, though one could be described as a feature. So assuming
Poisson-distributed bugs, we're doing about equally. However I am blamed.

Do you see Dann or Ben using ints when size_t is the correct tool to be
used, defending this on the basis that they don't want to code in C and
that, in effect, C should be changed to suit their weird - and objectively
wrong - views?

No, you don't. They have bugs. You have concept failures *plus* bugs.
All non-trivial code has bugs. Code written by people who don't know what
they're doing has bugs *plus* assorted other issues. You never did, for
example, explain the magic values used by squnch, or what magic occurs
when it is passed length values of -1, -137 or -32767. Nor did you
explain how to make it work on a perfectly legitimately-sized buffer
too large to have its size specified by a signed int.

Their code has bugs, so does yours. Yours adds in umpteen different sorts
of unpredictabilit y and confusion, without justification.

Aug 19 '07 #118

Keith Thompson

"Malcolm McLean" <re*******@btin ternet.comwrite s:

"Flash Gordon" <sp**@flash-gordon.me.ukwro te in message
news:62******** ****@news.flash-gordon.me.uk...

>>>Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.

It's hard to read.

A lot of newbies here don't find it hard once it is pointed out to them.

>>Purely syntactical.

Only if you consider "ptr = malloc(sizeof(t ype))" to be purely
syntactical.

The objection to

ptr = malloc(sizeof *ptr);

is purely syntactical. As Keith said, if sizeof were spelt $ and given
normal operator rules for whitespace

ptr = malloc($*ptr);

would be OK. The * wouldn't look like multiplication of an identifier.

As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:

ptr = malloc(sizeof(* ptr));

Do you have any objection, "syntactica l" or otherwise, to that form?

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Aug 19 '07 #119

Malcolm McLean

"Keith Thompson" <ks***@mib.orgw rote in message
news:ln******** ****@nuthaus.mi b.org...

As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:

ptr = malloc(sizeof(* ptr));

Do you have any objection, "syntactica l" or otherwise, to that form?

I prefer it. It is not uncommon to malloc more than one data item.

so ptr = malloc( N * sizeof *ptr);

if less clear than
ptr = malloc(N * sizeof(*ptr));

unless you are one of those people who likes to put the test in the line

if( ptr = malloc(N * sizeof(*ptr)) );

then some compilers like

if( (ptr = malloc(N * sizeof(*ptr))) )

at which point we've broken the rule of three and lost readbility.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 19 '07 #120

Similar topics

1372

about hash table

by: Yat | last post by:

I want to use hash table in C. Have any build-in library in C so that I can use it without implement hash table myself? Thx

C / C++

4309

don't need dictionary's keys - hash table?

by: kdotsky | last post by:

Hello, I am using some very large dictionaries with keys that are long strings (urls). For a large dictionary these keys start to take up a significant amount of memory. I do not need access to these keys -- I only need to be able to retrieve the value associated with a certain key, so I do not want to have the keys stored in memory. Could I just hash() the url strings first and use the resulting integer as the key? I think what I'm...

Python

9453

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10254

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

10099

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

9904

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

7451

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

5354

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

5481

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

4007

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

2849

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General