469,315 Members | 1,802 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,315 developers. It's quick & easy.

How to implement a Hash Table in C

Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++

thanx. in advanced

Aug 11 '07
139 13214
Malcolm McLean wrote:
"Richard Heathfield" <rj*@see.sig.invalidwrote in message
.... snip ...
>
>His malloc is not ideal - answer = malloc(sizeof *answer); would
be preferable. He repeats this mistake later in the (poorly named)
hashtable() function.

Too hard to read. malloc( sizeof *ptr) is a bad clc ism that you
will seldom see in another environment. The reader has to perform
a mental dereference which makes it harder to see if the size of
correct.
Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 18 '07 #101

"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
Malcolm McLean wrote:
>"Richard Heathfield" <rj*@see.sig.invalidwrote in message
... snip ...
>>
>>His malloc is not ideal - answer = malloc(sizeof *answer); would
be preferable. He repeats this mistake later in the (poorly named)
hashtable() function.

Too hard to read. malloc( sizeof *ptr) is a bad clc ism that you
will seldom see in another environment. The reader has to perform
a mental dereference which makes it harder to see if the size of
correct.

Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read. Purely syntactical. The indirection operator happens to
be the same as a multiply. All functions take parentheses, except sizeof,
which is a function in the sense that matters - it retuns a value based on
its argument, but not of course in terms of generated code. So you've got
two quirks in one expression.

In compiler terms t is better, of course, because if you change ptr's type
the call will automatically update. But that's an overly narrow
understanding of maintainability which doesn't address the real issue.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
Aug 18 '07 #102

"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
Flash Gordon wrote:
>Malcolm McLean wrote, On 18/08/07 10:58:
... snip ...
>>
>>Criticisms have been made, some of them indisuptable bug reports,

For which you have not published an errata so that your audience
know about them.

For contrast, look at my hashlib. It was published complete with
the test suite and test programs that evaluated it during and after
development. I have had two and a half bug reports since the
initial publication, and no more for years.
And there was one I found in Dann's recommended text, admittedly in a code
fragment, and one in Ben Pfaff's circular buffer. Two in my two hash
tables, though one could be described as a feature. So assuming
Poisson-distributed bugs, we're doing about equally. However I am blamed.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 18 '07 #103
Malcolm McLean said:
>
"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
>>
<snip>
>"ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read.
Does anyone else here apart from Malcolm have difficulty reading it?
Hands up please.

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Aug 18 '07 #104
Richard Heathfield wrote:
Malcolm McLean said:
>>
"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
>>>

<snip>
>>"ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read.

Does anyone else here apart from Malcolm have difficulty reading it?
Hands up please.
To be honest this construct threw me the first time I encountered it, which
is of course in clc, but it hasn't bothered me in the slightest after I
looked it up in the FAQ.

C's expressions have to be understood with C's rules and quirks in mind, not
by comparing them to mathematical expressions. I suppose someone new to C
will find it a bit strange, but then this is true for almost all
programming languages. There's a reason they are not classified under the
natural languages class.

Aug 18 '07 #105
CBFalconer wrote, On 18/08/07 19:40:
Malcolm McLean wrote:
>"Richard Heathfield" <rj*@see.sig.invalidwrote in message
... snip ...
>>His malloc is not ideal - answer = malloc(sizeof *answer); would
be preferable. He repeats this mistake later in the (poorly named)
hashtable() function.
Too hard to read. malloc( sizeof *ptr) is a bad clc ism that you
will seldom see in another environment. The reader has to perform
a mental dereference which makes it harder to see if the size of
correct.

Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
Unless you need enough space for two objects ;-)

Unsurprisingly, I agree with you on this.
--
Flash Gordon
Aug 18 '07 #106
"Malcolm McLean" <re*******@btinternet.comwrites:
"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
>Malcolm McLean wrote:
>>"Richard Heathfield" <rj*@see.sig.invalidwrote in message
... snip ...
>>>
His malloc is not ideal - answer = malloc(sizeof *answer); would
be preferable. He repeats this mistake later in the (poorly named)
hashtable() function.

Too hard to read. malloc( sizeof *ptr) is a bad clc ism that you
will seldom see in another environment. The reader has to perform
a mental dereference which makes it harder to see if the size of
correct.

Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read. Purely syntactical. The indirection operator
happens to be the same as a multiply. All functions take parentheses,
except sizeof, which is a function in the sense that matters - it
retuns a value based on its argument, but not of course in terms of
generated code. So you've got two quirks in one expression.
No, sizeof is not a function. It's a unary operator, just like "-",
"*", and "&". (If "sizeof" is a function in some abstract sense, then
so is "-", but C doesn't call them functions.) The confusion is
probably caused by the fact that it's the only operator whose symbol
look like an identifier rather than punctuation.

Pretend that 'sizeof' had been spelled '$', and you'll see what I mean.

But if you find
ptr = malloc(sizeof *ptr);
too confusing because it looks too much like a multiplication, you can
write
ptr = malloc(sizeof(*ptr));
applying the operator to a parenthesized expression.
In compiler terms t is better, of course, because if you change ptr's
type the call will automatically update. But that's an overly narrow
understanding of maintainability which doesn't address the real issue.
The real issue is that the clc-approved form (a) is perfectly
understandable once it's explained (something you could do in your
book if you chose), and (b) makes maintenance easier, because you
don't have to keep the target type and the size expression
synchronized manually (with no warning from the compiler if you get it
wrong).

You're right that 'ptr = malloc(sizeof *ptr)' is not common. That's a
pity, because it's *better* than the alternatives, and you have an
opportunity to help teach that.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 18 '07 #107
Malcolm McLean wrote, On 18/08/07 22:02:
>
"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
>Malcolm McLean wrote:
>>"Richard Heathfield" <rj*@see.sig.invalidwrote in message
... snip ...
>>>
His malloc is not ideal - answer = malloc(sizeof *answer); would
be preferable. He repeats this mistake later in the (poorly named)
hashtable() function.

Too hard to read. malloc( sizeof *ptr) is a bad clc ism that you
will seldom see in another environment. The reader has to perform
a mental dereference which makes it harder to see if the size of
correct.

Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read.
A lot of newbies here don't find it hard once it is pointed out to them.
Purely syntactical.
Only if you consider "ptr = malloc(sizeof(type))" to be purely syntactical.

The indirection operator happens
to be the same as a multiply.
I very much doubt that people would mistake it for being multiplication
in this usage.
All functions take parentheses, except
sizeof, which is a function in the sense that matters - it retuns a
value based on its argument, but not of course in terms of generated
code.
It is only a function in the same sense that a unary minus is a
function. It has one operand and yields a result. The lack of
parenthesis in the "clc recommended" form if anything make it clearer it
is not a function.
So you've got two quirks in one expression.
You have more quirks than that.
In compiler terms t is better, of course, because if you change ptr's
type the call will automatically update. But that's an overly narrow
understanding of maintainability which doesn't address the real issue.
Most of the regulars here have a good few years of experience in
maintaining code written by themselves and others. I personally also
have experience of passing code on (not C code) to be maintained by
others and still being around myself to here what their issues are. So I
think many of us have reason to believe we *do* understand what the
maintainability issues are, and many of us disagree with you on this.
--
Flash Gordon
Aug 18 '07 #108
Richard Heathfield wrote:
>
Malcolm McLean said (in reply to me):
malloc( sizeof *ptr) is a bad clc ism

No, it's a good clc-ism.
(ptr = malloc(N * sizeof *ptr)) is an absolutely great expression;
I use it much.
that you will seldom see in another environment.

Right. As Sturgeon says, at least 90% of everything is crud, so you'd
expect to find malloc(sizeof *ptr) in only 10% of cases (or fewer).
That doesn't mean it's bad. It just means most people haven't come
across it yet.
The reader has to perform a mental dereference which makes it harder
to see if the size of correct.

If the reader has to perform a mental dereference, he (or she) doesn't
understand sizeof. No dereferencing occurs, because sizeof does not
evaluate its argument. Furthermore, the size is *bound* to be correct
for any object pointer type.
For strings, I frequently call malloc this way:
char *string = malloc(length + 1);

--
pete
Aug 18 '07 #109
Richard Heathfield <rj*@see.sig.invalidwrites:
Malcolm McLean said:
>>
"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
>>>

<snip>
>>"ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read.

Does anyone else here apart from Malcolm have difficulty reading it?
Hands up please.

<snip>
IMO, It is fairly uncommon to see

(sizeof *p)

in legacy code bases and is at odds with "normal" C operators as was
well described by Keith in another post in this thread. I think MM has
a point.

The people reading his algorithms shouldn't need to be C experts to
understand his code.

But don't let that stop you taking another opportunity to proclaim your
genius.
Aug 18 '07 #110
On Aug 18, 11:37 am, CBFalconer <cbfalco...@yahoo.comwrote:
websn...@gmail.com wrote:
CBFalconer <cbfalco...@yahoo.comwrote:
Richard Harter wrote:

... snip ...
>[...] that always using the same probe increment in a
power of two table size also has bad properties. His
solution is to use the unused bits of the full hash to
select an increment size from a table of primes.
That is dangerous. The purpose of the multiple hash is to
have different search patterns for different objects that
have the same primary hash pattern.
No, its to have a different search pattern for different object
that have the same primary hash *INDEX* (i.e., post masking).
This is obvious since, for any good hash function, index
collisions will be many many times more common than full hash
function collisions.
Mathematically speaking, on average a single 32-bit (good) hash
function collision will occur roughly when 65536 entries appear,
while a hash index collision will occur roughly when the square
root of the table size number of entries occur (so for a 9
million entry table, that's just 3000 entries).

No, you are ignoring the size of the hash table itself. For
example, the starting size for hashlib is 17, so no more than 17
hash values (modulo 17) are useful. This changes as the table
enlarges itself.
Huh? Well of course I am ignoring small hash tables -- *ALL*
solutions are basically equivalent for small enough hash tables. The
problem is only interesting when there is a performance impact, which
will happen as the hash table holds thousands of entries.

I think you are misreading my language (I am exercising serious
restraint here) a hash function collision is not the same as a hash
index collision. This distinction matters for completely optimized
hash table implementations.
This avoids clustering, and tends to keep
the search chain short. The very iffy thing in hashlib is
the reduction of the range of the secondary hash, which
encourages light clustering in nearly empty tables.
You are ignoring the fact that skip values of 2, 4 and 6, for
example, for indexes starting on even offsets near each other
will tend to collide with each other as the table fills up.
Same skip-value, and commensurate offset collisions must be
minimized by increasing the chance they miss each other. This
happens most easily if the skip values are predominantly
co-prime, which is easiest to construct when they are simply
taken from a list of prime numbers.

Again, you are ignoring the effect of varying table size.
I am not. I am concerned with asymptotic size of hash tables which
are large enough to matter (a few thousand entries) otherwise the
problem is too small to be concerned with the details about (you just
need to make sure it works).

Perhaps you don't understand the basic mathematics. If you increase
the size of the table, the probability that two randomly chosen skip
values will not be coprime does *NOT* tend towards zero. The number
of pathological cases increases with the increase in table size. Thus
a strategy that just picks random skip values via another hash or
whatever, does not directly address the inherent inter-collision
problem.

You need to pick coprime (between each other) skip values if you use
linear probing. The most sensible way of doing this is to choose from
a set of primes.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 19 '07 #111
Richard Heathfield wrote:
Malcolm McLean said:
>>
"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
>>>

<snip>
>>"ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read.

Does anyone else here apart from Malcolm have difficulty reading it?
Hands up please.
I'll put my hand up, just to say "Not me sir! Easy peasy sir!
Fits neatly on the eyeball and can be checked without context!
Win win, sir!"

Of course anyone who's using a remotely syntax-colouring editor
won't think that `sizeof` is an ordinary left operand of a
multiplication; nor will anyone who already knows what `sizeof`
means.

It would be really nice to have empirical evidence either way,
of course, since we're well into anecdotage here.

--
Aged Aged Hedgehog
"Who do you serve, and who do you trust?" /Crusade/

Aug 19 '07 #112

"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
news:62************@news.flash-gordon.me.uk...
>>Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read.

A lot of newbies here don't find it hard once it is pointed out to them.
>Purely syntactical.

Only if you consider "ptr = malloc(sizeof(type))" to be purely
syntactical.
The objection to

ptr = malloc(sizeof *ptr);

is purely syntactical. As Keith said, if sizeof were spelt $ and given
normal operator rules for whitespace

ptr = malloc($*ptr);

would be OK. The * wouldn't look like multiplication of an identifier.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Aug 19 '07 #113

"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
news:md************@news.flash-gordon.me.uk...
You should also get your book reviewed somewhere like comp-programming
where the algorithms side will get properly reviewed, instead of the C
issues people concentrate on here.
That will happen.
The practical problem is that people focus on little things like ] instead
of ) for a close, or a missing check for null, because they are unambiguous.
However they are relatively easy to fix. The more serious criticisms come
out later - the car park maybe isn't the best metaphor for a hash table.
So it's best to get C language issues and size_t wars out of the way first.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
Aug 19 '07 #114
On Fri, 17 Aug 2007 13:32:51 -0400, CBFalconer
<cb********@yahoo.comwrote:
>Richard Harter wrote:
>Eric Sosman <es*****@ieee-dot-org.invalidwrote:
>>James Dow Allen wrote:

[...]
Speaking of Falconer, his method (linear probing with second
hash value used for probe increment) *requires* (even more so
than quadratic probing, see upthread) prime table size.

Well, not exactly: It requires only that the increment
between probes be relatively prime to the table size. A prime
table size makes this criterion easy to satisfy, but it's not
the only way. A table size equal to a power of two, for example,
works just fine if the probe increment is odd. (Degenerate
special case: pure linear probing with an increment of unity
works with any table size at all, prime or composite.)

In a thread in comp.programming (IIRCC) Paul Hsieh pointed out
that always using the same probe increment in a power of two
table size also has bad properties. His solution is to use the
unused bits of the full hash to select an increment size from a
table of primes.

That is dangerous. The purpose of the multiple hash is to have
different search patterns for different objects that have the same
primary hash pattern. This avoids clustering, and tends to keep
the search chain short. The very iffy thing in hashlib is the
reduction of the range of the secondary hash, which encourages
light clustering in nearly empty tables. The system attempts to
maintain tables at between 44 and 88% of capacity.

Also, I want the table action to be independent of the hash
functions (apart from speed), so the functions are computed every
time needed. The system functions with all the hash functions
replaced by the constant 1 (unity) (tested in the test suite) at a
considerable speed penalty. So the action is hash function
independent, but the speed is highly dependent.
Here is a little clue: Discussions of hash tables are not
confined to your particular implementation.

>
The system is intended to isolate all these decisions in the hash
table code, so the user need not worry about them. Similarly the
algorithms for table expansion, etc. The result is a storage and
retrieval black box.
Aug 19 '07 #115
On Sun, 19 Aug 2007 09:06:23 +0100, Malcolm McLean wrote:
"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
news:62************@news.flash-gordon.me.uk...
>>>Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.

It's hard to read.

A lot of newbies here don't find it hard once it is pointed out to them.
>>Purely syntactical.

Only if you consider "ptr = malloc(sizeof(type))" to be purely
syntactical.
The objection to

ptr = malloc(sizeof *ptr);

is purely syntactical.
It is? Why? Is there a syntax error in the code?
Aug 19 '07 #116
On Sat, 18 Aug 2007 21:14:44 +0000, Richard Heathfield wrote:
Malcolm McLean said:
>>
"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
>>>

<snip>
>>"ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read.

Does anyone else here apart from Malcolm have difficulty reading it?
This is one of those things which, like migrating to x++ from x = x + 1,
simply takes getting used to. It is idiomatic in what one might term
"good C code", not to mention around here, but a rank newbie - i.e.
someone who should not be reading MM's book in the first place - might not
grok it immediately.

Once grokked, however, it is a perfectly sensible way to do the job, for
the usual reasons.
Aug 19 '07 #117
[snips]

On Sat, 18 Aug 2007 22:08:12 +0100, Malcolm McLean wrote:
>For contrast, look at my hashlib. It was published complete with
the test suite and test programs that evaluated it during and after
development. I have had two and a half bug reports since the
initial publication, and no more for years.
And there was one I found in Dann's recommended text, admittedly in a code
fragment, and one in Ben Pfaff's circular buffer. Two in my two hash
tables, though one could be described as a feature. So assuming
Poisson-distributed bugs, we're doing about equally. However I am blamed.
Do you see Dann or Ben using ints when size_t is the correct tool to be
used, defending this on the basis that they don't want to code in C and
that, in effect, C should be changed to suit their weird - and objectively
wrong - views?

No, you don't. They have bugs. You have concept failures *plus* bugs.
All non-trivial code has bugs. Code written by people who don't know what
they're doing has bugs *plus* assorted other issues. You never did, for
example, explain the magic values used by squnch, or what magic occurs
when it is passed length values of -1, -137 or -32767. Nor did you
explain how to make it work on a perfectly legitimately-sized buffer
too large to have its size specified by a signed int.

Their code has bugs, so does yours. Yours adds in umpteen different sorts
of unpredictability and confusion, without justification.
Aug 19 '07 #118
"Malcolm McLean" <re*******@btinternet.comwrites:
"Flash Gordon" <sp**@flash-gordon.me.ukwrote in message
news:62************@news.flash-gordon.me.uk...
>>>Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.

It's hard to read.

A lot of newbies here don't find it hard once it is pointed out to them.
>>Purely syntactical.

Only if you consider "ptr = malloc(sizeof(type))" to be purely
syntactical.
The objection to

ptr = malloc(sizeof *ptr);

is purely syntactical. As Keith said, if sizeof were spelt $ and given
normal operator rules for whitespace

ptr = malloc($*ptr);

would be OK. The * wouldn't look like multiplication of an identifier.
As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:

ptr = malloc(sizeof(*ptr));

Do you have any objection, "syntactical" or otherwise, to that form?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 19 '07 #119

"Keith Thompson" <ks***@mib.orgwrote in message
news:ln************@nuthaus.mib.org...
As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:

ptr = malloc(sizeof(*ptr));

Do you have any objection, "syntactical" or otherwise, to that form?
I prefer it. It is not uncommon to malloc more than one data item.

so ptr = malloc( N * sizeof *ptr);

if less clear than
ptr = malloc(N * sizeof(*ptr));

unless you are one of those people who likes to put the test in the line

if( ptr = malloc(N * sizeof(*ptr)) );

then some compilers like

if( (ptr = malloc(N * sizeof(*ptr))) )

at which point we've broken the rule of three and lost readbility.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
Aug 19 '07 #120
Malcolm McLean <re*******@btinternet.comwrote:
I prefer it. It is not uncommon to malloc more than one data item.

so ptr = malloc( N * sizeof *ptr);

if less clear than
ptr = malloc(N * sizeof(*ptr));

unless you are one of those people who likes to put the test in the line

if( ptr = malloc(N * sizeof(*ptr)) );

then some compilers like

if( (ptr = malloc(N * sizeof(*ptr))) )

at which point we've broken the rule of three and lost readbility.
There's always:

if (ptr = malloc(N * sizeof *ptr), ptr == NULL) ...
Aug 19 '07 #121
Ed Jensen <ej*****@visi.comwrites:
Malcolm McLean <re*******@btinternet.comwrote:
>I prefer it. It is not uncommon to malloc more than one data item.

so ptr = malloc( N * sizeof *ptr);

if less clear than
ptr = malloc(N * sizeof(*ptr));

unless you are one of those people who likes to put the test in the line

if( ptr = malloc(N * sizeof(*ptr)) );

then some compilers like

if( (ptr = malloc(N * sizeof(*ptr))) )

at which point we've broken the rule of three and lost readbility.

There's always:

if (ptr = malloc(N * sizeof *ptr), ptr == NULL) ...
Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 19 '07 #122
Keith Thompson <ks***@mib.orgwrote:
>There's always:

if (ptr = malloc(N * sizeof *ptr), ptr == NULL) ...

Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.
Good idea, since your one liner would fail to compile.
Aug 19 '07 #123
Richard Harter wrote:
On Fri, 17 Aug 2007 21:33:47 -0400, Eric Sosman
>[...[
This is one of the reasons
twin primes are popular as table sizes: you use the hash code
mod N for the first probe, and then if necessary you use the
hash mod (N-2) plus 1 as the increment for further probes. The
primality of N guarantees that the increment is relatively prime
to it; the primality of N-2 helps scramble groups of keys that
have simple relationships (sum1, sum2, sum3, ...).

I'm not clear as to why the other sibling in a pair twin prime
should be superior for scrambling as compared to using some other
prime.
Any prime will scramble about as well as any other (I'm
speaking very loosely here, of course), but a larger prime
will yield more different remainders than a smaller one --
for example, consider K % 65537 vs. K % 2.

--
Eric Sosman
es*****@ieee-dot-org.invalid
Aug 19 '07 #124
Keith Thompson wrote:
>
Ed Jensen <ej*****@visi.comwrites:
Malcolm McLean <re*******@btinternet.comwrote:
I prefer it. It is not uncommon to malloc more than one data item.

so ptr = malloc( N * sizeof *ptr);

if less clear than
ptr = malloc(N * sizeof(*ptr));

unless you are one of those people
who likes to put the test in the line

if( ptr = malloc(N * sizeof(*ptr)) );

then some compilers like

if( (ptr = malloc(N * sizeof(*ptr))) )

at which point we've broken the rule of three and lost readbility.
There's always:

if (ptr = malloc(N * sizeof *ptr), ptr == NULL) ...

Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.
I recall Eric Sosman advising me that in the case of loops,
it's probably better to write something like

while((ptr = malloc(N * sizeof *ptr) != NULL) {
/**/
}

than it is to write

ptr = malloc(N * sizeof *ptr);
while(ptr != NULL) {
/**/
ptr = malloc(N * sizeof *ptr);
}

because it makes it more obvious that there is only one expression
generating values for ptr.

--
pete
Aug 20 '07 #125
pete wrote:
>
I recall Eric Sosman advising me that in the case of loops,
it's probably better to write something like

while((ptr = malloc(N * sizeof *ptr) != NULL) {
/**/
}
I hope I didn't advise exactly that! (Hint: count
the parentheses.
than it is to write

ptr = malloc(N * sizeof *ptr);
while(ptr != NULL) {
/**/
ptr = malloc(N * sizeof *ptr);
}

because it makes it more obvious that there is only one expression
generating values for ptr.
General principle: If something is written in only one
place, that one place is authoritative -- it may be right or
it may be wrong, but it can be judged and possibly corrected
on its own. If something is written in two places, neither
is authoritative and there can be a question about which is
right, and if both are wrong but only one gets corrected, ...

Or, as an older adage has it: "A man with two watches
never knows the correct time."

--
Eric Sosman
es*****@ieee-dot-org.invalid
Aug 20 '07 #126
Eric Sosman wrote:
>
pete wrote:

I recall Eric Sosman advising me that in the case of loops,
it's probably better to write something like

while((ptr = malloc(N * sizeof *ptr) != NULL) {
/**/
}

I hope I didn't advise exactly that! (Hint: count
the parentheses.
than it is to write

ptr = malloc(N * sizeof *ptr);
while(ptr != NULL) {
/**/
ptr = malloc(N * sizeof *ptr);
}

because it makes it more obvious that there is only one expression
generating values for ptr.

General principle: If something is written in only one
place, that one place is authoritative -- it may be right or
it may be wrong, but it can be judged and possibly corrected
on its own. If something is written in two places, neither
is authoritative and there can be a question about which is
right, and if both are wrong but only one gets corrected, ...

Or, as an older adage has it: "A man with two watches
never knows the correct time."
Thank you.

--
pete
Aug 20 '07 #127
Ed Jensen <ej*****@visi.comwrites:
Keith Thompson <ks***@mib.orgwrote:
>>There's always:

if (ptr = malloc(N * sizeof *ptr), ptr == NULL) ...

Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.

Good idea, since your one liner would fail to compile.
D'oh! Thanks for the correction:

if ((ptr = malloc(N * sizeof *ptr)) != NULL) ...

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 20 '07 #128
we******@gmail.com wrote:
>Yes, I understood your claim the first time round. I don't
think you've justified your premises yet. Again, I'm not
saying that you're /wrong/; I'm saying that I don't think
your argument is complete.

Well, much of the industry seems to have accepted the analysis shown
above, so I don't know what else to tell you.
"The analysis shown above" [1] was not part of your presented
argument; if it's necessary to your argument, you should have
mentioned it /first/, not spent an eternity backing down to it.

Now I have somethings to read.

[1] Not shown, just mentioned.

--
Eternal Hedgehog
Meaning precedes definition.

Aug 20 '07 #129

"Eric Sosman" <es*****@ieee-dot-org.invalidwrote in message
news:o4******************************@comcast.com. ..
pete wrote:
>I recall Eric Sosman advising me that in the case of loops,
it's probably better to write something like while((ptr = malloc(N *
sizeof *ptr) != NULL) {
/**/
}

I hope I didn't advise exactly that! (Hint: count
the parentheses.
That's th snag, isn't it. Just add a tiny bit of visual complexity and
people make errors in the expression. Not just once but twice.

Do you really want to exhaust the computer's memory? If not, the loop must
not terminate at the while() in normal flow control.
It should be

while( allocationsleftotodo() )
{
ptr = malloc(N * sizeof *ptr);
if(!ptr)
/* oh dear. take emergency action */
/* normal code */
}

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
Aug 20 '07 #130
Malcolm McLean wrote:
>
"Eric Sosman" <es*****@ieee-dot-org.invalidwrote in message
news:o4******************************@comcast.com. ..
pete wrote:
I recall Eric Sosman advising me that in the case of loops,
it's probably better to write something like while((ptr = malloc(N *
sizeof *ptr) != NULL) {
/**/
}
I hope I didn't advise exactly that! (Hint: count
the parentheses.
That's th snag, isn't it. Just add a tiny bit of visual complexity and
people make errors in the expression. Not just once but twice.

Do you really want to exhaust the computer's memory? If not, the loop must
not terminate at the while() in normal flow control.
It should be

while( allocationsleftotodo() )
{
ptr = malloc(N * sizeof *ptr);
if(!ptr)
/* oh dear. take emergency action */
/* normal code */
}
The context of Eric's original advice,
was not allocation, as it is here.
The topic that I mean to address was:
assignment within a loop controling expression versus
splitting the assignment into a separate line.

--
pete
Aug 20 '07 #131
Richard Heathfield wrote:
Malcolm McLean said:
>"CBFalconer" <cb********@yahoo.comwrote in message
news:46***************@yahoo.com...
>>"ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.
It's hard to read.

Does anyone else here apart from Malcolm have difficulty reading it?
Hands up please.
Like others have said, I found it confusing at first, but once I had
seen it once that was all I needed. It *is* different and idiomatic and
uncommon, but it doesn't take much to learn what it means and why it is
helpful.

Compare it to other language features such as pointers, unions, and
setjmp/longjmp, which take longer to learn and longer still to use well.
Having seen this construct once, one can use it successfully.

Phil

--
Philip Potter pgp <atdoc.ic.ac.uk
Aug 20 '07 #132
Malcolm McLean wrote:
>
"Ben Bacarisse" <be********@bsb.me.ukwrote in message
news:87************@bsb.me.uk...
>Yes, he does. I missed that. Quadratic probing is usually explained
as inspecting

h(k, i) = h(k) + c1 * i + c2 * i * i

for i = 0, 1 and so on. When c2 == 0 you have linear probing. I have
never seen the term used in the way MM uses it. Squaring the hash
looks like a bad idea, at least on the surface. I wonder if he has
any data about the advantages and disadvantages of doing so.
I've misunderstood the algorithm and invented something better. (I
think). I'm using the square of the hash value modulus table size as an
increment. So effectively it's secondary hashing and not quadratic
probing at all.
I honestly can't believe you just said this. Hashing functions and
algorithms, like random number generators, are not to be guessed at, and
intuitively inspected for quality. If you're going to say "I think I've
invented something better" you ought to have some sort of evidence to
back this up!

Phil

--
Philip Potter pgp <atdoc.ic.ac.uk
Aug 20 '07 #133
On Aug 18, 7:50 am, "Malcolm McLean" <regniz...@btinternet.comwrote:
"Ben Bacarisse" <ben.use...@bsb.me.ukwrote in message
Yes, he does. I missed that. Quadratic probing is usually explained
as inspecting
h(k, i) = h(k) + c1 * i + c2 * i * i
for i = 0, 1 and so on. When c2 == 0 you have linear probing. I have
never seen the term used in the way MM uses it. Squaring the hash
looks like a bad idea, at least on the surface. I wonder if he has
any data about the advantages and disadvantages of doing so.

I've misunderstood the algorithm and invented something better. (I think).
I'm using the square of the hash value modulus table size as an increment.
So effectively it's secondary hashing and not quadratic probing at all.
As pointed out by Philip Potter, you can't be so cavalier about such
things. If the index is 0 or 1, then squaring it is going to give you
0 and 1 as the new index all the time. Furthermore, depending on the
table size, you can get all sorts of "short cycles". For example 2*2
== 4 (mod 7) and 4*4 == 16 == 2 (mod 7). So 2 and 4 just cycle back
and forth in a table of sized 7.

While that analysis from academia (specifically Knuth, but others
reproduce it) is not very generally applicable to *real world* hashing
(i.e., hashing keys/objects which are *not* integers) under their
assumptions this analysis is fairly thorough. Also the quadratic
probing technique has the advantage that you can compute the offsets
quickly using finite differences. (Your self-squaring technique,
requires the multiplier anyways, which has only very recently become
fast in all CPUs.)

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 20 '07 #134
Malcolm McLean wrote:
>
.... snip ...
>
while( allocationsleftotodo() )
{
ptr = malloc(N * sizeof *ptr);
if(!ptr)
/* oh dear. take emergency action */
/* normal code */
}
Simpler, more readable, and more direct, is:

while (allocationsleftotodo()) {
if (!(ptr = malloc(N * sizeof *ptr))) {
/* oh dear. take emergency action */
}
else {
/* all well, take normal action */
}
}

which also allows eliding of the else to allow for 'emergency
action' making 'normal action' feasable. Note that the key event
is the success or failure of the malloc, and the structure makes
that evident. There is no searching to discover what the state of
ptr represents.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 20 '07 #135
Keith Thompson wrote:
Ed Jensen <ej*****@visi.comwrites:
>Keith Thompson <ks***@mib.orgwrote:
.... snip ...
>>>
if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.

Good idea, since your one liner would fail to compile.

D'oh! Thanks for the correction:

if ((ptr = malloc(N * sizeof *ptr)) != NULL) ...
However this is a non-critical typo, since the compiler will always
catch it. The nuisance is when there are multiple such and the
compiler stops on the first error.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Aug 20 '07 #136

<we******@gmail.comwrote in message
news:11*********************@q4g2000prc.googlegrou ps.com...
On Aug 18, 7:50 am, "Malcolm McLean" <regniz...@btinternet.comwrote:
>"Ben Bacarisse" <ben.use...@bsb.me.ukwrote in message
Yes, he does. I missed that. Quadratic probing is usually explained
as inspecting
h(k, i) = h(k) + c1 * i + c2 * i * i
for i = 0, 1 and so on. When c2 == 0 you have linear probing. I have
never seen the term used in the way MM uses it. Squaring the hash
looks like a bad idea, at least on the surface. I wonder if he has
any data about the advantages and disadvantages of doing so.

I've misunderstood the algorithm and invented something better. (I
think).
I'm using the square of the hash value modulus table size as an
increment.
So effectively it's secondary hashing and not quadratic probing at all.

As pointed out by Philip Potter, you can't be so cavalier about such
things. If the index is 0 or 1, then squaring it is going to give you
0 and 1 as the new index all the time. Furthermore, depending on the
table size, you can get all sorts of "short cycles". For example 2*2
== 4 (mod 7) and 4*4 == 16 == 2 (mod 7). So 2 and 4 just cycle back
and forth in a table of sized 7.
It doesn't work like that.
The index returned from the hash is squared, made modulus table size, and
used as a linear probe increment. It's effectively secondary hashing where
function 2 is hash 1, squared. It does a reasonable job of busting clusters,
since apart from the first few keys are scattered all over the place.
However I should square the hash, not the index, to get a truly distributed
secondary hash. As it is collisions will create chains of collisions.
It was an accident, anyway.
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
Aug 20 '07 #137
On Aug 20, 1:15 pm, "Malcolm McLean" <regniz...@btinternet.comwrote:
<websn...@gmail.comwrote in message
news:11*********************@q4g2000prc.googlegrou ps.com...
On Aug 18, 7:50 am, "Malcolm McLean" <regniz...@btinternet.comwrote:
"Ben Bacarisse" <ben.use...@bsb.me.ukwrote in message
Yes, he does. I missed that. Quadratic probing is usually explained
as inspecting
h(k, i) = h(k) + c1 * i + c2 * i * i
for i = 0, 1 and so on. When c2 == 0 you have linear probing. I have
never seen the term used in the way MM uses it. Squaring the hash
looks like a bad idea, at least on the surface. I wonder if he has
any data about the advantages and disadvantages of doing so.
I've misunderstood the algorithm and invented something better. (I
think). I'm using the square of the hash value modulus table size as an
increment. So effectively it's secondary hashing and not quadratic probing
at all.
As pointed out by Philip Potter, you can't be so cavalier about such
things. If the index is 0 or 1, then squaring it is going to give you
0 and 1 as the new index all the time. Furthermore, depending on the
table size, you can get all sorts of "short cycles". For example 2*2
== 4 (mod 7) and 4*4 == 16 == 2 (mod 7). So 2 and 4 just cycle back
and forth in a table of sized 7.

It doesn't work like that.
The index returned from the hash is squared, made modulus table size, and
used as a linear probe increment.
Well, 0*0 == 0, so you still have a problem. Also this does nothing
special about the general problems that linear probes have then.

Ok, so lets think about this again using the examples I showed above.
Table size of 7, on offsets 2 and 4. If you land on offset 2, then
you will skip in steps of 4. If you land of offset 4 you will go in
steps of 2. In both cases, 2+4 = 4+2 = 6, so they both collide with
their first probe. Furthermore, 2+4+4 = 3 (mod 7) and 4+2+2+2 = 3
(mod 7) so again they collide. Sorry but no, this is a terrible
strategy.
[...] It's effectively secondary hashing where
function 2 is hash 1, squared. It does a reasonable job of busting clusters,
since apart from the first few keys are scattered all over the place.
This is not analysis. Try doing some math, or collecting real data to
back up your claim. I am almost certain that this is actually an
unusually *BAD* strategy -- even worse than CBFalconer's "just hash it
again, and mask it down to a quarter the size" kind of strategy.
However I should square the hash, not the index, to get a truly distributed
secondary hash. As it is collisions will create chains of collisions.
It was an accident, anyway.
Well this fixes one anomaly (the same chain sequence always coming out
of the same slot) bit it does not settle the other standard numeric
collision anomaly (stepping on each other just due to a lack of
coprimality of the two chains).

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Aug 20 '07 #138
we******@gmail.com writes:
On Aug 18, 7:50 am, "Malcolm McLean" <regniz...@btinternet.comwrote:
>"Ben Bacarisse" <ben.use...@bsb.me.ukwrote in message
Yes, he does. I missed that. Quadratic probing is usually explained
as inspecting
h(k, i) = h(k) + c1 * i + c2 * i * i
for i = 0, 1 and so on. When c2 == 0 you have linear probing. I have
never seen the term used in the way MM uses it. Squaring the hash
looks like a bad idea, at least on the surface. I wonder if he has
any data about the advantages and disadvantages of doing so.

I've misunderstood the algorithm and invented something better. (I think).
I'm using the square of the hash value modulus table size as an increment.
So effectively it's secondary hashing and not quadratic probing at all.

As pointed out by Philip Potter, you can't be so cavalier about such
things.
Quite.
If the index is 0 or 1, then squaring it is going to give you
0 and 1 as the new index all the time. Furthermore, depending on the
table size, you can get all sorts of "short cycles". For example 2*2
== 4 (mod 7) and 4*4 == 16 == 2 (mod 7). So 2 and 4 just cycle back
and forth in a table of sized 7.
That is not what is done. I don't know if what he does is better or
worse than that, but it is in effect using (idx * idx) % size as an
_increment_ (with 0 mapped to 1) -- similar to basic linear probing.
The % size is pointless (since the he repeats the reduction to do the
index operation after adding the increment) and I specifically say idx
rather hash because what gets used is the index that has caused the
collision, not the full hash of the key.

As I understand it, most linear probing schemes that use an increment
other than 1, choose one based on the full hash (or even another hash)
so as to avoid generating the same sequence of probes for all elements
that map a particular initial index.

Of course, there may be some curious reason why this form of linear
probing *is* a good idea in practice, but it certainly looks bad on
paper and it most definitely is *not* quadratic probing as promised in
the text.

--
Ben.
Aug 20 '07 #139
Julienne Walker wrote:
"Malcolm McLean" <regniz...@btinternet.comwrote:
>>>However can you point to a better introductory web page on hash
tables? That's the real test.
>><snip>
>>Personally, I like this:
http://eternallyconfuzzled.com/tuts/..._tut_hashtable....
When the twisted minds of Julienne Walker and The EC Team get
together and write a book, I will buy a copy for sure.

It is a good description. The writing is rather dry and my car
park metaphor is more interesting.
Piggybacking. Take a look at my hashlib.zip, written in purely
standard C and totally portable. GPL licenced. See:

<http://cbfalconer.home.att.net/download/>

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Sep 7 '07 #140

This discussion thread is closed

Replies have been disabled for this discussion.

By using this site, you agree to our Privacy Policy and Terms of Use.