Hi
can anybody tell me that which ds will be best suited to implement a
hash table in C/C++
thanx. in advanced
Aug 11 '07
139 14232
On Aug 18, 11:37 am, CBFalconer <cbfalco...@yah oo.comwrote:
websn...@gmail. com wrote:
CBFalconer <cbfalco...@yah oo.comwrote:
Richard Harter wrote:
... snip ...
>[...] that always using the same probe increment in a power of two table size also has bad properties. His solution is to use the unused bits of the full hash to select an increment size from a table of primes.
That is dangerous. The purpose of the multiple hash is to
have different search patterns for different objects that
have the same primary hash pattern.
No, its to have a different search pattern for different object
that have the same primary hash *INDEX* (i.e., post masking).
This is obvious since, for any good hash function, index
collisions will be many many times more common than full hash
function collisions.
Mathematically speaking, on average a single 32-bit (good) hash
function collision will occur roughly when 65536 entries appear,
while a hash index collision will occur roughly when the square
root of the table size number of entries occur (so for a 9
million entry table, that's just 3000 entries).
No, you are ignoring the size of the hash table itself. For
example, the starting size for hashlib is 17, so no more than 17
hash values (modulo 17) are useful. This changes as the table
enlarges itself.
Huh? Well of course I am ignoring small hash tables -- *ALL*
solutions are basically equivalent for small enough hash tables. The
problem is only interesting when there is a performance impact, which
will happen as the hash table holds thousands of entries.
I think you are misreading my language (I am exercising serious
restraint here) a hash function collision is not the same as a hash
index collision. This distinction matters for completely optimized
hash table implementations .
This avoids clustering, and tends to keep
the search chain short. The very iffy thing in hashlib is
the reduction of the range of the secondary hash, which
encourages light clustering in nearly empty tables.
You are ignoring the fact that skip values of 2, 4 and 6, for
example, for indexes starting on even offsets near each other
will tend to collide with each other as the table fills up.
Same skip-value, and commensurate offset collisions must be
minimized by increasing the chance they miss each other. This
happens most easily if the skip values are predominantly
co-prime, which is easiest to construct when they are simply
taken from a list of prime numbers.
Again, you are ignoring the effect of varying table size.
I am not. I am concerned with asymptotic size of hash tables which
are large enough to matter (a few thousand entries) otherwise the
problem is too small to be concerned with the details about (you just
need to make sure it works).
Perhaps you don't understand the basic mathematics. If you increase
the size of the table, the probability that two randomly chosen skip
values will not be coprime does *NOT* tend towards zero. The number
of pathological cases increases with the increase in table size. Thus
a strategy that just picks random skip values via another hash or
whatever, does not directly address the inherent inter-collision
problem.
You need to pick coprime (between each other) skip values if you use
linear probing. The most sensible way of doing this is to choose from
a set of primes.
--
Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/
Richard Heathfield wrote:
Malcolm McLean said:
>> "CBFalconer " <cb********@yah oo.comwrote in message news:46******* ********@yahoo. com...
>>>
<snip>
>>"ptr = malloc(sizeof *ptr);" is automatically the right size. No checking of any form is required.
It's hard to read.
Does anyone else here apart from Malcolm have difficulty reading it?
Hands up please.
I'll put my hand up, just to say "Not me sir! Easy peasy sir!
Fits neatly on the eyeball and can be checked without context!
Win win, sir!"
Of course anyone who's using a remotely syntax-colouring editor
won't think that `sizeof` is an ordinary left operand of a
multiplication; nor will anyone who already knows what `sizeof`
means.
It would be really nice to have empirical evidence either way,
of course, since we're well into anecdotage here.
--
Aged Aged Hedgehog
"Who do you serve, and who do you trust?" /Crusade/
"Flash Gordon" <sp**@flash-gordon.me.ukwro te in message
news:62******** ****@news.flash-gordon.me.uk...
>>Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right size. No checking of any form is required.
It's hard to read.
A lot of newbies here don't find it hard once it is pointed out to them.
>Purely syntactical.
Only if you consider "ptr = malloc(sizeof(t ype))" to be purely
syntactical.
The objection to
ptr = malloc(sizeof *ptr);
is purely syntactical. As Keith said, if sizeof were spelt $ and given
normal operator rules for whitespace
ptr = malloc($*ptr);
would be OK. The * wouldn't look like multiplication of an identifier.
--
Free games and programming goodies. http://www.personal.leeds.ac.uk/~bgy1mm
"Flash Gordon" <sp**@flash-gordon.me.ukwro te in message
news:md******** ****@news.flash-gordon.me.uk...
You should also get your book reviewed somewhere like comp-programming
where the algorithms side will get properly reviewed, instead of the C
issues people concentrate on here.
That will happen.
The practical problem is that people focus on little things like ] instead
of ) for a close, or a missing check for null, because they are unambiguous.
However they are relatively easy to fix. The more serious criticisms come
out later - the car park maybe isn't the best metaphor for a hash table.
So it's best to get C language issues and size_t wars out of the way first.
--
Free games and programming goodies. http://www.personal.leeds.ac.uk/~bgy1mm
On Fri, 17 Aug 2007 13:32:51 -0400, CBFalconer
<cb********@yah oo.comwrote:
>Richard Harter wrote:
>Eric Sosman <es*****@ieee-dot-org.invalidwrot e:
>>James Dow Allen wrote:
[...] Speaking of Falconer, his method (linear probing with second hash value used for probe increment) *requires* (even more so than quadratic probing, see upthread) prime table size.
Well, not exactly: It requires only that the increment between probes be relatively prime to the table size. A prime table size makes this criterion easy to satisfy, but it's not the only way. A table size equal to a power of two, for example, works just fine if the probe increment is odd. (Degenerate special case: pure linear probing with an increment of unity works with any table size at all, prime or composite.)
In a thread in comp.programmin g (IIRCC) Paul Hsieh pointed out that always using the same probe increment in a power of two table size also has bad properties. His solution is to use the unused bits of the full hash to select an increment size from a table of primes.
That is dangerous. The purpose of the multiple hash is to have different search patterns for different objects that have the same primary hash pattern. This avoids clustering, and tends to keep the search chain short. The very iffy thing in hashlib is the reduction of the range of the secondary hash, which encourages light clustering in nearly empty tables. The system attempts to maintain tables at between 44 and 88% of capacity.
Also, I want the table action to be independent of the hash functions (apart from speed), so the functions are computed every time needed. The system functions with all the hash functions replaced by the constant 1 (unity) (tested in the test suite) at a considerable speed penalty. So the action is hash function independent, but the speed is highly dependent.
Here is a little clue: Discussions of hash tables are not
confined to your particular implementation.
> The system is intended to isolate all these decisions in the hash table code, so the user need not worry about them. Similarly the algorithms for table expansion, etc. The result is a storage and retrieval black box.
On Sun, 19 Aug 2007 09:06:23 +0100, Malcolm McLean wrote:
"Flash Gordon" <sp**@flash-gordon.me.ukwro te in message
news:62******** ****@news.flash-gordon.me.uk...
>>>Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right size. No checking of any form is required.
It's hard to read.
A lot of newbies here don't find it hard once it is pointed out to them.
>>Purely syntactical.
Only if you consider "ptr = malloc(sizeof(t ype))" to be purely syntactical.
The objection to
ptr = malloc(sizeof *ptr);
is purely syntactical.
It is? Why? Is there a syntax error in the code?
On Sat, 18 Aug 2007 21:14:44 +0000, Richard Heathfield wrote:
Malcolm McLean said:
>> "CBFalconer " <cb********@yah oo.comwrote in message news:46******* ********@yahoo. com...
>>>
<snip>
>>"ptr = malloc(sizeof *ptr);" is automatically the right size. No checking of any form is required.
It's hard to read.
Does anyone else here apart from Malcolm have difficulty reading it?
This is one of those things which, like migrating to x++ from x = x + 1,
simply takes getting used to. It is idiomatic in what one might term
"good C code", not to mention around here, but a rank newbie - i.e.
someone who should not be reading MM's book in the first place - might not
grok it immediately.
Once grokked, however, it is a perfectly sensible way to do the job, for
the usual reasons.
[snips]
On Sat, 18 Aug 2007 22:08:12 +0100, Malcolm McLean wrote:
>For contrast, look at my hashlib. It was published complete with the test suite and test programs that evaluated it during and after development. I have had two and a half bug reports since the initial publication, and no more for years.
And there was one I found in Dann's recommended text, admittedly in a code
fragment, and one in Ben Pfaff's circular buffer. Two in my two hash
tables, though one could be described as a feature. So assuming
Poisson-distributed bugs, we're doing about equally. However I am blamed.
Do you see Dann or Ben using ints when size_t is the correct tool to be
used, defending this on the basis that they don't want to code in C and
that, in effect, C should be changed to suit their weird - and objectively
wrong - views?
No, you don't. They have bugs. You have concept failures *plus* bugs.
All non-trivial code has bugs. Code written by people who don't know what
they're doing has bugs *plus* assorted other issues. You never did, for
example, explain the magic values used by squnch, or what magic occurs
when it is passed length values of -1, -137 or -32767. Nor did you
explain how to make it work on a perfectly legitimately-sized buffer
too large to have its size specified by a signed int.
Their code has bugs, so does yours. Yours adds in umpteen different sorts
of unpredictabilit y and confusion, without justification.
"Malcolm McLean" <re*******@btin ternet.comwrite s:
"Flash Gordon" <sp**@flash-gordon.me.ukwro te in message
news:62******** ****@news.flash-gordon.me.uk...
>>>Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right size. No checking of any form is required.
It's hard to read.
A lot of newbies here don't find it hard once it is pointed out to them.
>>Purely syntactical.
Only if you consider "ptr = malloc(sizeof(t ype))" to be purely syntactical.
The objection to
ptr = malloc(sizeof *ptr);
is purely syntactical. As Keith said, if sizeof were spelt $ and given
normal operator rules for whitespace
ptr = malloc($*ptr);
would be OK. The * wouldn't look like multiplication of an identifier.
As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:
ptr = malloc(sizeof(* ptr));
Do you have any objection, "syntactica l" or otherwise, to that form?
--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
"Keith Thompson" <ks***@mib.orgw rote in message
news:ln******** ****@nuthaus.mi b.org...
As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:
ptr = malloc(sizeof(* ptr));
Do you have any objection, "syntactica l" or otherwise, to that form?
I prefer it. It is not uncommon to malloc more than one data item.
so ptr = malloc( N * sizeof *ptr);
if less clear than
ptr = malloc(N * sizeof(*ptr));
unless you are one of those people who likes to put the test in the line
if( ptr = malloc(N * sizeof(*ptr)) );
then some compilers like
if( (ptr = malloc(N * sizeof(*ptr))) )
at which point we've broken the rule of three and lost readbility.
--
Free games and programming goodies. http://www.personal.leeds.ac.uk/~bgy1mm This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Yat |
last post by:
I want to use hash table in C.
Have any build-in library in C so that I can use it without implement hash
table myself?
Thx
|
by: kdotsky |
last post by:
Hello,
I am using some very large dictionaries with keys that are long strings
(urls). For a large dictionary these keys start to take up a
significant amount of memory. I do not need access to these keys -- I
only need to be able to retrieve the value associated with a certain
key, so I do not want to have the keys stored in memory. Could I just
hash() the url strings first and use the resulting integer as the key?
I think what I'm...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
| |
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
|
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |