On Fri, 11 Jan 2008 16:15:26 -0800, j1mb0jay <no**@none.comwrote:
[...]
I just wanted to improve the insert method of the hash table because
after inserting all words I have to check a maximum 4 words rather than
1 to see if the word already exists in the hash table which is
2.5million words long.
There's no way for you to guarantee that you don't have to check multiple
words, even if you have as many entries in the hash table as you have data
elements. Collisions are always a possibility.
Besides that, you seem to be pursuing a pointless optimization. Even if
your hash table required you to check dozens of words, it would still be
ridiculously faster than a linear search of 2.5 million words. That's the
point of a hash table. You may be able to improve performance slightly by
implementing your own hash table and/or hash function, but IMHO once
you've got to the point of use _some_ kind of hash table, you're likely to
find the algorithm isn't going to get a lot faster by messing around with
the hash table. Your efforts will be better spent looking for other
low-hanging fruit elsewhere in whatever the algorithm you're doing is.
And if you can't get acceptable performance that way while still using a
hash table, you may wind up needing an algorithm that uses a data
structure that's even faster than the hash table.
Pete