473,386 Members | 1,630 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Perfect hashing for Py

Following links from this thread:
http://groups.google.com/group/comp....e1a45485ab36a#

I have found this perfect hash (minimal too) implementation:
http://burtleburtle.net/bob/hash/perfect.html

I have already translated part of it to D, and it seems to work well
enough. As discussed in the PyConDue, I think this may be used in
frozenset (and frozendict) to build a (minimal too?) perfect hash on
the fly, to allow (hopefully) faster retrieval of items that don't
change.
That code is C and I think it's public domain, so if experiments show
it gives enough speed up, it may be added to CPython 2.6/3.

Bye,
bearophile
Jul 11 '08 #1
2 3243
On Jul 11, 8:01*am, bearophileH...@lycos.com wrote:
Following links from this thread:http://groups.google.com/group/comp....thread/thread/...

I have found this perfect hash (minimal too) implementation:http://burtleburtle.net/bob/hash/perfect.html

I have already translated part of it to D, and it seems to work well
enough. As discussed in the PyConDue, I think this may be used in
frozenset (and frozendict) to build a (minimal too?) perfect hash on
the fly, to allow (hopefully) faster retrieval of items that don't
change.
That code is C and I think it's public domain, so if experiments show
it gives enough speed up, it may be added to CPython 2.6/3.

Bye,
bearophile
It would be interesting to see if such an algorithm could actually
provide any significant performance improvements for the size of sets
that I suspect are most often used in practice. The chance of a hash
collision for a good 32-bit general is fairly low even for a set of
1,000,000 unique elements, which seems to me to be a pretty large
memory-based set. Compare that with the cost of determining a perfect
hash (O(n**2)?).

From my perspective, a perfect hash would certainly be a welcome
addition to the Python library or even as an optional algorithm
supporting
hash-based collections.
Jul 11 '08 #2
On Jul 12, 10:13*am, Raymond Hettinger <pyt...@rcn.comwrote:
On Jul 11, 3:01*pm, bearophileH...@lycos.com wrote:
I have found this perfect hash (minimal too) implementation:http://burtleburtle.net/bob/hash/perfect.html
I have already translated part of it to D, and it seems to work well
enough. As discussed in the PyConDue, I think this may be used in
frozenset (and frozendict) to build a (minimal too?) perfect hash on
the fly, to allow (hopefully) faster retrieval of items that don't
change.
I played around with the idea and have a rough implementation sketch:

class PerfectSet(collections.Set):
__slots__ = ['len', 'hashvalue', 'hashfunc', 'hashtable']
DUMMY = object()
def __init__(self, keys, sparseness=0.5):
keys = frozenset(keys)
self.len = len(keys)
self.hashvalue = hash(keys)
n = (len(keys) / sparseness) + 1
assert n self.len
self.hashfunc = make_perfect_hash_function(keys, n)
self.hashtable = [self.DUMMY] * n
for k in keys:
self.hashtable[self.hashfunc(k)] = k
del __len__(self, k):
return self.len
def __iter__(self, k)
return (k for k in self.hashtable if k is not self.DUMMY)
def __contains__(self, k):
return self.table[self.hashfunc(k)] is not self.DUMMY

The perfect hash function will need to use the regular hash(obj) as a
starting point. This helps with non-string types and it preserves
existing relationships between __hash__ and __eq__.

The construction is more expensive than with regular frozensets so it
is only useful when lookups are very common.

Playing around with the idea suggests that a sparseness variable for
frozensets would largely accomplish the same goal without the overhead
of creating and applying a perfect hash function. Sparse hash tables
rarely collide.
Raymond
Jul 12 '08 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: snowteo | last post by:
Hi,I have to do this exercises can you help me: 1)Write a program to implement exetendible hashing.If the table is small enough to fin in main memory,how does its performance compare with open and...
4
by: macluvitch | last post by:
hello folks, do you have any idea how gperf generates the numeric table ? I know that it uses perfect hashing for speeding up look up. any link, trick will be really very appreciated thanks...
11
by: Wm. Scott Miller | last post by:
Hello all! We are building applications here and have hashing algorithms to secure secrets (e.g passwords) by producing one way hashes. Now, I've read alot and I've followed most of the advice...
10
by: Dino M. Buljubasic | last post by:
Hi, I am using MD5 to hash my passwords and add them to database as hashed. I have noticed though that some passwords don't get recognized and I suppose that it happen because hashing might...
19
by: Ole Nielsby | last post by:
How does the GetHashCode() of an array object behave? Does it combine the GetHashCode() of its elements, or does it create a sync block for the object? I want to use readonly arrays as...
8
by: Maya | last post by:
Hello all, I'm using MD5 hashing in my application to give unique values to huge list of items my application receives, originally every item's name was difficult to use as an id for this item...
6
by: Jayender | last post by:
Hi, What is the difference between Hashing and Encryption ?
1
by: Tinku | last post by:
Hi friends I know Static Hashing and i know about Dynamic Hashing, still i have problem to make program with Dynamic Hashing I am new in "C" world, please help me, my problem is: i have to...
15
by: Vinodh | last post by:
I am reading about hashing techniques. The map data structure available in C++ STL uses hashing techniques?
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.