469,627 Members | 943 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,627 developers. It's quick & easy.

creating bitmasks (for bloom filters)


Included below is a copy of a message I sent to the py-tutor list. It
didn't garner much in the way of a solution and it was suggested that
this list might be helpful.


-----Forwarded Message-----
From: Aaron Straup Cope <as*@vineyard.net>
To: pytutor <tu***@python.org>
Subject: creating bitmasks (for bloom filters)
Date: Fri, 11 Jun 2004 08:49:20 -0400


A quick introduction : I am mostly a Perl hacker, but I am teaching
myself Python because it seems like the right tool for the job in
question. [1] I am not, by any stretch, much of a math weenie.

I am trying to port the Bloom::Filter Perl module [2] for use with my
tool. I know that there is a "pybloom" sourceforge project [3] but
licensing issues aside it's reliance on Unix math libs makes it
problematic for use with a cross-platform GUI tool.

At this point, my not very deep grasp of Python and my poor
understanding of some basic compsci/math principles have combined to
frustrate my efforts.

I was hoping that the the Perl/Python operator and function set would be
similar enough that, with only a few modifications, I could clone the
B:F.pm package pretty much "as is". All other bugs aside, I am stuck
trying to create a bitmask of a key :

"Given a key, hashes it using the list of salts and returns a bitmask
the same length as the Bloom filter. Note that Perl will pad the
bitmask out with zeroes so it's a muliple of 8."

# from the docs for the _make_bitmask method

Specifically, it's not clear to me :

* how to map the various Perl pack/unpack formats to their Python
equivalents (things like Perl's "b*" which is "a bit string, in
ascending bit order inside each byte" where "*" means to use all
the bytes from the input field")

* how to account for the fact that unpack returns a list in Perl and
Python returns a string

* how to pass a salt to the Python sha module

* how to XOR the pieces of an unpacked sha digest (which may simply
be a function of a bad format unpacking stuff)

* A 'vec' [4] function in Python; does one exist?

In short, I don't really have any idea what I'm doing :-(

As much as I relish the challenge purely on the grounds of learning new
stuff I am starting to reach the point of irrational frustration and
wanting to just get this done so I can move on to other things.

The tool should be written in Python because it's the best choice for
the job. The tool needs to be able to read/write Bloom Filters. Python
needs a Bloom Filter library, so I will step up and write it. Except it
appears to be beyond my skill level right now...

Any thoughts, suggestions or clue-bats would be very much welcomed.
Pointers to an already existing implementation that I've missed would be
even better but I'll take what I can get. :-)



[2] http://search.cpan.org/src/MCEGLOWS/...0.02/Filter.pm
[3] http://cvs.sourceforge.net/viewcvs.py/pybloom/
[4] http://www.perldoc.com/perl5.8.4/pod/func/vec.html

Jul 18 '05 #1
0 1569

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

3 posts views Thread by Kim | last post: by
2 posts views Thread by Jonathan Boivin | last post: by
3 posts views Thread by Mr 200 | last post: by
5 posts views Thread by MrPickle | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.