By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,427 Members | 1,378 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,427 IT Pros & Developers. It's quick & easy.

Most efficient way of storing 1024*1024 bits

P: n/a
Hi

I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

regards tores
Nov 2 '05 #1
Share this Question
Share on Google+
15 Replies


P: n/a
C ?

Tor Erik Sønvisen wrote:
Hi

I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

regards tores


Nov 2 '05 #2

P: n/a
"Tor Erik Sønvisen" <to***@stud.cs.uit.no> writes:
I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.


Umm, what kind of searches do you want to do? For speed you want to
use built-in functions wherever you can, string.find and that kind of
thing. So choose your data format accordingly.
Nov 2 '05 #3

P: n/a
"Tor Erik Sønvisen" <to***@stud.cs.uit.no> writes:
I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.


Six megabytes is pretty much nothing on a modern computer. I'd store
the things as a string of "0" and "1", and then use .find (or maybe
the in keyword) for doing the searches.

This doesn't work very well if you're going to mutate the string,
though.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 2 '05 #4

P: n/a
Mike Meyer <mw*@mired.org> writes:
Six megabytes is pretty much nothing on a modern computer. I'd store
the things as a string of "0" and "1", and then use .find (or maybe
the in keyword) for doing the searches.

This doesn't work very well if you're going to mutate the string,
though.


You can use re.search on array.array byte vectors. I don't know how
the speed compares with string.find.
Nov 2 '05 #5

P: n/a
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
...
You can use re.search on array.array byte vectors. I don't know how
the speed compares with string.find.
Pretty well, though of course one should measure on representative cases
for one's specific application needs:

Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
-s'x=x+"a"+x' \ 'x.find("a")'

100 loops, best of 3: 13.3 msec per loop

Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
-s'x=x+"a"+x' 're.search("a", x)'
100 loops, best of 3: 8.73 msec per loop

Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
-s'x=x+"a"+x' -s'x=array.array("c",x)' 're.search("a", x)'
100 loops, best of 3: 8.72 msec per loop
Alex
Nov 2 '05 #6

P: n/a
On Wed, 2 Nov 2005 13:55:10 +0100, "Tor Erik Sønvisen" <to***@stud.cs.uit.no> wrote:
Hi

I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

Very dependent on what kind of "searches" -- e.g., 1024*1024 suggests the
possibility of two dimensions. Quad-trees? How sparse is the data? Etc.
What kinds of patterns are you going to search for?

Regards,
Bengt Richter
Nov 2 '05 #7

P: n/a
On Wed, 02 Nov 2005 13:55:10 +0100, Tor Erik Sønvisen wrote:
Hi

I need a time and space efficient way of storing up to 6 million bits.
[inserts pinky in corner of mouth]

Six MILLION bits!!!

That's almost 750K. Are you sure your computer will handle that much data?

Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.


Could you be more vague if you tried? Searching for what? Does your data
have structure you can exploit? Can you put it in a tree structure? Are
you going to be inserting and deleting data?

If you can handle your six million bits in lots of eight, store them as a
character string.

If you can keep your data sorted, then you can do binary searches instead
of linear searches. If your data is hashable, you can store it in a
dictionary and potentially get up to constant-time search speeds.

Or forget about manipulating bits, and just store your data as bools in
a list.

Explain your problem a little better, and you may get some better advice.
--
Steven.

Nov 2 '05 #8

P: n/a
Tor Erik Sønvisen wrote:
Hi

I need a time and space efficient way of storing up to 6 million bits.
The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:

import array

class BitList(object):
def __init__(self, data=None):
self._data = array.array('B')
self._length = 0
if data is not None:
self.extend(data)
def __len__(self):
return self._length
def __getitem__(self, index):
if index < 0:
index += self._length
if index < 0 or index >= self._length:
raise IndexError('BitList index out of range')
byte_index, bit_index = divmod(index, 8)
return int(bool(self._data[byte_index] & (1 << bit_index)))
def __setitem__(self, index, value):
if index < 0:
index += self._length
if index < 0 or index >= self._length:
raise IndexError('BitList index out of range')
byte_index, bit_index = divmod(index, 8)
byte = self._data[byte_index]
bitmask = 1 << bit_index
byte &= ~bitmask & 0xFF
if value:
byte |= bitmask
self._data[byte_index] = byte
def __repr__(self):
return 'BitList([%s])' % ', '.join(str(bit) for bit in self)
def append(self, value):
if not self._length % 8:
self._data.append(0)
self._length += 1
self[-1] = value
def extend(self, values):
for v in values:
self.append(v)
Time efficency is more important then space efficency
In that case, you're better off simply using a list of bools.
as I'm going to do searches through the bit-set.


What kind of searches?

Nov 2 '05 #9

P: n/a
BTW, it'd be 6 megabits or 750kb ;)
Six megabytes is pretty much nothing on a modern computer.


----== Posted via Newsgroups.com - Usenet Access to over 100,000 Newsgroups ==----
Get Anonymous, Uncensored, Access to West and East Coast Server Farms!
----== Highest Retention and Completion Rates! HTTP://WWW.NEWSGROUPS.COM ==----
Nov 3 '05 #10

P: n/a
Brandon K <pr***********@yahoo.com> wrote [inverting his topposting!]:
Six megabytes is pretty much nothing on a modern computer.
BTW, it'd be 6 megabits or 750kb ;)


....but Mike was proposing using one digit per bit, hence, 6 megabytes.
That makes it easy to search for bitpatterns with re or string.find; if
the bits were packed 8 to a byte, such searches would be very hard.
Alex
Nov 3 '05 #11

P: n/a

On 3 Nov 2005, at 05:03, Alex Martelli wrote:
Brandon K <pr***********@yahoo.com> wrote [inverting his topposting!]:

Six megabytes is pretty much nothing on a modern computer.


BTW, it'd be 6 megabits or 750kb ;)


...but Mike was proposing using one digit per bit, hence, 6 megabytes.
That makes it easy to search for bitpatterns with re or
string.find; if
the bits were packed 8 to a byte, such searches would be very hard.


They would just require some out-of-the-box thinking using character
arrays and stuff I think. It's definately still doable with regex's
if you really want to.
Nov 3 '05 #12

P: n/a
Alex Stapleton <al***@advfn.com> wrote:
...
Six megabytes is pretty much nothing on a modern computer.

BTW, it'd be 6 megabits or 750kb ;)


...but Mike was proposing using one digit per bit, hence, 6 megabytes.
That makes it easy to search for bitpatterns with re or
string.find; if
the bits were packed 8 to a byte, such searches would be very hard.


They would just require some out-of-the-box thinking using character
arrays and stuff I think. It's definately still doable with regex's
if you really want to.


"Doable", yes, of course -- that's pretty obvious, and I'm not sure
what's "out of the box" about it -- but orders of magnitude harder. For
example, to look for a fixed sequence of X bits, you'd have to search
for any of 8 sequences of slightly more than X/8 characters (where the
first and last are normally charsets) built by the possible shifts of
the bitsequence by 0, 1, ... 7 bits. I also suspect that performance
would badly suffer (dealing with 750 KB of data, plus all the auxiliary
stuff for Python and re, isn't going to fit in cache anyway). All in
all, doesn't feel worth pursuing, particularly when the OP mentioned
time mattering more than space right from the word go.
Alex
Nov 3 '05 #13

P: n/a
On Wed, 2 Nov 2005, Dan Bishop wrote:
Tor Erik Sønvisen wrote:
I need a time and space efficient way of storing up to 6 million bits.


The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:


Actually, no, it's to xor all the bits together and store them in a single
boolean.

Getting them back out is kinda tricky though.
Time efficency is more important then space efficency


In that case, you're better off simply using a list of bools.


Unlikely - the indexing is a bit simpler, but the cache hit rate is going
to go through the floor.

tom

--
power to the people and the beats
Nov 3 '05 #14

P: n/a
Tom Anderson wrote:
On Wed, 2 Nov 2005, Dan Bishop wrote:
Tor Erik Sønvisen wrote:
I need a time and space efficient way of storing up to 6 million bits.


The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:


Actually, no, it's to xor all the bits together and store them in a single
boolean.


I'd use 'or' rather than 'xor'. The or operator is more likely to yield
a '1' at the end of it, and a 1 is narrower than a 0, obviously making
it more efficient to store.

--
Ben Sizer

Nov 4 '05 #15

P: n/a

On 4 Nov 2005, at 10:26, Ben Sizer wrote:
Tom Anderson wrote:
On Wed, 2 Nov 2005, Dan Bishop wrote:

Tor Erik Sønvisen wrote:
I need a time and space efficient way of storing up to 6 million
bits.
The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:


Actually, no, it's to xor all the bits together and store them in
a single
boolean.


I'd use 'or' rather than 'xor'. The or operator is more likely to
yield
a '1' at the end of it, and a 1 is narrower than a 0, obviously making
it more efficient to store.


<xahlee>
Typical gas guzzling, SUV driving american logic. A would obviously
use more POWER and hence INCREASE GLOBAL WARMING leading to the
ultimate DEATH of EVERYBODY you know and LOVE!
</xahlee>

Nov 4 '05 #16

This discussion thread is closed

Replies have been disabled for this discussion.