473,386 Members | 1,720 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Most efficient way of storing 1024*1024 bits

Hi

I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

regards tores
Nov 2 '05 #1
15 2295
C ?

Tor Erik Sønvisen wrote:
Hi

I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

regards tores


Nov 2 '05 #2
"Tor Erik Sønvisen" <to***@stud.cs.uit.no> writes:
I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.


Umm, what kind of searches do you want to do? For speed you want to
use built-in functions wherever you can, string.find and that kind of
thing. So choose your data format accordingly.
Nov 2 '05 #3
"Tor Erik Sønvisen" <to***@stud.cs.uit.no> writes:
I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.


Six megabytes is pretty much nothing on a modern computer. I'd store
the things as a string of "0" and "1", and then use .find (or maybe
the in keyword) for doing the searches.

This doesn't work very well if you're going to mutate the string,
though.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 2 '05 #4
Mike Meyer <mw*@mired.org> writes:
Six megabytes is pretty much nothing on a modern computer. I'd store
the things as a string of "0" and "1", and then use .find (or maybe
the in keyword) for doing the searches.

This doesn't work very well if you're going to mutate the string,
though.


You can use re.search on array.array byte vectors. I don't know how
the speed compares with string.find.
Nov 2 '05 #5
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
...
You can use re.search on array.array byte vectors. I don't know how
the speed compares with string.find.
Pretty well, though of course one should measure on representative cases
for one's specific application needs:

Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
-s'x=x+"a"+x' \ 'x.find("a")'

100 loops, best of 3: 13.3 msec per loop

Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
-s'x=x+"a"+x' 're.search("a", x)'
100 loops, best of 3: 8.73 msec per loop

Helen:~ alex$ python -mtimeit -s'import re, array' -s'x=999999*"x"'
-s'x=x+"a"+x' -s'x=array.array("c",x)' 're.search("a", x)'
100 loops, best of 3: 8.72 msec per loop
Alex
Nov 2 '05 #6
On Wed, 2 Nov 2005 13:55:10 +0100, "Tor Erik Sønvisen" <to***@stud.cs.uit.no> wrote:
Hi

I need a time and space efficient way of storing up to 6 million bits. Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.

Very dependent on what kind of "searches" -- e.g., 1024*1024 suggests the
possibility of two dimensions. Quad-trees? How sparse is the data? Etc.
What kinds of patterns are you going to search for?

Regards,
Bengt Richter
Nov 2 '05 #7
On Wed, 02 Nov 2005 13:55:10 +0100, Tor Erik Sønvisen wrote:
Hi

I need a time and space efficient way of storing up to 6 million bits.
[inserts pinky in corner of mouth]

Six MILLION bits!!!

That's almost 750K. Are you sure your computer will handle that much data?

Time
efficency is more important then space efficency as I'm going to do searches
through the bit-set.


Could you be more vague if you tried? Searching for what? Does your data
have structure you can exploit? Can you put it in a tree structure? Are
you going to be inserting and deleting data?

If you can handle your six million bits in lots of eight, store them as a
character string.

If you can keep your data sorted, then you can do binary searches instead
of linear searches. If your data is hashable, you can store it in a
dictionary and potentially get up to constant-time search speeds.

Or forget about manipulating bits, and just store your data as bools in
a list.

Explain your problem a little better, and you may get some better advice.
--
Steven.

Nov 2 '05 #8
Tor Erik Sønvisen wrote:
Hi

I need a time and space efficient way of storing up to 6 million bits.
The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:

import array

class BitList(object):
def __init__(self, data=None):
self._data = array.array('B')
self._length = 0
if data is not None:
self.extend(data)
def __len__(self):
return self._length
def __getitem__(self, index):
if index < 0:
index += self._length
if index < 0 or index >= self._length:
raise IndexError('BitList index out of range')
byte_index, bit_index = divmod(index, 8)
return int(bool(self._data[byte_index] & (1 << bit_index)))
def __setitem__(self, index, value):
if index < 0:
index += self._length
if index < 0 or index >= self._length:
raise IndexError('BitList index out of range')
byte_index, bit_index = divmod(index, 8)
byte = self._data[byte_index]
bitmask = 1 << bit_index
byte &= ~bitmask & 0xFF
if value:
byte |= bitmask
self._data[byte_index] = byte
def __repr__(self):
return 'BitList([%s])' % ', '.join(str(bit) for bit in self)
def append(self, value):
if not self._length % 8:
self._data.append(0)
self._length += 1
self[-1] = value
def extend(self, values):
for v in values:
self.append(v)
Time efficency is more important then space efficency
In that case, you're better off simply using a list of bools.
as I'm going to do searches through the bit-set.


What kind of searches?

Nov 2 '05 #9
BTW, it'd be 6 megabits or 750kb ;)
Six megabytes is pretty much nothing on a modern computer.


----== Posted via Newsgroups.com - Usenet Access to over 100,000 Newsgroups ==----
Get Anonymous, Uncensored, Access to West and East Coast Server Farms!
----== Highest Retention and Completion Rates! HTTP://WWW.NEWSGROUPS.COM ==----
Nov 3 '05 #10
Brandon K <pr***********@yahoo.com> wrote [inverting his topposting!]:
Six megabytes is pretty much nothing on a modern computer.
BTW, it'd be 6 megabits or 750kb ;)


....but Mike was proposing using one digit per bit, hence, 6 megabytes.
That makes it easy to search for bitpatterns with re or string.find; if
the bits were packed 8 to a byte, such searches would be very hard.
Alex
Nov 3 '05 #11

On 3 Nov 2005, at 05:03, Alex Martelli wrote:
Brandon K <pr***********@yahoo.com> wrote [inverting his topposting!]:

Six megabytes is pretty much nothing on a modern computer.


BTW, it'd be 6 megabits or 750kb ;)


...but Mike was proposing using one digit per bit, hence, 6 megabytes.
That makes it easy to search for bitpatterns with re or
string.find; if
the bits were packed 8 to a byte, such searches would be very hard.


They would just require some out-of-the-box thinking using character
arrays and stuff I think. It's definately still doable with regex's
if you really want to.
Nov 3 '05 #12
Alex Stapleton <al***@advfn.com> wrote:
...
Six megabytes is pretty much nothing on a modern computer.

BTW, it'd be 6 megabits or 750kb ;)


...but Mike was proposing using one digit per bit, hence, 6 megabytes.
That makes it easy to search for bitpatterns with re or
string.find; if
the bits were packed 8 to a byte, such searches would be very hard.


They would just require some out-of-the-box thinking using character
arrays and stuff I think. It's definately still doable with regex's
if you really want to.


"Doable", yes, of course -- that's pretty obvious, and I'm not sure
what's "out of the box" about it -- but orders of magnitude harder. For
example, to look for a fixed sequence of X bits, you'd have to search
for any of 8 sequences of slightly more than X/8 characters (where the
first and last are normally charsets) built by the possible shifts of
the bitsequence by 0, 1, ... 7 bits. I also suspect that performance
would badly suffer (dealing with 750 KB of data, plus all the auxiliary
stuff for Python and re, isn't going to fit in cache anyway). All in
all, doesn't feel worth pursuing, particularly when the OP mentioned
time mattering more than space right from the word go.
Alex
Nov 3 '05 #13
On Wed, 2 Nov 2005, Dan Bishop wrote:
Tor Erik Sønvisen wrote:
I need a time and space efficient way of storing up to 6 million bits.


The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:


Actually, no, it's to xor all the bits together and store them in a single
boolean.

Getting them back out is kinda tricky though.
Time efficency is more important then space efficency


In that case, you're better off simply using a list of bools.


Unlikely - the indexing is a bit simpler, but the cache hit rate is going
to go through the floor.

tom

--
power to the people and the beats
Nov 3 '05 #14
Tom Anderson wrote:
On Wed, 2 Nov 2005, Dan Bishop wrote:
Tor Erik Sønvisen wrote:
I need a time and space efficient way of storing up to 6 million bits.


The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:


Actually, no, it's to xor all the bits together and store them in a single
boolean.


I'd use 'or' rather than 'xor'. The or operator is more likely to yield
a '1' at the end of it, and a 1 is narrower than a 0, obviously making
it more efficient to store.

--
Ben Sizer

Nov 4 '05 #15

On 4 Nov 2005, at 10:26, Ben Sizer wrote:
Tom Anderson wrote:
On Wed, 2 Nov 2005, Dan Bishop wrote:

Tor Erik Sønvisen wrote:
I need a time and space efficient way of storing up to 6 million
bits.
The most space-efficient way of storing bits is to use the bitwise
operators on an array of bytes:


Actually, no, it's to xor all the bits together and store them in
a single
boolean.


I'd use 'or' rather than 'xor'. The or operator is more likely to
yield
a '1' at the end of it, and a 1 is narrower than a 0, obviously making
it more efficient to store.


<xahlee>
Typical gas guzzling, SUV driving american logic. A would obviously
use more POWER and hence INCREASE GLOBAL WARMING leading to the
ultimate DEATH of EVERYBODY you know and LOVE!
</xahlee>

Nov 4 '05 #16

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

19
by: cppaddict | last post by:
Hi, I am going to have a series of bit flags which I could store in an array, or as a string ("10011001"), or any other way. I want to be able to turn this series of bits into an int. I know...
2
by: Thomas Paul Diffenbach | last post by:
I'm trying to write a space efficient string (nul terminated array of char), storing the characters directly unless the number of characters is too large to be so stored, and storing a pointer to...
53
by: Zhiqiang Ye | last post by:
Hi, All I am reading FAQ of this group. I have a question about this: http://www.eskimo.com/~scs/C-faq/q7.31.html It says: " p = malloc(m * n); memset(p, 0, m * n); The zero fill is...
2
by: s.subbarayan | last post by:
Dear all, I have one peculiar problem with me for which I need ur inputs how to implement it: I have 2 registers each 8 bit wide. The first register stores: Register Map:...
14
by: vib | last post by:
Hi there, union UC32 { unsigned int L; unsigned short S; unsigned char C; } UC32; with the above structure, I am able to shift bits of C, into C, and C into C so on and so forth till C as...
2
by: James Dean | last post by:
Could anybody tell me the most efficient way to set the bits in a 1Bpp class. I am reading in byte lines of an image. The byte array tells me what what bits are turned on or off.....i do bit...
11
by: hoopsho | last post by:
Hi Everyone, I am trying to write a program that does a few things very fast and with efficient use of memory... a) I need to parse a space-delimited file that is really large, upwards fo a...
15
by: steve yee | last post by:
i want to detect if the compile is 32 bits or 64 bits in the source code itself. so different code are compiled respectively. how to do this?
3
by: DBuss | last post by:
I'm reading a stream of binary data off of a UDP socket. I've got 2 integers to read, one is four bytes long and is in buffer to buffer, the other is length 1 and in buffer. Doing the following...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.