Introduction
This week's tip describes a few old tricks that most people around here have almost
forgotten. Some say there's no need for these tricks anymore because processors
nowadays are so fast and memory comes in abundance. But still: if we implement an
algorithm that is more efficient than another one, those fast processors run the
efficient algorithm even faster, and if an algorithm takes less memory to run, we
can solve larger instances of a problem in our large-memory computers.
Larger and faster computers are still no excuse to implement slow and/or memory
hogging algorithms. The next sections describe two little problems and the way
they can be solved. The first section deals with speed and the second section
deals with memory requirements without sacrificing speed.
Bit counting
For a plethora of reasons people want to know how many bits are set in a byte
or word. A naive approach to counting those bits in an int is the following:
    public int count1(int word) {
        int ones= 0; // the number of one-bits in the 'word'
        for (int mask= 0x00000001; mask != 0; mask<<= 1)
            if ((word&mask) != 0) ones++;
        return ones;
    }
The loop shifts a single one bit through the 'mask'
variable; if the corresponding bit in the word is set we increment the 'ones'
counter. The method returns the 'ones' variable when the loop has finished.
No matter the value of the word, the loop iterates 32 times because an int contains
32 bits and all of them need to be checked.
We can improve on this a bit: when a one bit has been found, we set it back to
zero again. If no one bits remain in the word variable we're done. This is how
it can be implemented:
    public int count2(int word) {
        int ones= 0; // the number of one-bits in the 'word'
        for (int mask= 0x00000001; word != 0; mask<<= 1)
            if ((word&mask) != 0) {
                word&= ~mask; // remove the bit
                ones++;
            }
        return ones;
    }
This count2
method certainly runs more efficiently than the previous count1 method. How much
more efficiently exactly? If we consider a random 32 bit pattern, 50% of those
bit patterns have their leftmost bit set to one (check it). So for 50% of all the
possible values of 'word', count2 doesn't run more efficiently than count1.
For 2^31 possible values count2 needs 32 iterations, for 2^30 possible values
count2 needs only 31 iterations, etc. etc., all the way down to the 2^0 values
whose highest one bit is the rightmost bit, needing just one iteration; only for
the value 0 is the loop body skipped entirely, taking zero iterations.
Mathematically speaking, on average the count2 method needs:

    sum(i= 0, 31: 2^i*(i+1))/2^32 ≈ 31 iterations.
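To double check that sum (a little verification sketch of my own, not part of the original tip), we can evaluate it with 64 bit arithmetic:

```java
public class Count2Average {
    public static void main(String[] args) {
        // total number of loop iterations count2 needs, summed over
        // all 2^32 possible int values: 2^i values need i+1 iterations
        // (the value 0 contributes zero iterations)
        long total = 0;
        for (int i = 0; i < 32; i++)
            total += (1L << i) * (i + 1);
        // divide by the number of values to get the average
        System.out.println(total / (double) (1L << 32)); // just over 31
    }
}
```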
Big deal, on average we're saving one single iteration. We must be able to do
better than this. Have a look at the following notion:
For every bit pattern with at least one bit set to one, we can represent the
bit pattern as follows:
xxx ... xxxx1000 ... 0000
There are bits 'xxx' to the left of the 1 bit and zero or more 0 bits to the right
of the 1 bit.
If we subtract 1 from that bit pattern we obtain the bit pattern:
xxx ... xxxx0111 ... 1111
If we bitwise 'and' both bit patterns we get the bit pattern:
xxx ... xxxx0000 ... 0000
We have effectively removed the rightmost 1 bit from the original bit pattern.
We have done this:
    pattern&= pattern-1;
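As a quick trace (a toy example of my own; any pattern would do), watch the expression clear one bit per application:

```java
public class ClearRightmostBit {
    public static void main(String[] args) {
        int pattern = 0b101100;      // three bits set
        pattern &= pattern - 1;      // 0b101100 -> 0b101000
        pattern &= pattern - 1;      // 0b101000 -> 0b100000
        pattern &= pattern - 1;      // 0b100000 -> 0b000000
        System.out.println(pattern); // prints 0 after three applications
    }
}
```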
Using this trick, the counting method becomes:
    public int count3(int word) {
        int ones= 0; // the number of one-bits in the 'word'
        for (; word != 0; ones++)
            word&= word-1;
        return ones;
    }
The loop keeps clearing the rightmost one bit until the word
equals zero. On average, in a random 32 bit word there are 16 bits set to 1. So this
count3 method certainly does better than the count1 and count2 methods: count2
wasn't much better than count1, but count3 takes only half of the loop passes.
Can we do better than this? Sure, mathematically speaking we can: we simply build
a table with 2^32 elements such that table[i] == j if the bit pattern i contains
j bits set to 1. Sure, mathematicians are loonies: we can't build a table or array
with 2^32 elements; even nowadays we need humongous amounts of memory for that.
We can build such a table for the 256 byte values though; the table would look
something like this:
    byte[] table= { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 ... 8 };
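The full 256-entry table doesn't have to be typed in by hand; a little builder sketch of my own can generate it, using the observation that i contains exactly the bits of i>>1 plus its own lowest bit:

```java
public class TableBuilder {
    public static byte[] buildTable() {
        byte[] table = new byte[256];  // table[0] == 0 already
        for (int i = 1; i < 256; i++)
            // i has as many one bits as i>>1, plus one more if i is odd
            table[i] = (byte) (table[i >> 1] + (i & 1));
        return table;
    }
    public static void main(String[] args) {
        byte[] table = buildTable();
        System.out.println(table[3]);   // prints 2 (0b11 has two bits set)
        System.out.println(table[255]); // prints 8 (all eight bits set)
    }
}
```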
The counting method using this table looks like this:
    public int count4(int word) {
        int ones= 0; // must be initialized
        for (int i= 0; i < 4; i++) { // 'byte' is a reserved word in Java
            ones+= table[word&0xff];
            word>>>= 8;
        }
        return ones;
    }
A Java int is four bytes wide (there are four bytes
in an int using Java) so this algorithm takes only four loop passes, ever. But
it does need a 256 byte table. Can we do without such a table and still do better
than the count3 method?
We need 2 bits to store the number of ones in a two bit wide word: at most 2 bits
can be equal to 1 in a two bits wide word, and the value 2 needs two bits by itself.
Likewise, we only need 6 bits to store a value in the range [0, 32], because at most
32 bits can be equal to one in a 32 bits wide word.
How can we find the number of bits set to one in a two bit wide word? Easy:
    int word; // just two bits in here, zero or one
    int ones= (word&1)+(word>>>1);
A single 32 bit int can
store 16 two bit counters for the 16 two bit words that make up the single 32
bit word:
    int word; // 32 bits wide word interpreted as 16 two bit words
    int ones= (word&0x55555555)+((word&0xaaaaaaaa)>>>1);
Now 'ones' contains 16 two bit counters, each holding the number
of bits set to one if we interpret the 32 bit word value as 16 two bit small words.
All we need to do now is add those 16 two bit values in the 'ones' int and we
have the number of bits set to 1 in the word int.
We can add the 16 two bit values stored in one 32 bit word using the same trick:
instead of the previous masks we use 0x33333333 and 0xcccccccc to mask out the
two bit values and add them pairwise; we have 4 bits available to store each sum.
But then again we can repeat this little procedure over and over again:
    public int count5(int word) {
        word= (word&0x55555555)+((word&0xaaaaaaaa)>>> 1); // sum 1 bits
        word= (word&0x33333333)+((word&0xcccccccc)>>> 2); // sum 2 bits
        word= (word&0x0f0f0f0f)+((word&0xf0f0f0f0)>>> 4); // sum 4 bits
        word= (word&0x00ff00ff)+((word&0xff00ff00)>>> 8); // sum 8 bits
        word= (word&0x0000ffff)+((word&0xffff0000)>>>16); // sum 16 bits
        return word;
    }
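Before trusting all those magic masks it's worth checking count5 against Java's built-in Integer.bitCount (a little test sketch of my own; the method body is copied from above):

```java
public class Count5Check {
    static int count5(int word) {
        word = (word & 0x55555555) + ((word & 0xaaaaaaaa) >>> 1);
        word = (word & 0x33333333) + ((word & 0xcccccccc) >>> 2);
        word = (word & 0x0f0f0f0f) + ((word & 0xf0f0f0f0) >>> 4);
        word = (word & 0x00ff00ff) + ((word & 0xff00ff00) >>> 8);
        word = (word & 0x0000ffff) + ((word & 0xffff0000) >>> 16);
        return word;
    }
    public static void main(String[] args) {
        // a handful of sample values, including the extremes
        int[] samples = { 0, 1, -1, 0x80000000, 0xcafebabe, 12345 };
        for (int w : samples)
            if (count5(w) != Integer.bitCount(w))
                throw new AssertionError("count5 disagrees for " + w);
        System.out.println("count5 agrees with Integer.bitCount");
    }
}
```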
Summarizing, this is how the five methods compare:

    count1: steps: 32
    count2: steps: 31 on average
    count3: steps: 16 on average
    count4: steps: 4 + a 256 byte explicit table
    count5: steps: 5 + a 40 byte implicit table (the ten mask constants)
Generalized to words of n bits the table would
look like this then:

    count1: steps: n
    count2: steps: n-1 on average
    count3: steps: n/2 on average
    count4: steps: n/8 + a 256 byte explicit table
    count5: steps: log2(n) + a n/8*2*log2(n) byte implicit table

The number of steps of count5 grows logarithmically with n while the number of
steps of the first four algorithms grows linearly with n; pick your choice.
Memory manipulation
We like to move memory around almost continuously; it seems as if we can't make
up our mind where we want our data to be stored and we keep on dragging those
bytes around over and over again. For example: I wrote the first paragraph of
this article after I wrote this second paragraph.
Effectively I moved the second paragraph further down in memory to make room for
the first paragraph. And now I'm appending more data to the rest of the text.
Suppose I have a sequence of memory cells; for the examples I use plain text but
for the development of the algorithms I'll use a more abstract notion: memory
consists of cells that are indexed using the index numbers 0, 1, 2, ..., n.
The first thing I want to be able to do is to swap the content of two cells i and j,
so this hypothetical memory implements the following interface:
    public interface Swapeable {
        public void swap(int i, int j);
    }
Suppose we have the following memory content and two index values i and j:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | H | e | l | l | o |   | W | o | r | l | d | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                  ^                   ^
                  |                   |
                  i                   j
I want to reverse the memory cells in the half open range [i, j); this little
method does the job:
    public void reverse(Swapeable s, int i, int j) {
        for(; --j > i; i++)
            s.swap(i, j);
    }
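To see the method at work, here is a minimal Swapeable implementation backed by a char array (the class names are my own; only the interface and the reverse() method come from the text):

```java
public class ReverseDemo {
    interface Swapeable {
        void swap(int i, int j);
    }

    // memory cells backed by a char array
    static class CharMemory implements Swapeable {
        final char[] cells;
        CharMemory(String s) { cells = s.toCharArray(); }
        public void swap(int i, int j) {
            char tmp = cells[i]; cells[i] = cells[j]; cells[j] = tmp;
        }
    }

    // reverses the cells in the half open range [i, j)
    static void reverse(Swapeable s, int i, int j) {
        for (; --j > i; i++)
            s.swap(i, j);
    }

    public static void main(String[] args) {
        CharMemory mem = new CharMemory("Hello World!");
        reverse(mem, 3, 8);  // reverse the cells 3 .. 7
        System.out.println(new String(mem.cells)); // prints: HeloW olrld!
    }
}
```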
After calling reverse(s, i, j) the memory looks like this:

    +---+---+---+---+---+---+---+---+---+---+---+---+
    | H | e | l | o | W |   | o | l | r | l | d | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                  ^                   ^
                  |                   |
                  i                   j
It may not seem like much, but there is a lot
more we can do using just that simple reverse() method, and not so many people
realize that fact.
Suppose I reverse this part:
    +---+---+---+---+---+---+---+---+---+---+---+---+
    | H | e | l | l | o |   | W | o | r | l | d | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                                           ^
      |                                           |
      i                                           j
The result is:

    +---+---+---+---+---+---+---+---+---+---+---+---+
    | d | l | r | o | W |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                                           ^
      |                                           |
      i                                           j
Next I reverse the first block, the five cells containing the reversed 'World':

    +---+---+---+---+---+---+---+---+---+---+---+---+
    | d | l | r | o | W |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                   ^
      |                   |
      i                   j
which yields:

    +---+---+---+---+---+---+---+---+---+---+---+---+
    | W | o | r | l | d |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
      ^                   ^
      |                   |
      i                   j
and finally I reverse the last block, the five cells containing the reversed
'Hello':

    +---+---+---+---+---+---+---+---+---+---+---+---+
    | W | o | r | l | d |   | o | l | l | e | H | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                              ^                   ^
                              |                   |
                              i                   j
The final result is:

    +---+---+---+---+---+---+---+---+---+---+---+---+
    | W | o | r | l | d |   | H | e | l | l | o | ! |
    +---+---+---+---+---+---+---+---+---+---+---+---+
                              ^                   ^
                              |                   |
                              i                   j
Effectively I have interchanged two blocks
of the memory: B1 B2 B3 where B1 == Hello, B2 == <space> and B3 == World. The
result is B3 B2 B1. The '!' sign wasn't used nor touched. If block B2
had contained more than one element I would have had to reverse that block as well.
Even more important: no additional temporary memory was used except for a single
memory cell used by the swap() method itself. In the early days when memory was
measured in Kilobytes or even single bytes instead of Gigabytes this little trick
was a precious little gem and used a lot. But what keeps you from still using it?
In the following pseudo code I use the notation 'b' for a block B reversed.
The steps needed are:
1: reverse B1 B2 B3 yielding b3 b2 b1
2: reverse b3 yielding B3 b2 b1
3: reverse b2 yielding B3 B2 b1
4: reverse b1 yielding B3 B2 B1
Here's the code for it:
    // assume ibeg <= iend <= jbeg <= jend
    public void move(Swapeable s, int ibeg, int iend, int jbeg, int jend) {
        int ilen= iend-ibeg;
        int jlen= jend-jbeg;
        reverse(s, ibeg, jend);
        reverse(s, ibeg, ibeg+jlen);
        reverse(s, ibeg+jlen, jend-ilen);
        reverse(s, jend-ilen, jend);
    }
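Here is the whole machinery at work on the running example (a demo sketch of my own, bundling the interface, the reverse() method and the move() method so it runs standalone):

```java
public class MoveDemo {
    interface Swapeable { void swap(int i, int j); }

    // memory cells backed by a char array
    static class CharMemory implements Swapeable {
        final char[] cells;
        CharMemory(String s) { cells = s.toCharArray(); }
        public void swap(int i, int j) {
            char tmp = cells[i]; cells[i] = cells[j]; cells[j] = tmp;
        }
    }

    // reverses the cells in the half open range [i, j)
    static void reverse(Swapeable s, int i, int j) {
        for (; --j > i; i++) s.swap(i, j);
    }

    // interchanges the blocks [ibeg, iend) and [jbeg, jend);
    // assumes ibeg <= iend <= jbeg <= jend
    static void move(Swapeable s, int ibeg, int iend, int jbeg, int jend) {
        int ilen = iend - ibeg;
        int jlen = jend - jbeg;
        reverse(s, ibeg, jend);
        reverse(s, ibeg, ibeg + jlen);
        reverse(s, ibeg + jlen, jend - ilen);
        reverse(s, jend - ilen, jend);
    }

    public static void main(String[] args) {
        CharMemory mem = new CharMemory("Hello World!");
        move(mem, 0, 5, 6, 11);  // swap "Hello" and "World"
        System.out.println(new String(mem.cells)); // prints: World Hello!
    }
}
```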
This move()
method interchanges two blocks of memory; the blocks don't need to be of the
same size, but the first block needs to be to the left of the second block. It doesn't
take rocket science to extend this method so that the two blocks can be located
anywhere in memory, i.e. the first block doesn't need to be located to the left
of the second block (hint: write a little wrapper method).
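The hinted wrapper could look something like this (a sketch of my own, bundled with the methods above so it runs standalone): it simply relabels the blocks so that the leftmost one comes first before delegating to move():

```java
public class MoveAnywhere {
    interface Swapeable { void swap(int i, int j); }

    static class CharMemory implements Swapeable {
        final char[] cells;
        CharMemory(String s) { cells = s.toCharArray(); }
        public void swap(int i, int j) {
            char tmp = cells[i]; cells[i] = cells[j]; cells[j] = tmp;
        }
    }

    static void reverse(Swapeable s, int i, int j) {
        for (; --j > i; i++) s.swap(i, j);
    }

    // assumes ibeg <= iend <= jbeg <= jend
    static void move(Swapeable s, int ibeg, int iend, int jbeg, int jend) {
        int ilen = iend - ibeg, jlen = jend - jbeg;
        reverse(s, ibeg, jend);
        reverse(s, ibeg, ibeg + jlen);
        reverse(s, ibeg + jlen, jend - ilen);
        reverse(s, jend - ilen, jend);
    }

    // the hinted wrapper: accepts the two (non-overlapping) blocks in any order
    static void moveAnywhere(Swapeable s, int ibeg, int iend, int jbeg, int jend) {
        if (ibeg <= jbeg)
            move(s, ibeg, iend, jbeg, jend);
        else
            move(s, jbeg, jend, ibeg, iend);
    }

    public static void main(String[] args) {
        CharMemory mem = new CharMemory("Hello World!");
        moveAnywhere(mem, 6, 11, 0, 5);  // the blocks given right-to-left
        System.out.println(new String(mem.cells)); // prints: World Hello!
    }
}
```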
Concluding remarks
The two tricks I described here are proud members of my bag of (old) tricks
because they still come in handy now and then, although I do need to dust them off.
Maybe I'll describe a couple more (old) tricks in the near future. For now,
please understand the details of those two tricks; it's worth the time.
I hope we'll meet again here next week and
kind regards,
Jos