Object of Combination

By Bo Xu

Introduction

A combination of n things, taken s at a time, often referred as an

s-combination out of n, is a way to select a subset of size s from a

given set of size n. There are n!/(s!(n-s)!) ways to do this. Donald

E. Knuth gives several methods (algorithms) to generate all the

s-combinations in [1]. In such procedure-oriented way, each

s-combination is processed while it's being generated. In some

situations, this "generate-all-at-one-time" procedure is not properly

applicable. Instead, an s-combination on demanded is needed. In this

article, s-combination is handled as object so it bears the merits of

object-oriented programming, namely, usability and reusability.

In this article, we are going to:

Describe and implement three algorithms: increasing a given

s-combination to its next s-combination in their lexicographic order;

decreasing a given s-combination to its previous s-combination in

their lexicographic order; and getting the s-combination out of its

lexicographic number.

Define and implement the class Comb. An s-combination is represented

by an object of this class. Comb is a finite number system with its

objects ordered by the lexicographic order of the under-lying

s-combinations. Operator ++, --, and == are defined and the capability

of getting an object of random s-combination is provided.

Extend the class Comb to class template Comb<int S, int N> so that

the s-combination objects are typed with different s and n.

Further generalize Comb<int S, int N> to Comb<int S, int N, typename

T>. T is any class of super set of built-in type "unsigned int". With

T being a huge integer type, we are able to handle combinations of

large sets.

We represent the set of n things as a set of integers ranging from 0

to n-1, or {0, 1,
, n-1}. An integer array v with size s holds an

s-combination of n integers, and satisfies:

0 <= v[0] < v[1] <
< v[s-1] < n

An equivalent binary string of size n can also be used for an

s-combination of n:

b0 b1 b2 b3
bn-1

If bi is 1, integer i is selected otherwise it is 0, i is unselected.

There are s 1s in such a binary string.

All the s-combinations out of n can be arranged in lexicographic

order. Each combination has its lexicographic number from 0 to

n!/(s!(n-s)!) 1. In the integer array representation, v[0] is the

least significant position and v[s-1] is the most significant one. The

combination with lowest lexicographic number 0 has its v[0] = 0,
,

v[i] = i,
, v[s-1] = s 1. The highest lexicographic number

n!/(s!(n-s)!) 1 has v[0] = n s ,
, v[i] = n s + i ,
, v[s-1] =

n 1.

Here is the list of all combinations of s = 3, n = 6 in lexicographic

order:

Example 1

v[0] v[1] v[2] b0 b1 b2 b3 b4 b5 Lexicographic

0 1 2 1 1 1 0 0 0 0

0 1 3 1 1 0 1 0 0 1

0 2 3 1 0 1 1 0 0 2

1 2 3 0 1 1 1 0 0 3

0 1 4 1 1 0 0 1 0 4

0 2 4 1 0 1 0 1 0 5

1 2 4 0 1 1 0 1 0 6

0 3 4 1 0 0 1 1 0 7

1 3 4 0 1 0 1 1 0 8

2 3 4 0 0 1 1 1 0 9

0 1 5 1 1 0 0 0 1 10

0 2 5 1 0 1 0 0 1 11

1 2 5 0 1 1 0 0 1 12

0 3 5 1 0 0 1 0 1 13

1 3 5 0 1 0 1 0 1 14

2 3 5 0 0 1 1 0 1 15

0 4 5 1 0 0 0 1 1 16

1 4 5 0 1 0 0 1 1 17

2 4 5 0 0 1 0 1 1 18

3 4 5 0 0 0 1 1 1 19

Combination Generating Algorithms

Algorithm 1: Increase a given combination to its next combination in

the lexicographic order.

All s-combinations forms a finite set of size n!/(s!(n-s)!). The

combination with the highest lexicographic number, n!/(s!(n-s)!) 1,

has no normal successor that is lexicographical ly higher. We define

the successor of the highest combination to be the one of

lexicographic number 0. In Example 1, the combination 3 4 5 has

the highest lexicographic 19, increasing it will result to the

combination 0 1 2 of lexicographic 0.

In Listing 1, the function increaseComb(UI NT_VECTOR &v, unsigned int

n) increases the combination v. The argument "n" is the set size, and

the size of vector v, namely v.size( ), is s for s-combination out of

n. This algorithm is based on "Algorithm L, lexicographic

combinations" of D. E. Knuth's work [1].

Algorithm 2: Decrease a given combination to its previous combination

in the lexicographic order.

As for the increase, we define that the previous of the lowest

combination (lexicographic number 0) to be the highest combination of

lexicographic number n!/(s!(n-s)!) - 1. So, decreasing combination

0-1-2 results combination 3-4-5 in Example 1.

Listing 2 is the code.

Algorithm 3: Getting a combination out of its lexicographic number.

Given an integer lex, where 0 <= lex < n!(s!(n-s)!), function

lex2Comb( ) in Listing 3 generates an s-combination with lexicographic

number lex . First the combination in its binary format is determined.

The vector variable selector[] is used to hold the binary

representation in reverse order. For example, selector[0] =1 indicates

n 1 is selected, selector[0] = 0 indicates n-1 is not selected. In

the same way, selector[n-1] determines whether the number 0 is

selected or not. The function combCount(s, n) calculates the value of

n!/(s!(n-s)!).

This algorithm basically determines whether the current highest

number (initially, n 1) is selected or not:

case 1: lex is 0, the result combination is the lowest of "s out of

n". It is 0, 1,
, s 1. The problem is solved.

case 2: lex < combCount(s, n 1), this means n 1 is not included

in the resultant combination, so the problem scale is reduced to

finding the combination of "s out of n 1" with the lexicographic

number lex.

case 3: lex >= combCount(s, n 1), this means n -1 is chosen. The

problem scale is reduced to finding the combination of "s 1 out of n

1" with the lexicographic number lex combCount(s, n 1).

For understanding this algorithm, let us take a look at the "3 out of

6" combinations in Example 1. Given lex = 8. The following table shows

the process of finding the combination.

n s lex Highest number combCount(s, n 1) Comparison Result

6 3 8 5 10 lex < combCount(3, 5) 5 not selected

5 3 8 4 4 lex > combCount(3, 4) 4 selected

4 2 4 3 3 lex > combCount(2, 3) 3 selected

3 1 1 2 2 lex < combCount(1, 2) 2 not selected

2 1 1 1 1 lex == combCount(1, 1) 1 selected

Terminate here, since we already got three selections. The result

combination is 1, 3, 4.

The Comb Class

With these three algorithms implemented, we define the class Comb to

represent combinations in objects. Listing 4 is the declaration of

class Comb.

The constant S and constant N specify that the class is for

combinations "S out of N".

The actual numbers made up the combination are held in the data

member m_v which is a vector of size S.

The method getLex( ) returns the lexicographic number of the

combination. The way to calculate the lexicographic number out of its

combination is described as "Theorem L" in [1]. The algorithm is

pretty simple.

If the least significant element, m_v[0] gets its maximal value of N

S, the combination is the highest in lexicographic order. On the

other hand, if the most significant element, m_v[S 1], gets its

minimal vlaue of S 1, the combination is the lowest in lexicographic

order. The methods isMin( ) and isMax( ) reflect this fact.

The default copy constructor and the default assignment operator are

good enough for this class, so we do not need to define our own.

As shown in Listing 5, constructor, Comb( ), constructs an object of

lexicographic number 0, while Comb(unsigned int lex) constructs an

object of lexicographic number lex.

With the function increaseComb( ) and decreaseComb( ), the

implementation of operator++ and operator-- is straightforward as in

Listing 6.

The boolean operator "==" is implemented by comparing the member

vector m_v's elements of same significance. It seems that ">" and "<"

operators should also be defined in the same way. However, since our

definition of operator ++ rounds the highest combination to the lowest

one, given two objects c1 and c2, if c1 < c2 is true, ++c1 < ++ c2 is

not necessarily true when c2 is of max lexicographic number. For this

reason, there are no operator ">" nor "<" defined.

Use the operator ++, all the combinations can be listed with the

following code:

Comb c;

for ( ; !c.isMax( ); ++c)

c.print( );

c.print();

Given a random unsigned integer r, Comb c(r) makes a random

combination of "s out of n".

The Comb<S, N> Class

For the class Comb, there are two constants, S and N, which define the

combinations to be "S out of N". These two constants are specified

with preprocessor directive "#define". Obviously, this is not a

flexible way. First, each time when a different kind of combination is

required, the source code has to be modified to re-define these two

constants. Second, it is not possible to have objects of different

combinations in the same program. Thanks to C++ templates, we can make

class Comb a class template with S and N being the template

parameters.

Instead of declaring

#define S some-number

#define N some-number

class Comb {...};

We declare

template <unsigned int S, unsigned int N>

class Comb {...};

The class Comb becomes class template Comb<S, N>. All the method

definitions should be modified accordingly, for example, the prefix ++

operator, now, looks as:

template <unsigned int S, unsigned int N>

const Comb<S, N> Comb<S, N>::operator++( int) //postfix

{ Comb<S, N> oldValue = *this;

++(*this);

return oldValue;

}

To get an object of 3 out 6 combination, we define a variable:

Comb<3, 6> c1;

While to get a combination of 5 out of 10, we define:

Comb<5, 10> c2;

The following assignment will cause compile time error.

c1 = c2; //error, they are two different types.

One important fact is that the assertion (N >= S) && (S > 0) should be

held, otherwise, an instantiation of class template Comb<S, N> will

have undefined behavior. Ideally, the assertion failure could be

determined at compile time. Unfortunately there is no way to do so.

The current solution is let the constructors throw exceptions at run

time if the assertion failed. In doing so, The ANSI assert macro,

assert(N >= S && S > 0) is put at the beginning of the two

constructors.

The Comb<unsigned int, unsigned int, typename T> Class

The number of combinations grows fast as N increases and S gets close

to N/2. For example, N = 50, S = 6, the total number of combinations

is 15,890,700, while with N = 50, S = 10, it becomes 10,272,278,170. A

common 32-bit computer has unsigned integer up to 4,294,967,296. So,

the class Comb<10, 50> can not be handled with 32-bit unsigned

integer.

The function combCount(s, n) computing n!/(s!(n-s)!) is the one that

gets over-flowed. We can get rid of this function and reduce some

features of class template Comb<S, N>. Without combCount( ), the

function lex2Comb( ) in Listing 3 can not be implemented. In class

template Comb<S, N>, the following, one constructor and two methods,

can not be implemented and have to be removed:

explicit Comb(unsigned int lex);

const unsigned int getLex() const;

void setValue(const unsigned int lex);

With this feature-reduced class template, we can instantiate Comb<10,

50> and generate all the objects sequentially (that is a giant list).

A constructor, like Comb(int seed) may be added to construct an object

of random combination, but this can only get non-deterministic result.

If we want the full features of the combination class and the ability

to handle huge combinations, a bigger integer for the function

combCount( ) and related methods have to be used. Originally, the

return type of combCount( ) is "hard-coded" to be "unsigned int" as:

unsigned int combCount(unsig ned int s, unsigned int n) { ...... };

To make it possible for returning some other integral type, this code

can be changed as:

#typedef [some integral type, default is unsigned int] T;

T combCount(unsig ned int s, unsigned int n){ ...... };

However, using preprocessor directive #typedef has the same

inflexibility as mentioned before with #define: source code has to be

modified to use a different integral type and it is impossible to

handle different integral types in the same program. Again, thanks to

C++ template, we make T as template parameter. Listing 7 is the

definition of function template combCount<T>(un signed int s, unsigned

int n). With T be 64-bit or even greater integral type, we are able to

handle bigger combination classes.

Now, we declare

template <unsigned int S, unsigned int N, typename T = unsigned int>

class Comb {...};

The default type for T is "unsigned int". The three related functions

are declared as

explicit Comb(T lex);

const T getLex() const;

void setValue(const T lex);

The helper function lex2Comb( ) is defined as

template <typename T>

void lex2Comb(T lex, unsigned int s, unsigned int n, UINT_VECTOR &v){

.... };

Of cause, all the definitions of constructors and methods should be

re-written a little bit. The constructor Comb<T lex> looks as:

template <unsigned int S, unsigned int N, typename T>

Comb<S, N, T>::Comb(T lex)

{ assert((N >= S) && (S > 0));

m_v.resize(S);

lex2Comb<T>(lex , S, N, m_v);

}

On MS Windows, there is 64-bit integral type, "__int64". With this, we

can instantiate our class template to bigger combination class that

can not be handled with 32-bit integer. With the following definition,

Comb<10, 50, __int64> c(10000000000);

c is the 10,000,000,001t h combination of 10 out of 50. The output of

c.print( ) is

0 7 11 15 26 33 36 41 48 49

The rule of California Super Lotto Plus is "Pick five numbers from 1

to 47 and a MEGA number from 1 to 27". The total of combinations of 5

out of 47 is 1,533,939. This can be handled with T being the default

unsigned int. The following gets the 1,000,001th combination object of

5 out of 47:

Comb<5, 47> c(1000000);

c.print( );

The output is: 7 15 21 32 43. Our representation with range "0 to

46" is slightly different from the lotto's range of "1 to 47", so this

output should be interpreted as 8 16 22 33 44. You can try your

luck with this combination, but don't forget to pick a MEGA number

from 1 to 27.

Conclusions

C++ templates programming appears to be a difficult task. The

impressive complication of Standard Template Library (STL) makes an

ordinary programmer feel such a task is formidable. However, if we

just take C++ templates as a tool or mechanism to scale up or expand

already well-designed classes or algorithms, it can be very helpful.

In this article, hopefully, we not only provide a useful class but

also demonstrate the rational design path of using templates through

the evolution of class Comb to class template Comb<S, N, T>. The moral

is: in the initial design stage, forget about templates, focus on

making useful classes or algorithms, then take a close look at the

"#define" and "#typedef" items. If some of these items fluctuate as

required, use templates to expand the initial design by moving

fluctuating preprocessor directive items to be template parameters.

Notes and References

[1] Donald E. Knuth, The Art of Computer Programming , Volume 4,

Combinatorial Algorithms. Pre-Fascicle 2c (Generating all

combinations) http://www-cs-faculty.stanford.edu/~knuth/

[2] The companion source code has been built with Visual C++ 7 and

tested on Windows XP. You can get it for free by email request.

Listing 1

typedef std::vector<uns igned int> UINT_VECTOR;

void increaseComb(UI NT_VECTOR &v, unsigned int n)

{ unsigned int i;

unsigned int j = 0;

for (i = 0; i < v.size() - 1; i++)

{ if (v[i] + 1 == v[i+1])

v[i] = j++;

else break;

}

if (i >= v.size() - 1)//no j found

{ j = v.size() - 1;

if (v[j] >= n - 1) //if highest lex reached?

v[j] = j; //round to lowest lex

else v[j]++;

}else v[j]++;

}

Listing 2

void decreaseComb(UI NT_VECTOR &v, unsigned int n)

{ unsigned int i;

for (i = 0; i < v.size(); i++)

{ if (v[i] > i)

{ v[i]--;

break;

}

}

if (i == v.size()) //the lowest reached

{ i = v.size() - 1;

v[i] = n - 1; //round to highest

}

int j, k;

for (j = i - 1, k = 1 ; j >= 0; j--, k++)

v[j] = v[i] - k;

}

Listing 3

void lex2Comb(unsign ed int lex, unsigned int s, unsigned int n,

UINT_VECTOR &result)

{ std::vector<cha r> selector(n);

unsigned int i;

for (i = 0; i < n; i++)

selector[i] = 0;

unsigned int curLex = lex;

unsigned int curCombLen = s;

unsigned int nSelected = 0;

for (i = 0; (i < n) && (curLex > 0); i++){

unsigned int combCnt = CombCount(curCo mbLen, n - i - 1);

if (curLex < combCnt)

selector[i] = 0;

else {

selector[i] = 1;

nSelected++;

curLex -= combCnt;

curCombLen--;

}

}

if (nSelected < s){

// curLex == 0, set the rest to the lowest

for (i = n -1; nSelected < s; i--){

selector[i] = 1;

nSelected++;

}

}

//convert binary format to vector, selector is in reverse

result.resize(s ); //make sure we have enough size

unsigned int j = s - 1;

for (i = 0; i < n; i++){

if (selector[i] == 1){

result[j] = n - i - 1;

if (j == 0) break;

j--;

}

}

}

Listing 4

#define S ? //some positive integer constant

#define N ? //some positive integer constant that > S

class Comb {

public:

Comb();

explicit Comb(unsigned int lex);

Comb& operator++(); //prefix

const Comb operator++(int) ; //postfix

Comb& operator--(); //prefix

const Comb operator--(int); //postfix

const unsigned int getLex() const;

const bool operator==(cons t Comb& rhs) const;

const bool isMax() const {return m_v[0] == N - S;}

const bool isMin() const {return m_v[S - 1] == S - 1;}

void setValue(const unsigned int lex);

const UINT_VECTOR& getValue()const {return m_v;}

void print()const; //print elements in vector m_v

private:

UINT_VECTOR m_v;

};

Listing 5

Comb::Comb()

{ m_v.resize(S);

for (unsigned int i = 0; i < S; i++)

m_v[i] = i;

}

Comb::Comb(unsi gned int lex)

{ m_v.resize(S);

lex2Comb(lex, S, N, m_v);

}

Listing 6

Comb& Comb::operator+ +() //prefix

{ increaseComb(m_ v, N);

return *this;

}

//postfix has to return by value

const Comb Comb::operator+ +(int) //postfix

{ Comb oldValue = *this;

++(*this);

return oldValue;

}

Comb& Comb::operator--() //prefix

{ decreaseComb(m_ v, N);

return *this;

}

//postfix has to return by value

const Comb Comb::operator--(int) //postfix

{ Comb<S, N> oldValue = *this;

--(*this);

return oldValue;

}

Listing 7

template <typename T>

T combCount(unsig ned int s, unsigned int n)

{ if (s == 0) return 0;

if (n < s) return 0;

unsigned int k = n - s;

if (k == 0) return 1;

k = std::min(s, k);

T count = n;

for (unsigned int i = 1; i < k; i++)

count = count * (n - i) / (i + 1);

return count;

}