I was asked to write a Tip Of the Week; so here goes: a lot of topics are

started here in this forum (and a lot of other forums too) mentioning a

problem about sorting data.

Examples of two main problem scenarios are like this:

1) an array containing a first name and another array containing a last name

and possibly a third array containing the age of a person; how to sort the

arrays according to, say, the first name.

2) a List of objects where each object contains a first name, a last name and

an age. The list must be sorted according to, say, the last name.

The second scenario can easily be solved if you implement a Comparator;

something like the following will do fine:

Expand|Select|Wrap|Line Numbers

- public class PersonComparator implements Comparator<Person> {
- public int compare(Person l, Person r) {
- return l.getLastName().compareTo(r.getLastName());
- }
- }

single method needs to be implemented. The equals() method can be left out,

possibly at the cost of a small performance penalty.

You need to pass an instance of such a PersonComparator to either a:

a) Collections.sort()

b) Arrays.sort()

method, i.e. the first method sorts a List using a Comparator for the nitty

gritty comparison work, or, the second method sorts an array using this

Comparator.

But what about scenario 1 then? Of course scenario 1 was bad design to start

with but also of course it wasn't your design, it was a bit of legacy code

and you weren't around when the design's first implementation was deployed.

Even worse, the solution to scenario 2 only works for arrays or Lists.

Scenario 2 is like scenario 1 for other datastructures that need to be

sorted according to some sorting criteria. Those two methods (although useful)

aren't of much help for anything else that needs to be sorted.

Again there are two possible scenarios here (I call them A and B): either

A) transmogrify your data to an array or List and use the methods mentioned

above; after that re-transmogrify the sorted data to your original data and

go on.

or

B) come up with something else.

This little article is all about scenario B, i.e. we come up with something

else. What exactly do we need to know about any type of data to be sortable?

First we need to know how many data items are to be sorted. We also need to

know whether or not one data item is larger, equal or smaller than another data

item. Third we want to be able to interchange (or swap) two data items. Given

the first notion we'd like to be able to think of indices, i.e. data item 0,

data item 1 etc. etc.

The above notions simply beg for an interface definition. Here is a simple one:

Expand|Select|Wrap|Line Numbers

- public interface Sortable {
- int length();
- int compare(int i, int j);
- void swap(int i, int j);
- }

is able to sort a Sortable, it doesn't need to know anything at all about the

type of data it sorts and it doesn't care less how two data items are swapped

and it doesn't care less why one data item is considered to be less, equal or

greater than an other data item.

This little article shows a 'heap sort' sorting method that is very well capable

of doing the job. A heapsort method is no worse than the ever propagated quick

sort method and it runs around any other naive sorting methods big times.

The big-Oh of a heap sort method is equal or better than a quick sort method

and as an additional benefit it is stable w.r.t. that (in)famous big-Oh number.

Before I forget, it doesn't need any additional memory either proportional

to the size of the data to be sorted.

This article presents the heap sort algoritm in terms of the Sortable interface.

So when two data items need to be compared, the algorithm calls the compare()

method, when they need to be swapped, the Sortable can do it etc. etc.

Here is a definition:

a heap is a binary tree such that for any node P, its children L and R (if

present) are not less than P, i.e. compare(P, L) and compare(P, R) both return

a value >= 0.

Note that a leaf is a heap by itself (a leaf doesn't have children)

For (almost) no particular reason, here's a binary tree with ten nodes:

Expand|Select|Wrap|Line Numbers

- . 0
- / \
- 1 2
- / \ / \
- 3 4 5 6
- / \ /
- 7 8 9

properties (without proof):

1: the left kid of a node i is numbered 2*i+1;

2: the right kid of a node i is numbered 2*i+2;

3: if a node i > 0 has k kids, a node i-1 has at least k kids;

4: every node has as many kids as possible.

Those four notions define a 'complete' binary tree. Math folks like those

definitions. So a sequence 0, 1, 2 ... n-1 can be seen as a complete binary

tree.

Back to that heapsort thing again: suppose that a node P is larger than

both of its children L and R for every node in a tree (as sketched above).

Then the root of the tree is the largest element of the them all.

All leaves of a tree are heaps by definition, but what about the rest of the

tree nodes (nearer to the root of the tree)? The solution is simple: we can

build a heap out of a binary tree, given the fact that both the left and

right subtrees are heaps already; it works as follows:

Let P be a parent node and L and R the left and right children of P.

1: if P >= L and P >= R then we already have a heap, otherwise:

2: let M be the maximum of L and R;

3: swap P with that maximum M and continue from step 1 again.

For every complete binary tree with n nodes we know that the last n/2 nodes of

that tree are leaves and so they are heaps already. (Check this using my bad

little ASCII art above).

So we can turn a (sub)tree into a heap using the three line algoritm above.

Lets do it using the Sortable interface defined earlier; here goes:

Expand|Select|Wrap|Line Numbers

- // given a Sortable, build a heap starting at node p. There a n nodes in total
- private static void heapify(Sortable s, int p, int n) {
- for (int r, l= (p<<1)+1; l < n; p= l, l= (p<<1)+1) {
- // l is the maximum of l and r, the two subnodes of p
- if ((r= l+1) < n && s.compare(l, r) < 0) l= r;
- // check if parent p is less than maximum l
- if (s.compare(p, l) < 0) s.swap(p, l);
- else break;
- }
- }

small method above. Traditionally this part is called the 'phase 1' of the

heap sort algorithm. Why break with good traditions?

Expand|Select|Wrap|Line Numbers

- private static void phase1(Sortable s) {
- // heapify all the non-leaf nodes
- for (int n= s.length(), p= n/2; p >= 0; p--)
- heapify(s, p, n);
- }

to get everything sorted. The root of the tree is the largest element of all

the nodes. If we swap the root with the last element of the Sortable we have

this largest element in place (at the end). But now the tree is probably not

a heap anymore. Using our first method we can turn the tree into a heap again

but we ignore the last node (which was in place already).

We repeat the step above until the heap contains just one single node. This

node must be the smallest element of all elements in the Sortable. Here goes:

Expand|Select|Wrap|Line Numbers

- // sort the Sortable
- private static void phase2(Sortable s) {
- for (int n= s.length(); --n > 0; ) {
- s.swap(0, n); // put the root element in its place
- heapify(s, 0, n); // and restore the heap again
- }
- }

looks like this:

Expand|Select|Wrap|Line Numbers

- // driver for the worked methods
- public static Sortable sort(Sortable s) {
- phase1(s); // build initial heap
- phase2(s); // sort the sortable given the heap
- return s; // return the Sortable for convenience
- }

exceed 10,000 characters in length. See you soon and

kind regards,

Jos