472,995 Members | 1,520 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes and contribute your articles to a community of 472,995 developers and data experts.

Generic heap sort algorithm A

11,448 Expert 8TB

I was asked to write a Tip Of the Week; so here goes: a lot of topics are
started here in this forum (and a lot of other forums too) mentioning a
problem about sorting data.

Examples of two main problem scenarios are like this:

1) an array containing a first name and another array containing a last name
and possibly a third array containing the age of a person; how to sort the
arrays according to, say, the first name.

2) a List of objects where each object contains a first name, a last name and
an age. The list must be sorted according to, say, the last name.

The second scenario can easily be solved if you implement a Comparator;
something like the following will do fine:

Expand|Select|Wrap|Line Numbers
  1. public class PersonComparator implements Comparator<Person> {
  2.     public int compare(Person l, Person r) {
  3.         return l.getLastName().compareTo(r.getLastName());
  4.     }
  5. }
Note that according to the contract of the Comparable<T> interface just a
single method needs to be implemented. The equals() method can be left out,
possibly at the cost of a small performance penalty.

You need to pass an instance of such a PersonComparator to either a:

a) Collections.sort()
b) Arrays.sort()

method, i.e. the first method sorts a List using a Comparator for the nitty
gritty comparison work, or, the second method sorts an array using this

But what about scenario 1 then? Of course scenario 1 was bad design to start
with but also of course it wasn't your design, it was a bit of legacy code
and you weren't around when the design's first implementation was deployed.

Even worse, the solution to scenario 2 only works for arrays or Lists.
Scenario 2 is like scenario 1 for other datastructures that need to be
sorted according to some sorting criteria. Those two methods (although useful)
aren't of much help for anything else that needs to be sorted.

Again there are two possible scenarios here (I call them A and B): either

A) transmogrify your data to an array or List and use the methods mentioned
above; after that re-transmogrify the sorted data to your original data and
go on.


B) come up with something else.

This little article is all about scenario B, i.e. we come up with something
else. What exactly do we need to know about any type of data to be sortable?
First we need to know how many data items are to be sorted. We also need to
know whether or not one data item is larger, equal or smaller than another data
item. Third we want to be able to interchange (or swap) two data items. Given
the first notion we'd like to be able to think of indices, i.e. data item 0,
data item 1 etc. etc.

The above notions simply beg for an interface definition. Here is a simple one:
Expand|Select|Wrap|Line Numbers
  1. public interface Sortable {
  2.     int length();
  3.     int compare(int i, int j);
  4.     void swap(int i, int j);
  5. }
Note that, no matter how trivial this interface may look, if a sorting method
is able to sort a Sortable, it doesn't need to know anything at all about the
type of data it sorts and it doesn't care less how two data items are swapped
and it doesn't care less why one data item is considered to be less, equal or
greater than an other data item.

This little article shows a 'heap sort' sorting method that is very well capable
of doing the job. A heapsort method is no worse than the ever propagated quick
sort method and it runs around any other naive sorting methods big times.

The big-Oh of a heap sort method is equal or better than a quick sort method
and as an additional benefit it is stable w.r.t. that (in)famous big-Oh number.
Before I forget, it doesn't need any additional memory either proportional
to the size of the data to be sorted.

This article presents the heap sort algoritm in terms of the Sortable interface.
So when two data items need to be compared, the algorithm calls the compare()
method, when they need to be swapped, the Sortable can do it etc. etc.

Here is a definition:

a heap is a binary tree such that for any node P, its children L and R (if
present) are not less than P, i.e. compare(P, L) and compare(P, R) both return
a value >= 0.

Note that a leaf is a heap by itself (a leaf doesn't have children)

For (almost) no particular reason, here's a binary tree with ten nodes:
Expand|Select|Wrap|Line Numbers
  1. .                 0
  2.                /     \
  3.               1       2        
  4.             /   \   /   \
  5.            3    4   5    6
  6.           / \  / 
  7.           7 8  9
please forgive me my (lack of) ASCII art. This tree does show some interesting
properties (without proof):

1: the left kid of a node i is numbered 2*i+1;
2: the right kid of a node i is numbered 2*i+2;
3: if a node i > 0 has k kids, a node i-1 has at least k kids;
4: every node has as many kids as possible.

Those four notions define a 'complete' binary tree. Math folks like those
definitions. So a sequence 0, 1, 2 ... n-1 can be seen as a complete binary

Back to that heapsort thing again: suppose that a node P is larger than
both of its children L and R for every node in a tree (as sketched above).
Then the root of the tree is the largest element of the them all.

All leaves of a tree are heaps by definition, but what about the rest of the
tree nodes (nearer to the root of the tree)? The solution is simple: we can
build a heap out of a binary tree, given the fact that both the left and
right subtrees are heaps already; it works as follows:

Let P be a parent node and L and R the left and right children of P.

1: if P >= L and P >= R then we already have a heap, otherwise:
2: let M be the maximum of L and R;
3: swap P with that maximum M and continue from step 1 again.

For every complete binary tree with n nodes we know that the last n/2 nodes of
that tree are leaves and so they are heaps already. (Check this using my bad
little ASCII art above).

So we can turn a (sub)tree into a heap using the three line algoritm above.
Lets do it using the Sortable interface defined earlier; here goes:

Expand|Select|Wrap|Line Numbers
  1. // given a Sortable, build a heap starting at node p. There a n nodes in total
  2. private static void heapify(Sortable s, int p, int n) {
  4.     for (int r, l= (p<<1)+1; l < n; p= l, l= (p<<1)+1) {
  6.         // l is the maximum of l and r, the two subnodes of p 
  7.         if ((r= l+1) < n && s.compare(l, r) < 0) l= r;
  9.         // check if parent p is less than maximum l
  10.         if (s.compare(p, l) < 0) s.swap(p, l);
  11.         else break;
  12.     }
  13. }
This is all fine, but how to turn a Sortable into a heap? We'll just use the
small method above. Traditionally this part is called the 'phase 1' of the
heap sort algorithm. Why break with good traditions?

Expand|Select|Wrap|Line Numbers
  1. private static void phase1(Sortable s) {
  3.     // heapify all the non-leaf nodes         
  4.     for (int n= s.length(), p= n/2; p >= 0; p--)
  5.         heapify(s, p, n);
  6. }
The second phase, traditionally named 'phase 2' *ahem* uses a heap as follows
to get everything sorted. The root of the tree is the largest element of all
the nodes. If we swap the root with the last element of the Sortable we have
this largest element in place (at the end). But now the tree is probably not
a heap anymore. Using our first method we can turn the tree into a heap again
but we ignore the last node (which was in place already).

We repeat the step above until the heap contains just one single node. This
node must be the smallest element of all elements in the Sortable. Here goes:

Expand|Select|Wrap|Line Numbers
  1. // sort the Sortable
  2. private static void phase2(Sortable s) {
  4.     for (int n= s.length(); --n > 0; ) {
  5.         s.swap(0, n);         // put the root element in its place
  6.         heapify(s, 0, n);     // and restore the heap again
  7.     }        
  8. }
I don't think it comes as a surprise that the heap sort method itself simply
looks like this:

Expand|Select|Wrap|Line Numbers
  1. // driver for the worked methods
  2. public static Sortable sort(Sortable s) { 
  4.     phase1(s);     // build initial heap
  5.     phase2(s);     // sort the sortable given the heap
  7.     return s;     // return the Sortable for convenience
  8. }
We continue this little article in part 2; all this because an article cannot
exceed 10,000 characters in length. See you soon and

kind regards,

May 4 '07 #1
0 13196

Sign in to post your reply or Sign up for a free account.

Similar topics

by: darth | last post by:
Does anyone have heap sorting with queue(doublely linked list) in C? I have heap sort with array but with queue, it's somewhat hard.
by: Bonj | last post by:
hello this is a purely algorithmical question but if i have posted to an irrelevant group, apologies. can anyone point me at some good tutorials or info about the steps involved in creating a...
by: songie D | last post by:
I am looking to write a very fast algorithm to merge ranges. e.g. if it was input the following ranges (1,3), (4,6), (9, 12) it should return exactly the same thing. However if it's passed (1, 5),...
by: codergem | last post by:
The following code is for heap sort but its not working properly. According to my analysis the Logic is correct but still its not showing the correct output. Thanks. #include "stdafx.h"...
by: Raj | last post by:
We are on a db2 v8.2 (fix 8) with DPF & intra parllelism. Below are sort related configuration settings ...
by: Nick Valeontis | last post by:
I know how to use Icomparable. However, I can't figure out how to sort a generic linked list? (without writing the algorithm) Lets say I have something like this: class...
by: Woody Ling | last post by:
In 32 bits DB2 environment, is it meaningful to set sheapthres larger than 256MB for the following case.. 1. Intra-parallel is ON 2. Intra-parallel is OFF
by: Alan | last post by:
I was wondering if anyone had design advice on this. . . . I am doing some mathematical operations looking at different metrics for data (in objects) I have captured. The object class has several...
by: sophia.agnes | last post by:
Hi all, Is there a simple algorithm to implement in heapsort??? this is the program i found in my college D.S notebook #include<stdio.h> #include<stdlib.h> void heapsort (int, int);
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 4 Oct 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
by: giovanniandrean | last post by:
The energy model is structured as follows and uses excel sheets to give input data: 1-Utility.py contains all the functions needed to calculate the variables and other minor things (mentions...
by: NeoPa | last post by:
Hello everyone. I find myself stuck trying to find the VBA way to get Access to create a PDF of the currently-selected (and open) object (Form or Report). I know it can be done by selecting :...
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 1 Nov 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM) Please note that the UK and Europe revert to winter time on...
by: nia12 | last post by:
Hi there, I am very new to Access so apologies if any of this is obvious/not clear. I am creating a data collection tool for health care employees to complete. It consists of a number of...
by: NeoPa | last post by:
Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...
by: isladogs | last post by:
The next online meeting of the Access Europe User Group will be on Wednesday 6 Dec 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, Mike...
by: GKJR | last post by:
Does anyone have a recommendation to build a standalone application to replace an Access database? I have my bookkeeping software I developed in Access that I would like to make available to other...
by: SueHopson | last post by:
Hi All, I'm trying to create a single code (run off a button that calls the Private Sub) for our parts list report that will allow the user to filter by either/both PartVendor and PartType. On...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.