472,992 Members | 3,152 Online

# Mergesort algorithm for linked lists

Hi folks!

Everyone knows how to sort arrays (e. g. quicksort, heapsort etc.)
For linked lists, mergesort is the typical choice.

While I was looking for a optimized implementation of mergesort for
linked lists, I couldn't find one. I read something about Mcilroy's
"Optimistic Merge Sort" and studied some implementation, but they
were for arrays. Does anybody know if Mcilroys optimization is applicable to
truly linked lists at all?

Eventually, I came up with my own implementation (see below) which seems to
perform quite well. The algorithm is my own idea - at least I don't know if
anybody else has come up with this before.

The algorithm is localized. It maintains a stack of sorting length and tries
to merge small pieces as soon as possible. This takes advantage of a CPU
caches.

Here are my questions about it:

*- Is my implementation new/genuine or has somebody else come up
* *with this before?

*- How does it compare with other optimized implementations (provided there
* *are such)? I would like to hear from better implementations!

*- When optimizing quicksort, one usually tries to use insertion sort
* *and the like for SMALL parts. I tried to do the same for the mergesort
* *below, i. e. sort small pieces of length ~ 10 with insertion sort.

* *It actually became slower :( I don't quite understand why.

Thanks for any comments!

Joerg

PS: You can find this implementation at
* * * * http://www.die-Schoens.de/prg/testsort.tgz

--------------------- snip ------------------------------
#include <stddef.h/* *for NULL **/

* int Key;
};

#define NATURAL /* *recognize natural runs **/

{
* struct {
* * linkedlist_t *list;
* * unsigned int size;
#ifdef NATURAL
* * unsigned int level;
#endif
* } stack[sizeof(int) * 8];
* int nstack;

* for(nstack = -1 ; ; ) {
* * linkedlist_t *plist, *qlist, *curr;
* * unsigned int psize, qsize;

* * /* *Can we or do we have to merge lists from the stack? **/
* * if(nstack < 1 || (
#ifdef NATURAL
* * * * * * * * * * * *stack[nstack - 1].level != stack[nstack].level
#else
* * * * * * * * * * * *stack[nstack - 1].size != stack[nstack].size
#endif
* * * * * * * * * * * *&& list != NULL)) {
* * * /* ** *Push element(s) on the stack *** */
* * * if(list == NULL)
* * * * /* *This is where we break the loop. We have nstack == 0 now **/
* * * * break;

* * * /* ** *Push lists of length 1 *** */
* * * curr = list;
* * * psize = 1;

* * * list = list->Next;

#ifdef NATURAL
* * * /* *Check for a natural run **/
* * * for(plist = curr ; list != NULL ; plist = list, list = list->Next,
++psize)
* * * * if(list->Key < plist->Key)
* * * * * break;

* * * /* *Properly terminate the run **/
* * * plist->Next = NULL;
#else
* * * /* *Properly terminate list **/
* * * curr->Next = NULL;
#endif

* * * /* ** *Push this run on the stack *** */
* * * ++nstack;
* * * stack[nstack].list = curr;
* * * stack[nstack].size = psize;
#ifdef NATURAL
* * * stack[nstack].level = 0;
#endif

* * * continue;
* * }

* * /* ** *Merge top two entries from stack *** */
* * plist = stack[nstack].list; psize = stack[nstack].size;
* * --nstack;
* * qlist = stack[nstack].list; qsize = stack[nstack].size;

* * /* ** *Set up new stack element. This also simplifies the main loop
below ** */
* * if(qlist->Key < plist->Key) {
* * * curr = qlist;
* * * qlist = qlist->Next; --qsize;
* * } else {
* * * curr = plist;
* * * plist = plist->Next; --psize;
* * }

* * stack[nstack].list = curr;

* * /* *Number of elements is just the sum **/
* * stack[nstack].size += stack[nstack + 1].size;
#ifdef NATURAL
* * stack[nstack].level++;
#endif

* * /* ** *Here is the main merging loop *** */
* * while(psize 0 && qsize 0) {
* * * if(qlist->Key < plist->Key) {
* * * * curr->Next = qlist; curr = qlist;
* * * * qlist = qlist->Next; * --qsize;
* * * } else {
* * * * curr->Next = plist; curr = plist;
* * * * plist = plist->Next;; *--psize;
* * * }
* * }

* * /* *Add remainder to result list **/
* * if(psize 0) {
* * * curr->Next = plist;
* * } else {
* * * curr->Next = qlist;
* * }
* }

* /* *Here we treat the case of list == NULL **/
* return nstack == 0 ? stack[nstack].list : NULL;
}
-------------------- snap -----------------------
Jan 22 '07 #1
51 8475
Joerg Schoen wrote:
>
Everyone knows how to sort arrays (e. g. quicksort, heapsort etc.)
For linked lists, mergesort is the typical choice.

While I was looking for a optimized implementation of mergesort for
linked lists, I couldn't find one. I read something about Mcilroy's
"Optimistic Merge Sort" and studied some implementation, but they
were for arrays. Does anybody know if Mcilroys optimization is
applicable to truly linked lists at all?

Eventually, I came up with my own implementation (see below) which
seems to perform quite well. The algorithm is my own idea - at least
I don't know if anybody else has come up with this before.

The algorithm is localized. It maintains a stack of sorting length
and tries to merge small pieces as soon as possible. This takes
advantage of a CPU caches.
.... snip code ...

Too long and complicated for me. There is an example in about 60
lines, including comments, in wdfreq.c, which in turn is an example
application in hashlib.zip. See:

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

Jan 22 '07 #2
"Joerg Schoen" <jo************@online.dewrote in message
news:ep**********@online.de...
>
- Is my implementation new/genuine or has somebody else come up
with this before?
It is unlikely that your implementation is unique. Somebody somewhere has
probably done it before. Searching and sorting are very well explored
topics, both theoretically and practically.

But even if your actual implementation of a small aspect of it is unique,
nobody will care, and your place in history is not assured.

Generally, people will only care if it solves a practical problem worth a
lot of money, or adds insight or a solution to a theoretical problem. If
you have a reimplementation of an O(n log n) sorting algorithm, nobody will
care.

Now, if you develop a general O(log n) sorting algorithm ... there will be
more listeners.

Merging using linked lists doesn't seem fundamentally different than merging
using arrays.

Related question: We discussed the "bubble-sort" in class today. I have
this new sort where I find the last element instead of the first--I think
I'll call it the "rock-sort". Will I get the Nobel prize or something?

Sorting has been well explored. Here is what you're up against:

http://en.wikipedia.org/wiki/Category:Sort_algorithms

--
David T. Ashley (dt*@e3ft.com)
http://gpl.e3ft.com (GPL Publications and Projects)
Jan 23 '07 #3
"David T. Ashley" wrote:
>
.... snip ...
>
Now, if you develop a general O(log n) sorting algorithm ... there
will be more listeners.
They exist. See Sedgewick. Need more hardware though.
>
Merging using linked lists doesn't seem fundamentally different
than merging using arrays.
You use it without disturbing/moving/copying the actual data, and
lists are often the most convenient input mechanism for unknown
volume. In addition mergesort has no worst case, and is stable.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Jan 23 '07 #4
David T. Ashley wrote:
But even if your actual implementation of a small aspect of it is unique,
nobody will care, and your place in history is not assured.
Wasn't looking for a "place in history", just for comments on my
implementation.

I disagree that nobody cares. I actually searched around to find useful
practical implementations of mergesorts for linked lists. What I found
were
only textbooks type of implemtations - nice but slow. That's why I
posted
this here to ask for comments and maybe find a better implementation.
Generally, people will only care if it solves a practical problem worth a
lot of money, or adds insight or a solution to a theoretical problem. If
you have a reimplementation of an O(n log n) sorting algorithm, nobody will
care.
Theoretically, the whole "mergesort" stuff is well known and boring.
But still the
question remains how to do in practical terms in an efficient way.

You contradict yourself. One time you state that only theoretical
insights might
interest people or if practical problems are solved. But here I attempt
to solve a
practical problem, namely to have a real-world/efficient implementation
list mergesort, and you state that "no one cares".
Merging using linked lists doesn't seem fundamentally different than merging
using arrays.
Except for the fact that the implementations are different. One of my
original
questions was if the "optimized merge sort" which is an efficient
implementation
of mergesort also works for linked lists. The only implementation I
found were for
arrays.
Related question: We discussed the "bubble-sort" in class today. I have
this new sort where I find the last element instead of the first--I think
I'll call it the "rock-sort". Will I get the Nobel prize or something?
It seems you are not having a good day and try to be rude all the time.
comment on the topic, i. e. my code and stop this useless comments.

Jan 23 '07 #5
CBFalconer wrote:
Too long and complicated for me. There is an example in about 60
lines, including comments, in wdfreq.c, which in turn is an example
application in hashlib.zip. See:

Thanks for the comment! Your implementation is basically the one from
Sedgewick's book. It has the disadvantage that you have to run through
the linked lists repeatedly to split it, but it is also localized, i.
e. takes

Jan 23 '07 #6

Joerg Schoen wrote:
- When optimizing quicksort, one usually tries to use insertion sort
and the like for SMALL parts. I tried to do the same for the mergesort
below, i. e. sort small pieces of length ~ 10 with insertion sort.

It actually became slower :( I don't quite understand why.
Using wikipedia's expected number-of-comparisons for randomized quick
sort (1.39 n lg2 n), insertion sort (n^2 / 4), and merge sort (n lg2 n
+ 1 - 1.26 n) says that for ten elements, quicksort would do 46
comparisons, insertion sort 25 comparisons, and merge sort 22
comparisons.

Also, for a large array, the insertion sort "helper" for quicksort can
be done as a single pass over the entire array at the end (each element
is already close to its final position) meaning the insertion sort
overhead is much less than the quicksort overhead for a bunch of very
small partitions.

For mergesort on a list, your insertion-sort helper has overhead almost
as large as the merge-sort overhead, since you have to do a bunch of
"little" insertion sorts before you can start merging.

Jan 23 '07 #7
<jo**@online.dewrote in message
David T. Ashley wrote:

It seems you are not having a good day and try to be rude all the time.
comment on the topic, i. e. my code and stop this useless comments.
One has to differentiate different types of contributions to the world:

a)Practical "make life" easier contributions: these are contributions that
are convenient, but that could be replicated by many others. A mergesort
implementation that uses linked lists instead of arrays is in this category.

b)Fundamental contributions: these have a lower probability of being
replicated by others. The original development of quicksort and to a lesser
degree mergesort is in this category.

Your contribution is in category (a) rather than (b). It has value, just
not a lot of value. The key issue is whether anyone seeking the solution
would spend more time searching for and porting your solution rather than
implementing it themselves from scratch. It is toss-up.

You might investigate www.snippets.org.

My intention was not to be rude.
--
David T. Ashley (dt*@e3ft.com)
http://gpl.e3ft.com (GPL Publications and Projects)
Jan 23 '07 #8
David T. Ashley schrieb:
One has to differentiate different types of contributions to the world:

a)Practical "make life" easier contributions: these are contributions that
are convenient, but that could be replicated by many others. A mergesort
implementation that uses linked lists instead of arrays is in this category.

b)Fundamental contributions: these have a lower probability of being
replicated by others. The original development of quicksort and to a lesser
degree mergesort is in this category.

Your contribution is in category (a) rather than (b). It has value, just
not a lot of value. The key issue is whether anyone seeking the solution
would spend more time searching for and porting your solution rather than
implementing it themselves from scratch. It is toss-up.
Of course I never thought to be in category (b). Being in (a), the
question for
me was: How to do mergesort on linked list efficiently? The sample code
in
Sedgewick and in other sources didn't look too "efficient" to me, so I
was
looking for a better one and eventually came up with my piece of code.

I don't think that my solution requires porting at all. Besides that,
it is more
efficient than the "standard" solution. Of course this only means
"faster by some
percentage" - it is still O(n * log(n)). I actually want to use it in a
real world
application where I am looking for a reasonable implementation of
mergesort.
You might investigate www.snippets.org.

My intention was not to be rude.
Sorry for misunderstanding you. I am not a native speaker - maybe this
is the
problem.

Jan 23 '07 #9
jo**@online.de wrote:
CBFalconer wrote:
>Too long and complicated for me. There is an example in about 60
lines, including comments, in wdfreq.c, which in turn is an example
application in hashlib.zip. See:

Thanks for the comment! Your implementation is basically the one from
Sedgewick's book. It has the disadvantage that you have to run through
the linked lists repeatedly to split it, but it is also localized, i.e.
takes advantage of caches.
Well, I read Sedgewick long ago :-). However I fail to see how you
can do a merge without first splitting.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

Jan 23 '07 #10
CBFalconer <cb********@yahoo.comwrites:
Too long and complicated for me. There is an example in about 60
lines, including comments, in wdfreq.c, which in turn is an example
application in hashlib.zip. See:

As long as we're posting links to linked list merge sorts, my
implementation is at
http://cvs.savannah.gnu.org/viewcvs/...pp&view=markup
in function ll_sort().

It's optimized for readability.

CBFalconer wrote in a later message:
Well, I read Sedgewick long ago :-). However I fail to see how you
can do a merge without first splitting.
My implementation doesn't do any splitting either. At least, I
don't think it does; you can tell me if it contains something
that you'd interpret to be splitting.
--
"Welcome to the wonderful world of undefined behavior, where the demons
are nasal and the DeathStation users are nervous." --Daniel Fox
Jan 23 '07 #11
On Mon, 22 Jan 2007 21:26:44 +0100, Joerg Schoen
<jo************@online.dewrote:
>Hi folks!

Everyone knows how to sort arrays (e. g. quicksort, heapsort etc.)
For linked lists, mergesort is the typical choice.

While I was looking for a optimized implementation of mergesort for
linked lists, I couldn't find one. I read something about Mcilroy's
"Optimistic Merge Sort" and studied some implementation, but they
were for arrays. Does anybody know if Mcilroys optimization is applicable to
truly linked lists at all?

Eventually, I came up with my own implementation (see below) which seems to
perform quite well. The algorithm is my own idea - at least I don't know if
anybody else has come up with this before.

The algorithm is localized. It maintains a stack of sorting length and tries
to merge small pieces as soon as possible. This takes advantage of a CPU
caches.

Here are my questions about it:

*- Is my implementation new/genuine or has somebody else come up
* *with this before?

*- How does it compare with other optimized implementations (provided there
* *are such)? I would like to hear from better implementations!

*- When optimizing quicksort, one usually tries to use insertion sort
* *and the like for SMALL parts. I tried to do the same for the mergesort
* *below, i. e. sort small pieces of length ~ 10 with insertion sort.

* *It actually became slower :( I don't quite understand why.

Thanks for any comments!

Joerg
[snip code]

Here are some suggestions: First of all, the proper newsgroup probably
should be comp.programming. Secondly, I would much prefer a more
extensive description of the actual algorithm. Your description is
quite vague; seemingly, the only way to understand what you are actually
doing is to read the code. Thirdly, your code is hard to read (in my
opinion, of course.) As a general rule code is easier to read if things
are cleanly indented and well lined up.
Here are some of the things that I find make it difficult to read.
First of all, in code for presentation it reads better if you get rid of
the ifdefs and replace them by normal tests on flags. As it is your
preprocessor tests break into the flow of the code.

Secondly I don't much like interspersing single line comments with code.
I suggest prefixing blocks of code with comment blocks that describe the
intent of the entire block. If do want to comment upon a single line I
prefer to have a sidebar comment. Stream of consciousness commenting is
helpful to the author but not to the reader.

Thirdly, my preference is that each variable be declared on a separate
line with a sidebar comment describing the variable. When someone else
is reading the code they don't have to puzzle out what each name means.

As a strictly C note, experience says do this

* * * if(list == NULL) break; /* Single statement if */

or this

* * * if(list == NULL) { /* If controlling block */
* * * * break;
}

but not this

* * * if(list == NULL) /* This is a maintenance trap */
* * * * break;

As a note on mergesorting linked list data, you don't need to scan the
list to get its length. Instead sort the first two elements. You now
have a sorted list of length 2. Sort the next two elements to get a
second sorted list; merge it with the first. You now have a sorted list
of length 4. Sort the next four elements and merge to get a list of
length 8. Repeat as needed. Season to taste.

I'm not sure how this compares with your algorithm because I haven't sat
down to figure out exactly what you're doing; however your algorithm
looks interesting.

Jan 23 '07 #12
cr*@tiac.net (Richard Harter) writes:
As a note on mergesorting linked list data, you don't need to scan the
list to get its length. Instead sort the first two elements. You now
have a sorted list of length 2. Sort the next two elements to get a
second sorted list; merge it with the first. You now have a sorted list
of length 4. Sort the next four elements and merge to get a list of
length 8. Repeat as needed. Season to taste.
Sure, that'll work, but it's somewhat brute force. I think that
a "natural" merge sort makes more sense: take the first two runs
of ascending values and merge them to produce the first output
run, and repeat that process until you've consumed the input.
Then start over from the top, until there's only a single output
run.
--
"I hope, some day, to learn to read.
It seems to be even harder than writing."
--Richard Heathfield
Jan 23 '07 #13
On Tue, 23 Jan 2007 12:56:02 -0800, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
>cr*@tiac.net (Richard Harter) writes:
>As a note on mergesorting linked list data, you don't need to scan the
list to get its length. Instead sort the first two elements. You now
have a sorted list of length 2. Sort the next two elements to get a
second sorted list; merge it with the first. You now have a sorted list
of length 4. Sort the next four elements and merge to get a list of
length 8. Repeat as needed. Season to taste.

Sure, that'll work, but it's somewhat brute force. I think that
a "natural" merge sort makes more sense: take the first two runs
of ascending values and merge them to produce the first output
run, and repeat that process until you've consumed the input.
Then start over from the top, until there's only a single output
run.
A natural merge sort might make more sense but it's not an improvement,
or at least it isn't unless you have something clever up your sleeve.
See the discussion in Knuth on merge sorts. Natural gains when runs are
long and few and loses when runs are short and many. Offhand your
version is going to much more expensive (though still O(n log n))
because you have to rescan the runs.
Jan 23 '07 #14
Ben Pfaff wrote:
cr*@tiac.net (Richard Harter) writes:
>As a note on mergesorting linked list data, you don't need to scan the
list to get its length. Instead sort the first two elements. You now
have a sorted list of length 2. Sort the next two elements to get a
second sorted list; merge it with the first. You now have a sorted list
of length 4. Sort the next four elements and merge to get a list of
length 8. Repeat as needed. Season to taste.
That's basically what my code is doing, but it also takes advantage of
natural runs in the input! Richard Harter is right in noticing that I
should have described the algorithm more detailed. Sorry! I'll also improve
my indentation next time. Here now is a detailed explanation:

I break down the linked lists into natural runs (only if preprocessor
symbol "NATURAL" is defined). After collecting the first two runs, I merge
them into a bigger run. Then I collect the next two runs, merge them into a
bigger one. On the stack, I now have two bigger runs which I merge again.

Here is how the stack develops (separated letters denote different runs,
adjacent letters means merged runs, the comment at the end of the line
describes what happens before the next line):
# collect run
a # collect run
a b # merge
ab # collect run
ab c # collect run
ab c d # merge
ab cd # merge
abcd # collect run
abcd e # collect run
abcd e f # merge
abcd ef # collect run
abcd ef g # collect run
abcd ef g h # merge
abcd ef gh # merge
abcd efgh # merge
abcdefgh
....

The stack contains the start of the each run which is a NULL terminated
linked list and also the length of it. So there is enough information on
the stack to know the start and length of each run. Compared to other
implementations of mergesort the my implementation doesn't run through the
list a given number of steps just to find the start of the next run.

What I like about the implementation is that there is not much need for
treating special cases like ending lists when merging etc. That sounds
easy, but try to do it yourself and you see that it's a bit tricky!
Sure, that'll work, but it's somewhat brute force. I think that
a "natural" merge sort makes more sense: take the first two runs
of ascending values and merge them to produce the first output
run, and repeat that process until you've consumed the input.
Then start over from the top, until there's only a single output
run.
That is exactly the governing idea in my own implementation: Improve
mergesort by recognizing natural runs in the input.

There are two not so obvious problems with Ben's description:

One thing is that you need to call your comparison function very frequently
just to recognize runs you could already know about otherwise. Assume you
have just merged two runs of length 1000 and 2000 to a run of length 3000.
When you go over the list from the top, you need another 2999 comparisons
just to recognize the run of size 3000.

Note that this is easier in the simpler implementation cited by Richard: The
length of the runs is known to be exactly 2**n in the n-th iteration of the
outer loop.

The second problem in Ben's approach is non-locality: You have to run
repeatedly through the WHOLE input and each time increase the size of the
runs. This will render your CPU cache rather useless. In the test program
on my webpage (http://www.die-schoens.de/prg/testsort.tgz), I compare the
localized version with a non-localized one. The runtime of the localized
one is better by 35%! That's seems relevant to me for real world
application!

Of course, as said before, the algorithm is O(n * log(n)), we are just
talking about minimizing the constant factor in front of this.

Jan 23 '07 #15
Ben Pfaff wrote:
CBFalconer <cb********@yahoo.comwrites:
.... snip ...
>http://cvs.savannah.gnu.org/viewcvs/...pp&view=markup
in function ll_sort().
>
.... snip ...
>
>Well, I read Sedgewick long ago :-). However I fail to see how you
can do a merge without first splitting.

My implementation doesn't do any splitting either. At least, I
don't think it does; you can tell me if it contains something
that you'd interpret to be splitting.
To merge something you need two things to merge. You start with
one list. I can imagine a technique where you use the end of a run
to form the end of a split, and then try to merge with the
remainder. I don't think such an algorithm will be nlogn,
especially in the worst case, and preserving stability will require
care. List splitting has a very taut inner loop.

I haven't looked at your implementation so far.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Jan 23 '07 #16
cr*@tiac.net (Richard Harter) writes:
On Tue, 23 Jan 2007 12:56:02 -0800, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
>>Sure, that'll work, but it's somewhat brute force. I think that
a "natural" merge sort makes more sense: take the first two runs
of ascending values and merge them to produce the first output
run, and repeat that process until you've consumed the input.
Then start over from the top, until there's only a single output
run.

A natural merge sort might make more sense but it's not an improvement,
or at least it isn't unless you have something clever up your sleeve.
See the discussion in Knuth on merge sorts. Natural gains when runs are
long and few and loses when runs are short and many. Offhand your
version is going to much more expensive (though still O(n log n))
because you have to rescan the runs.
I'll keep that in mind as I have the need for optimization in
future, thanks for pointing it out.
--
"It would be a much better example of undefined behavior
if the behavior were undefined."
--Michael Rubenstein
Jan 23 '07 #17
On Tue, 23 Jan 2007 23:08:27 +0100, Joerg Schoen
<jo************@online.dewrote:
>Ben Pfaff wrote:
>cr*@tiac.net (Richard Harter) writes:
>>As a note on mergesorting linked list data, you don't need to scan the
list to get its length. Instead sort the first two elements. You now
have a sorted list of length 2. Sort the next two elements to get a
second sorted list; merge it with the first. You now have a sorted list
of length 4. Sort the next four elements and merge to get a list of
length 8. Repeat as needed. Season to taste.

That's basically what my code is doing, but it also takes advantage of
natural runs in the input! Richard Harter is right in noticing that I
should have described the algorithm more detailed. Sorry! I'll also improve
my indentation next time. Here now is a detailed explanation:

I break down the linked lists into natural runs (only if preprocessor
symbol "NATURAL" is defined). After collecting the first two runs, I merge
them into a bigger run. Then I collect the next two runs, merge them into a
bigger one. On the stack, I now have two bigger runs which I merge again.

Here is how the stack develops (separated letters denote different runs,
adjacent letters means merged runs, the comment at the end of the line
describes what happens before the next line):
# collect run
a # collect run
a b # merge
ab # collect run
ab c # collect run
ab c d # merge
ab cd # merge
abcd # collect run
abcd e # collect run
abcd e f # merge
abcd ef # collect run
abcd ef g # collect run
abcd ef g h # merge
abcd ef gh # merge
abcd efgh # merge
abcdefgh
...

The stack contains the start of the each run which is a NULL terminated
linked list and also the length of it. So there is enough information on
the stack to know the start and length of each run. Compared to other
implementations of mergesort the my implementation doesn't run through the
list a given number of steps just to find the start of the next run.

What I like about the implementation is that there is not much need for
treating special cases like ending lists when merging etc. That sounds
easy, but try to do it yourself and you see that it's a bit tricky!
>Sure, that'll work, but it's somewhat brute force. I think that
a "natural" merge sort makes more sense: take the first two runs
of ascending values and merge them to produce the first output
run, and repeat that process until you've consumed the input.
Then start over from the top, until there's only a single output
run.

That is exactly the governing idea in my own implementation: Improve
mergesort by recognizing natural runs in the input.

There are two not so obvious problems with Ben's description:

One thing is that you need to call your comparison function very frequently
just to recognize runs you could already know about otherwise. Assume you
have just merged two runs of length 1000 and 2000 to a run of length 3000.
When you go over the list from the top, you need another 2999 comparisons
just to recognize the run of size 3000.

Note that this is easier in the simpler implementation cited by Richard: The
length of the runs is known to be exactly 2**n in the n-th iteration of the
outer loop.

The second problem in Ben's approach is non-locality: You have to run
repeatedly through the WHOLE input and each time increase the size of the
runs. This will render your CPU cache rather useless. In the test program
on my webpage (http://www.die-schoens.de/prg/testsort.tgz), I compare the
localized version with a non-localized one. The runtime of the localized
one is better by 35%! That's seems relevant to me for real world
application!

Of course, as said before, the algorithm is O(n * log(n)), we are just
talking about minimizing the constant factor in front of this.

Well done. Actually the algorithm I described is is closer to your
algorithm than to Ben's. IIANM Ben's is actually O(n^2) worse case if I
am not missing a fine point. An implementation of my algorithm has a
little gotcha - you have to be careful about recognizing the end of
data. I shall have to think about it a bit, but I think both can be
written nicely as recursive routines.
Jan 23 '07 #18
cr*@tiac.net (Richard Harter) writes:
Well done. Actually the algorithm I described is is closer to your
algorithm than to Ben's. IIANM Ben's is actually O(n^2) worse case if I
am not missing a fine point.
I don't think my code has an O(n^2) worst case, even if it is
slower than an "unnatural" merge sort. What do you think could
trigger an O(n^2) case in my code?
--
"Programmers have the right to be ignorant of many details of your code
and still make reasonable changes."
--Kernighan and Plauger, _Software Tools_
Jan 23 '07 #19
Joerg Schoen wrote:
>
.... snip ...
>
Of course, as said before, the algorithm is O(n * log(n)), we are
just talking about minimizing the constant factor in front of this.
I think that you will find that an initially reverse sorted list
will eat up stack and time. You are doing a lot of copying, which
is totally unnecessary. Remember, records can be big.

--
"The mere formulation of a problem is far more often essential
than its solution, which may be merely a matter of mathematical
or experimental skill. To raise new questions, new possibilities,
to regard old problems from a new angle requires creative
imagination and and marks real advances in science."
-- Albert Einstein
Jan 24 '07 #20
On Tue, 23 Jan 2007 14:54:09 -0800, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
>cr*@tiac.net (Richard Harter) writes:
>Well done. Actually the algorithm I described is is closer to your
algorithm than to Ben's. IIANM Ben's is actually O(n^2) worse case if I
am not missing a fine point.

I don't think my code has an O(n^2) worst case, even if it is
slower than an "unnatural" merge sort. What do you think could
trigger an O(n^2) case in my code?
I think I take it back; there's an ambiguity in the wording of your
description. If you eat the runs one at a time the worst case is O(n^2)
with a reversed list. Upon reflection I suspect that what you meant was
to each a pair of runs, then another pair of runs, etc. That would
indeed be O(n log n). You didn't specify what you were doing with your
output runs. If you simply plop them back in place, that's expensive.
However you can use a stack (that's what Joerg Schoen is doing), or with
a bit of cheap cleverness you can do a clean recursive routine.
Jan 24 '07 #21
On Tue, 23 Jan 2007 18:12:00 -0500, CBFalconer <cb********@yahoo.com>
wrote:
>Joerg Schoen wrote:
>>
... snip ...
>>
Of course, as said before, the algorithm is O(n * log(n)), we are
just talking about minimizing the constant factor in front of this.

I think that you will find that an initially reverse sorted list
will eat up stack and time. You are doing a lot of copying, which
is totally unnecessary. Remember, records can be big.
Perhaps you've forgotten that he is talking about sorting linked lists.
There is no copying involved here, just changing links. Moreover the
maximum stack depth is O(log n). It is true that a initially reverse
sorted list is worst case for an ascending runs mergesort; however the
penalty lies in excessive comparisons.

Jan 24 '07 #22
On 24 Jan., 00:12, CBFalconer <cbfalco...@yahoo.comwrote:
will eat up stack and time. You are doing a lot of copying, which
is totally unnecessary. Remember, records can be big.
As Richard pointed out, I am not doing copying at all, just changing
links. In the test program on my page, I am comparing the mergesort
with other implementations. I use (1) random numbers (2) random numbers
with each value appearing multiple times (3) already sorted list (4)
inverse
sorted list.

It turns out that the implementation behaves well in all cases: It
excells in case (3)
since it recognizes the whole input as one big run and is done. Case
(4) is
"worst case" for the one which tries to recognize natural runs. It is
by ~ 1%
or so slower than the one that collects runs of length 1 and then
doubles on each
iteration. The reason is that the recognition of natural runs only is
done ONCE
over the input, so the "added penalty" is O(n) and the factor is only
the cost of
a comparison.

When time permits (during the weekend most likely), I will put Ben's
implementation
and a "standard text book" one for comparison.

Jan 24 '07 #23
On 24 Jan., 01:17, c...@tiac.net (Richard Harter) wrote:
On Tue, 23 Jan 2007 14:54:09 -0800, Ben Pfaff <b...@cs.stanford.edu>
wrote:
However you can use a stack (that's what Joerg Schoen is doing), or with
a bit of cheap cleverness you can do a clean recursive routine.
Triggered by this discussion I realized that I can still improve the
code and
get rid of keeping the lengths of the runs on the stack. So well, here
is my
"improved version". I have also increased the indentation, so it should
be
better readable. Moreover, the preprocessor stuff only appears once
now.

I still would like to know how to do it with a recursive routine,
currently I have
no idea about that.

------------------------ snip ----------------------------------------
#include <stddef.h/* for NULL */

int Key;
};

#define NATURAL /* recognize natural runs */

{
struct {
unsigned int level;
} stack[sizeof(int) * 8];
int nstack;

for(nstack = -1 ; ; ) {
linkedlist_t *plist, *qlist, *curr;

/* Can we or do we have to merge lists from the stack? */
if(nstack < 1 || (stack[nstack - 1].level !=
stack[nstack].level && list != NULL)) {
/* ** Push element(s) on the stack ** */
if(list == NULL)
/* This is where we break the loop. We have nstack ==
0 now */
break;

/* ** Push lists of length 1 ** */
curr = list;
list = list->Next;

#ifdef NATURAL
/* Check for a natural run */
for(plist = curr ; list != NULL ; plist = list, list =
list->Next)
if(list->Key < plist->Key)
break;

/* Properly terminate the run */
plist->Next = NULL;
#else
/* Properly terminate list */
curr->Next = NULL;
#endif

/* ** Push this run on the stack ** */
++nstack;
stack[nstack].list = curr;
stack[nstack].level = 0;

continue;
}

/* ** Merge top two entries from stack ** */
plist = stack[nstack].list;
--nstack;
qlist = stack[nstack].list;

/* ** Set up new stack element. This also simplifies the main
loop below ** */
if(qlist->Key < plist->Key) {
curr = qlist; qlist = qlist->Next;
} else {
curr = plist; plist = plist->Next;
}

stack[nstack].list = curr;
stack[nstack].level++;

/* ** Here is the main merging loop ** */
while(plist != NULL && qlist != NULL) {
if(qlist->Key < plist->Key) {
curr->Next = qlist; curr = qlist; qlist = qlist->Next;
} else {
curr->Next = plist; curr = plist; plist = plist->Next;
}
}

/* Add remainder to result list */
curr->Next = plist != NULL ? plist : qlist;
}

/* Here we treat the case of list == NULL */
return nstack == 0 ? stack[nstack].list : NULL;
}
------------------------ snap ----------------------------------------

Jan 24 '07 #24
Richard Harter wrote:
CBFalconer <cb********@yahoo.comwrote:
>Joerg Schoen wrote:

... snip ...
>>>
Of course, as said before, the algorithm is O(n * log(n)), we are
just talking about minimizing the constant factor in front of this.

I think that you will find that an initially reverse sorted list
will eat up stack and time. You are doing a lot of copying, which
is totally unnecessary. Remember, records can be big.

Perhaps you've forgotten that he is talking about sorting linked lists.
There is no copying involved here, just changing links. Moreover the
maximum stack depth is O(log n). It is true that a initially reverse
sorted list is worst case for an ascending runs mergesort; however the
penalty lies in excessive comparisons.
He is talking about building stacks, which involves copying
something somewhere. A normal mergesort is basically iterative,
and only diddles pointers. The stack depth is at most log N. At
least as I recall his description.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Jan 24 '07 #25
On 24 Jan., 09:51, CBFalconer <cbfalco...@yahoo.comwrote:
He is talking about building stacks, which involves copying
something somewhere.
Well, if you call setting variables "copying", I agree that I do copy
data.
But then every program does. The stack in my implementation only
contains a pointer to a list and a stack level, nothing more. No actual
"data" is copied.
A normal mergesort is basically iterative,
and only diddles pointers. The stack depth is at most log N. At
least as I recall his description.
There are recursive implementations and iterative ones of mergesort.

As detailed above, the pure iterative ones are repeatedly running
through
the whole list and have the disadvantage of not being localized thus
not taking advantage of a memory cache. Moreover, there is a lot of
"idle running along the list" in it.

The recursive version of mergesort is localized, but again needs some
idle running.

Jan 24 '07 #26
Ben Pfaff wrote:
>
CBFalconer <cb********@yahoo.comwrites:
Too long and complicated for me. There is an example in about 60
lines, including comments, in wdfreq.c, which in turn is an example
application in hashlib.zip. See:

As long as we're posting links to linked list merge sorts, my
implementation is at
http://cvs.savannah.gnu.org/viewcvs/...pp&view=markup
in function ll_sort().

It's optimized for readability.
This is what I use:

/* BEGIN list_type.h */

#ifndef H_LIST_TYPE_H
#define H_LIST_TYPE_H

struct list_node {
struct list_node *next;
void *data;
};

typedef struct list_node list_type;

int (*compar)(const list_type *, const list_type *));

#endif

/* END list_type.h */

/* BEGIN list_type.c */

#include <stddef.h>

#include "list_type.h"

static int list_sorted(list_type *list,
int (*compar)(const list_type *, const list_type *));
static long unsigned node_count(list_type *head);
static list_type *node_sort (list_type *head, long unsigned count,
int (*compar)(const list_type *, const list_type *));
static list_type *list_split(list_type *head, long unsigned count);
static list_type *list_merge(list_type *head, list_type *tail,
int (*compar)(const list_type *, const list_type *));

int (*compar)(const list_type *, const list_type *))
{
return
}

static int list_sorted(list_type *list,
int (*compar)(const list_type *, const list_type *))
{
if (list != NULL) {
while (list -next != NULL) {
if (compar(list, list -next) 0) {
break;
}
list = list -next;
}
}
return list == NULL || list -next == NULL;
}

static long unsigned node_count(list_type *head)
{
long unsigned count;

for (count = 0; head != NULL; head = head -next) {
++count;
}
return count;
}

static list_type *node_sort(list_type *head, long unsigned count,
int (*compar)(const list_type *, const list_type *))
{
long unsigned half;
list_type *tail;

if (count 1) {
half = count / 2;
tail = list_split(head, half);
tail = node_sort(tail, count - half, compar);
}
}

static list_type *list_split(list_type *head, long unsigned count)
{
list_type *tail;

while (--count != 0) {
}
tail = head -next;
head -next = NULL;
return tail;
}

static list_type *list_merge(list_type *head, list_type *tail,
int (*compar)(const list_type *, const list_type *))
{
list_type *list, *sorted, **node;

node = compar(head, tail) 0 ? &tail : &head;
sorted = list = *node;
*node = sorted -next;
while (*node != NULL) {
node = compar(head, tail) 0 ? &tail : &head;
sorted -next = *node;
sorted = *node;
*node = sorted -next;
}
sorted -next = head != NULL ? head : tail;
return list;
}

/* END list_type.c */
--
pete
Jan 24 '07 #27
Joerg Schoen wrote:
>
Hi folks!

Everyone knows how to sort arrays (e. g. quicksort, heapsort etc.)
For linked lists, mergesort is the typical choice.

While I was looking for a optimized implementation of mergesort for
linked lists, I couldn't find one.
--------------------- snip ------------------------------
#include <stddef.h/* for NULL */

int Key;
};

#define NATURAL /* recognize natural runs */

The only lists that I have ever been paid to sort,
were lists of text that needed to be sorted into alphabetical order.

--
pete
Jan 24 '07 #28
jo**@online.de wrote:
>
.... snip ...
>
Triggered by this discussion I realized that I can still improve
the code and get rid of keeping the lengths of the runs on the
stack. So well, here is my "improved version". I have also
increased the indentation, so it should be better readable.
Moreover, the preprocessor stuff only appears once now.

I still would like to know how to do it with a recursive routine,
currently I have no idea about that.
I suggest you pack this up with your earlier algorithm
documentation and post it somewhere in a zip or tgz file.
Alternatively (and poorer) post them together here in one article.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Jan 24 '07 #29
Richard Harter wrote:
See the discussion in Knuth on merge sorts.
Natural gains when runs are
long and few and loses when runs are short and many. Offhand your
version is going to much more expensive (though still O(n log n))
because you have to rescan the runs.
When I've done timed tests and comparison counts
for mergesorting lists,
I've found that for random order lists,
that checking for ordered sublists, is bad.

In my experience, when I have had to sort lists,
the situations have been that
the lists had a high probability of already being sorted,
so, I check them once to see if they're sorted and
then if they're not, I don't look for sorted subsections any more.

On the other hand, when quicksorting arrays of integers,
in timed tests I have found
that checking for ordered subarrays, pays off.

--
pete
Jan 24 '07 #30

On 24 Jan 2007 00:44:09 -0800, jo**@online.de wrote:
>On 24 Jan., 01:17, c...@tiac.net (Richard Harter) wrote:
>On Tue, 23 Jan 2007 14:54:09 -0800, Ben Pfaff <b...@cs.stanford.edu>
wrote:
However you can use a stack (that's what Joerg Schoen is doing), or with
a bit of cheap cleverness you can do a clean recursive routine.

Triggered by this discussion I realized that I can still improve the
code and
get rid of keeping the lengths of the runs on the stack. So well, here
is my
"improved version". I have also increased the indentation, so it should
be
better readable. Moreover, the preprocessor stuff only appears once
now.

I still would like to know how to do it with a recursive routine,
currently I have
no idea about that.
As a note I don't see this algorithm in Knuth. IMHO it is quite elegant
and has applications beyond mergesorting. I doubt that it is new; it is
too simple. However I've never seen it in an algorithms text. I
propose to call the general idea Schoen's algorithm.

Here is the outline of the iterative/recursive version in pseudocode.
There are two routines, alpha and beta. Alpha is the iterative outer
loop that is executed log n times. Beta is the recursive part; it
implements the mergesort algorithm as outlined by each of us.

/* each call to beta consumes exponentially growing pieces of L,
sorts them. and adds them to the output.
*/

function alpha (list L) returns list
level = 0
out = nil
while (L)
append beta(level,L) to out
level ++
end while
return out

/* At level k 0 beta makes two calls to beta at k-1, merges the
result,
and returns the merged list. At level 0 it extracts two runs from L,
merges them, and returns the merged list. At any point if there is a

nil return from the first call/extract the second call/extract is not
*/

function beta (int level, list L) returns list
if (not L) return nil
if (level <= 0)
run_1 = extract(L)
if (not L) return run_1
run_2 = extract(L)
else
run_1 = beta(level-1, L)
run_2 = beta(level-1, L)
return merge(run_1,run_2)

The extract function could either extract an ascending run or a
single item from L; the extracted items are removed from L. The
merge function merges two sorted lists and returns the merged list.

One important point in this pseudo code is that L is global to the
various functions. How to avoid this is, AFICT, language dependent.
In procedural languages it is probably best to manually simulate
recursion as (in effect) Joerg Schoen did. I'm not sure exactly how to
implement it in a functional language, but it should be easy.
Jan 24 '07 #31
Eventually, I came up with my own implementation (see below) which seems
to
perform quite well. The algorithm is my own idea - at least I don't know
if
anybody else has come up with this before.
It is known. This is natural mergesort. Robert Sedgewick talked about it in
one of his books.
You can't do a natural mergesort for arrays but it is dead simple for linked
lists.
You exploit any pre-existing order anywhere. I would be exceptionally
exceptionally surprised if it is not discussed in Kuth's 3rd volume
somewhere.

Best of all, the sort will be O(N) if it is already sorted and it does well
on data that has partial orders.
The worst case for natural mergesort is if the data is in reverse order to
start with. There are no "runs" then.
I would be tempted to do a hybrid where all runs were at least 4 elements no
matter what.
This would be to guard against the reverse order distribution to start with.

Stephen Howe
Jan 25 '07 #32
On Thu, 25 Jan 2007 19:38:12 -0000, "Stephen Howe"
<sjhoweATdialDOTpipexDOTcomwrote:
>Eventually, I came up with my own implementation (see below) which seems
to
>perform quite well. The algorithm is my own idea - at least I don't know
if
>anybody else has come up with this before.

It is known. This is natural mergesort. Robert Sedgewick talked about it in
one of his books.
You can't do a natural mergesort for arrays but it is dead simple for linked
lists.
You exploit any pre-existing order anywhere. I would be exceptionally
exceptionally surprised if it is not discussed in Kuth's 3rd volume
somewhere.
Um, no, you miss the essential point. The uniqueness lies in the way in
which he accesses the data, not in the fact that he is using natural
mergesort. In fact he conditionalizes whether it is natural or not. It
isn't true that you can't do a natural mergesort for arrays.
>
Best of all, the sort will be O(N) if it is already sorted and it does well
on data that has partial orders.
The worst case for natural mergesort is if the data is in reverse order to
start with. There are no "runs" then.
I would be tempted to do a hybrid where all runs were at least 4 elements no
matter what.
This would be to guard against the reverse order distribution to start with.

Stephen Howe

Jan 25 '07 #33
CBFalconer wrote:
jo**@online.de wrote:
>>
Thanks for the comment! Your implementation is basically the one from
Sedgewick's book. It has the disadvantage that you have to run through
the linked lists repeatedly to split it, but it is also localized, i.e.
takes advantage of caches.

Well, I read Sedgewick long ago :-). However I fail to see how you
can do a merge without first splitting.
Split and merge as you go. See TAOCP (vol III) exercise
5.2.4-17, where the idea is credited to John McCarthy. The
basic McCarthy method has peaks of inefficiency when the list
length is one greater than a power of two (more generally, when
the binary representation of the node count has 1-bits that are
separated by runs of 0-bits), but I've found[*] a simple way to
fix them without losing much speed in the nicer cases. Code
available on request.
[*] I also "found" McCarthy's method for myself before
reading it in Knuth, and for all I know others have found "my"
improvement before I did. The original is a truly pretty
algorithm, a real "Why didn't I think of that?" piece of work.
Highly recommended!

--
Eric Sosman
es*****@acm-dot-org.invalid

Jan 26 '07 #34
Eric Sosman wrote:
CBFalconer wrote:
>jo**@online.de wrote:
>>>
Thanks for the comment! Your implementation is basically the one
from Sedgewick's book. It has the disadvantage that you have to
run through the linked lists repeatedly to split it, but it is
also localized, i.e. takes advantage of caches.

Well, I read Sedgewick long ago :-). However I fail to see how
you can do a merge without first splitting.

Split and merge as you go. See TAOCP (vol III) exercise
5.2.4-17, where the idea is credited to John McCarthy. The
basic McCarthy method has peaks of inefficiency when the list
length is one greater than a power of two (more generally, when
the binary representation of the node count has 1-bits that are
separated by runs of 0-bits), but I've found[*] a simple way to
fix them without losing much speed in the nicer cases. Code
available on request.
[*] I also "found" McCarthy's method for myself before
reading it in Knuth, and for all I know others have found "my"
improvement before I did. The original is a truly pretty
algorithm, a real "Why didn't I think of that?" piece of work.
Highly recommended!
My Knuth is buried in boxes somewhere. Sounds as if the split
mechanism is something like take alternate entries, and my instinct
tells me that this will disturb the stability of the sort, which in
turn is one of the most attractive features of mergesort.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Jan 26 '07 #35
On Fri, 26 Jan 2007 09:42:06 -0500, Eric Sosman
<es*****@acm-dot-org.invalidwrote:
>CBFalconer wrote:
>jo**@online.de wrote:
>>>
Thanks for the comment! Your implementation is basically the one from
Sedgewick's book. It has the disadvantage that you have to run through
the linked lists repeatedly to split it, but it is also localized, i.e.
takes advantage of caches.

Well, I read Sedgewick long ago :-). However I fail to see how you
can do a merge without first splitting.

Split and merge as you go. See TAOCP (vol III) exercise
5.2.4-17, where the idea is credited to John McCarthy. The
basic McCarthy method has peaks of inefficiency when the list
length is one greater than a power of two (more generally, when
the binary representation of the node count has 1-bits that are
separated by runs of 0-bits), but I've found[*] a simple way to
fix them without losing much speed in the nicer cases. Code
available on request.
[*] I also "found" McCarthy's method for myself before
reading it in Knuth, and for all I know others have found "my"
improvement before I did. The original is a truly pretty
algorithm, a real "Why didn't I think of that?" piece of work.
Highly recommended!
Good catch. I knew it had to be in Knuth somewhere, but some how I
missed it. That is what Schoen is doing, modulo minor variations.
Jan 26 '07 #36
CBFalconer wrote:
Eric Sosman wrote:
>CBFalconer wrote:
>>jo**@online.de wrote:
Thanks for the comment! Your implementation is basically the one
from Sedgewick's book. It has the disadvantage that you have to
run through the linked lists repeatedly to split it, but it is
also localized, i.e. takes advantage of caches.
Well, I read Sedgewick long ago :-). However I fail to see how
you can do a merge without first splitting.
Split and merge as you go. See TAOCP (vol III) exercise
5.2.4-17, where the idea is credited to John McCarthy. The
basic McCarthy method has peaks of inefficiency when the list
length is one greater than a power of two (more generally, when
the binary representation of the node count has 1-bits that are
separated by runs of 0-bits), but I've found[*] a simple way to
fix them without losing much speed in the nicer cases. Code
available on request.

[*] I also "found" McCarthy's method for myself before
reading it in Knuth, and for all I know others have found "my"
improvement before I did. The original is a truly pretty
algorithm, a real "Why didn't I think of that?" piece of work.
Highly recommended!

My Knuth is buried in boxes somewhere. Sounds as if the split
mechanism is something like take alternate entries, and my instinct
tells me that this will disturb the stability of the sort, which in
turn is one of the most attractive features of mergesort.
Unbury. "Take alternate entries" just changes the pattern
of splitting (unstably), not the number of split passes. It
makes the same number of node visits (= link traversals) as
"split in the middle" does.

The nice thing about McCarthy's "split mechanism" is that it
makes one count them one pass over the full list instead of the
lg(N) passes of "split in the middle" or "alternate entries."
The resulting sort is stable (assuming you don't do anything dumb).

The un-nice thing about McCarthy's method is that after
the initial phase of perfectly-balanced power-of-two merges
you're left with a cleanup phase in which the merges are not
so well balanced. Sorting a 1025-node list, for example, uses
perfectly-balanced merges to sort the first 1024 items, then
performs a 1024-plus-1 merge to insert the final node: 512.9990+
comparisons (average) to choose among just 1026 possible outcomes.
Inserting another "stage" into McCarthy produces

one list each of 512, 256, ..., 4 nodes
two lists of 2 nodes
one list of 1 node

.... just before the cleanup phase. The cleanup merges are then
2+1, 2+3, 4+5, 8+9, ..., 512+513, each as close to balanced as
an odd number of nodes can be.

--
Eric Sosman
es*****@acm-dot-org.invalid
Jan 26 '07 #37
Eric Sosman wrote:
Unbury. "Take alternate entries" just changes the pattern
of splitting (unstably), not the number of split passes. It
makes the same number of node visits (= link traversals) as
"split in the middle" does.

The nice thing about McCarthy's "split mechanism" is that it
makes one count them one pass over the full list instead of the
lg(N) passes of "split in the middle" or "alternate entries."
The resulting sort is stable (assuming you don't do anything dumb).

The un-nice thing about McCarthy's method is that after
the initial phase of perfectly-balanced power-of-two merges
you're left with a cleanup phase in which the merges are not
so well balanced. Sorting a 1025-node list, for example, uses
perfectly-balanced merges to sort the first 1024 items, then
performs a 1024-plus-1 merge to insert the final node: 512.9990+
comparisons (average) to choose among just 1026 possible outcomes.
Inserting another "stage" into McCarthy produces

one list each of 512, 256, ..., 4 nodes
two lists of 2 nodes
one list of 1 node

... just before the cleanup phase. The cleanup merges are then
2+1, 2+3, 4+5, 8+9, ..., 512+513, each as close to balanced as
an odd number of nodes can be.
Sounds interesting, though I don't understand it fully and
I don't have access to any copy of Knuth's books. Code might
be interesting, though. What about posting it or making it

If I get it correctly, the algorithm you are talking about
is unstable, works well only for powers of two (except with
some special modification to avoid this) and doesn't
recognize natural runs in the input? I would like to test
it against mine to see how it performs.

Jan 26 '07 #38
On Fri, 26 Jan 2007 10:51:13 -0500, CBFalconer <cb********@yahoo.com>
wrote:
>Eric Sosman wrote:
>CBFalconer wrote:
>>jo**@online.de wrote:

Thanks for the comment! Your implementation is basically the one
from Sedgewick's book. It has the disadvantage that you have to
run through the linked lists repeatedly to split it, but it is
also localized, i.e. takes advantage of caches.

Well, I read Sedgewick long ago :-). However I fail to see how
you can do a merge without first splitting.

Split and merge as you go. See TAOCP (vol III) exercise
5.2.4-17, where the idea is credited to John McCarthy. The
basic McCarthy method has peaks of inefficiency when the list
length is one greater than a power of two (more generally, when
the binary representation of the node count has 1-bits that are
separated by runs of 0-bits), but I've found[*] a simple way to
fix them without losing much speed in the nicer cases. Code
available on request.

[*] I also "found" McCarthy's method for myself before
reading it in Knuth, and for all I know others have found "my"
improvement before I did. The original is a truly pretty
algorithm, a real "Why didn't I think of that?" piece of work.
Highly recommended!

My Knuth is buried in boxes somewhere. Sounds as if the split
mechanism is something like take alternate entries, and my instinct
tells me that this will disturb the stability of the sort, which in
turn is one of the most attractive features of mergesort.
You're thinking of his algorithm L. He does split the list using
alternate entries. I think you're right about L not being stable,
though there may be some hidden trick in his description. (I really
hate Knuth's algorithm descriptions.) However the McCarthy algorithm is
stable.

Jan 26 '07 #39
On Fri, 26 Jan 2007 20:42:45 +0100, Joerg Schoen
<jo************@online.dewrote:
>Eric Sosman wrote:
> Unbury. "Take alternate entries" just changes the pattern
of splitting (unstably), not the number of split passes. It
makes the same number of node visits (= link traversals) as
"split in the middle" does.

The nice thing about McCarthy's "split mechanism" is that it
makes one count them one pass over the full list instead of the
lg(N) passes of "split in the middle" or "alternate entries."
The resulting sort is stable (assuming you don't do anything dumb).

The un-nice thing about McCarthy's method is that after
the initial phase of perfectly-balanced power-of-two merges
you're left with a cleanup phase in which the merges are not
so well balanced. Sorting a 1025-node list, for example, uses
perfectly-balanced merges to sort the first 1024 items, then
performs a 1024-plus-1 merge to insert the final node: 512.9990+
comparisons (average) to choose among just 1026 possible outcomes.
Inserting another "stage" into McCarthy produces

one list each of 512, 256, ..., 4 nodes
two lists of 2 nodes
one list of 1 node

... just before the cleanup phase. The cleanup merges are then
2+1, 2+3, 4+5, 8+9, ..., 512+513, each as close to balanced as
an odd number of nodes can be.

Sounds interesting, though I don't understand it fully and
I don't have access to any copy of Knuth's books. Code might
be interesting, though. What about posting it or making it

If I get it correctly, the algorithm you are talking about
is unstable, works well only for powers of two (except with
some special modification to avoid this) and doesn't
recognize natural runs in the input? I would like to test
it against mine to see how it performs.
You don't get it correctly. You are implementing McCarthy's algorithm.
He's talking about the same thing that you are doing, except that he is
making a slight modification at the end to avoid some final inefficient
merges. His version is stable. Recognizing natural runs in the input
is irrelevant. The basic unit in McCarthy's algorithm is an ascending
run. It can be a run of length 1 or a natural ascending run.

Jan 26 '07 #40
On Fri, 26 Jan 2007 13:04:00 -0500, Eric Sosman
<es*****@acm-dot-org.invalidwrote:

[snip clever cleanup of MCrthy's algorithm]

Very pretty.
Can you describe the general case or is it an exercise for the reader?

Jan 26 '07 #41
Joerg Schoen wrote:
Eric Sosman wrote:
> Unbury. "Take alternate entries" just changes the pattern
of splitting (unstably), not the number of split passes. It
makes the same number of node visits (= link traversals) as
"split in the middle" does.

The nice thing about McCarthy's "split mechanism" is that it
makes one count them one pass over the full list instead of the
lg(N) passes of "split in the middle" or "alternate entries."
The resulting sort is stable (assuming you don't do anything dumb).

The un-nice thing about McCarthy's method is that after
the initial phase of perfectly-balanced power-of-two merges
you're left with a cleanup phase in which the merges are not
so well balanced. Sorting a 1025-node list, for example, uses
perfectly-balanced merges to sort the first 1024 items, then
performs a 1024-plus-1 merge to insert the final node: 512.9990+
comparisons (average) to choose among just 1026 possible outcomes.
Inserting another "stage" into McCarthy produces

one list each of 512, 256, ..., 4 nodes
two lists of 2 nodes
one list of 1 node

... just before the cleanup phase. The cleanup merges are then
2+1, 2+3, 4+5, 8+9, ..., 512+513, each as close to balanced as
an odd number of nodes can be.

Sounds interesting, though I don't understand it fully and
I don't have access to any copy of Knuth's books. Code might
be interesting, though. What about posting it or making it

If I get it correctly, the algorithm you are talking about
is unstable, works well only for powers of two (except with
some special modification to avoid this) and doesn't
recognize natural runs in the input? I would like to test
it against mine to see how it performs.
#include <stddef.h>
#include <limits.h>

typedef struct unknown Node;
#define NEXT(ptr) *(Node**)((char*)(ptr) + offset)
#define MAXBITS (sizeof(Node*) * CHAR_BIT)

static void *
merge(Node *p, Node *q,
size_t offset,
int (*compare)(const void *, const void *))
/*
* Internal routine to merge two non-empty sorted lists. If elements
* of the two lists compare equal, those from the `p' list will precede
* those from the `q' list in the merged output.
*/
{

for (;;) {
if (compare(p, q) <= 0) {
*tail = p;
tail = &NEXT(p);
if ( (p = NEXT(p)) == NULL ) {
*tail = q;
break;
}
}
else {
*tail = q;
tail = &NEXT(q);
if ( (q = NEXT(q)) == NULL ) {
*tail = p;
break;
}
}
}
}

void * /* returns: pointer to new list head */
listsort(
void *head, /* first item in original list */
size_t offset, /* byte offset of link field in each item */
int (*compare) /* item comparison function */
(const void *, const void *))
/*
* listsort() rearranges the links of a singly-linked NULL-terminated
* list so they traverse the items in an order defined by a caller-
* provided comparison function, and returns a pointer to the first
* item in the rearranged list. The comparison function accepts two
* item pointers and returns a negative, zero, or positive integer to
* indicate that the first item compares less than, equal to, or greater
* than the second. The sort is stable.
*/
{
Node *list[MAXBITS-1]; /* sorted sub-lists of 2,4,8,... items */
Node *p, *q;
int bits, maxbits;

list[ maxbits = 0 ] = NULL;
while ( (p = head) != NULL ) {

if ( (q = NEXT(p)) == NULL )
break;

if (compare(p, q) <= 0) {
NEXT(q) = NULL;
}
else {
NEXT(q) = p;
NEXT(p) = NULL;
p = q;
}

for (bits = 0; (q = list[bits]) != NULL; ) {
list[bits] = NULL;
p = merge(q, p, offset, compare);
if (++bits maxbits) {
maxbits = bits;
break;
}
}
list[bits] = p;
}

for (bits = 0; bits <= maxbits; ++bits) {
if ( (p = list[bits]) != NULL )
head = (head == NULL) ? p : merge(p, head, offset, compare);
}
}

Notes:

1) This is packaged as a type-blind "generic" list sort, the
only assumptions being that the links are struct pointers and
that a list's final node has a NULL link. A type-aware version
(with knowledge of the nodes' structure and able to compare nodes
in-line) should run faster.

2) The sort doesn't actually need a comparison function that
returns -ve/zero/+ve: a Boolean `precedes' function would do as
well and could be faster. In this sort, I decided to stick with
a qsort-and-bsearch style, mostly because that's what people are
accustomed to.

3) This code implements the basic McCarthy method without
protection against badly unbalanced merges in the cleanup phase.

3a) Well, not the "most basic" McCarthy: It sorts two-node
lists in-line instead of merging two one-node lists.

3b) Knuth suggests that "we can sort groups of, say, 16 items
using straight insertion," but in tests I've conducted and on the
machines available to me that doesn't seem to be a good idea. (He
makes the suggestion in connection with merge using arrays instead
of lists, and perhaps things would be different in that setting.)

4) The McCarthy algorithm may be more understandable if you
note the parallel to the process of counting in binary. When N
items (or item pairs, with the 3a optimization) have been consumed,
the elements of list[] correspond to the binary representation of
N: each zero bit corresponds to a NULL in list[], and each one bit
to a non-NULL link to a sorted sub-list of N items (2*N with 3a).
Consuming the next item (or pair) "increments" N by working from
right to left: if the low-order bit is zero, the new item lands
in list[0]. Otherwise, it is merged with list[0], which becomes
NULL, and then the new two-item (two-pair) list "carries" to the
list[1] place, and so on.

5) To avoid wildly unbalanced merges during cleanup, use two
arrays alist[] and blist[] instead of the single array list[].
The "incrementing" process first tries to fill the NULL link in
alist[0], but if it's non-NULL it uses blist[0] instead. If
both are non-NULL, it merges those two lists, deposits the new
sub-list in alist[0] and NULLs blist[0], then "carries" the merge
result one place to the left:

for (bits = 0; ; p = q) {
if (alist[bits] == NULL) {
alist[bits] = p;
break;
}
if (blist[bits] == NULL) {
blist[bits] = p;
break;
}
q = merge(alist[bits], blist[bits], offset, compare);
alist[bits] = p;
blist[bits] = NULL;
if (++bits maxbits) {
maxbits = bits;
alist[bits] = q;
blist[bits] = NULL;
break;
}
}

In my tests, this method is just a hair slower than the original
on lists of length 2^k or 2^k-1, but shows a substantial gain for
lengths like 2^k+1. It is the fastest "general-purpose" list
merge I know.

6) All the versions mentioned above are "straight" merges.
"Natural" merges that make use of the order already present in
the input list can beat the pants off them *if* the list is in
very good order already. However, in my tests the pre-existing
order must be very good indeed for natural to beat straight;
Knuth's remark in the answer to exercise 5.2.4-12

"We may conclude that natural merging is preferable to
straight merging when linked allocation is being used,
although it was inferior for sequential allocation."

is not supported by my data. (Hypothesis: Knuth was thinking
about the number of comparisons during the merges themselves,
but forgot about the N-1 comparisons required to discover the
boundaries between the initial runs. On "random" input, a
natural merge uses N-1 comparisons to create approximately N/2
initial runs, while a straight merge uses only N/2 comparisons
to make the same amount of progress.)

7) I've been working on and off -- mostly off -- on a paper
describing test results for various kinds of linked-list merges,
but I'm not ready to "publish," however informally. When last
I set it aside, I was having a hard time writing a linked-list
Quicksort that didn't exhibit pathological behavior. Maybe if
I convince myself it's just not possible, I'll finish the damn'
thing and post it.

8) An interesting (and possibly discouraging) observation:
Efficient linked-list sorting is relatively unimportant! First,
sorting a linked list brings very little "algorithmic advantage"
the way sorting an array does: In a sorted array you can use
things like binary search, but sorting a linked list offers no
similar gain. (Sorting cuts the average time for an unsuccessful
search in half, but if N is large there are better ways to search.)
Second, linked lists are "unfriendly" to the memory implementations
of contemporary systems -- bad locality, cache thrashing, etc. --
so a good programmer usually tries to keep them short; if N is
small, even an inefficient sort will be quick. Yes, there are
circumstances where an efficient linked-list sort is needed, but
IMHO they are (or should be) on the rare side.

--
Eric Sosman
es*****@acm-dot-org.invalid
Jan 26 '07 #42
Eric Sosman wrote:
3b) Knuth suggests that "we can sort groups of, say, 16 items
using straight insertion," but in tests I've conducted and on the
machines available to me that doesn't seem to be a good idea. (He
makes the suggestion in connection with merge using arrays instead
of lists, and perhaps things would be different in that setting.)
I agree. As mentioned above, I made the same observation. Earlier
in this threadm, wa**@stoner.com gave a useful explanations on this
phenomenon
4) The McCarthy algorithm may be more understandable if you
note the parallel to the process of counting in binary. When N
items (or item pairs, with the 3a optimization) have been consumed,
the elements of list[] correspond to the binary representation of
N: each zero bit corresponds to a NULL in list[], and each one bit
to a non-NULL link to a sorted sub-list of N items (2*N with 3a).
Consuming the next item (or pair) "increments" N by working from
right to left: if the low-order bit is zero, the new item lands
in list[0]. Otherwise, it is merged with list[0], which becomes
NULL, and then the new two-item (two-pair) list "carries" to the
list[1] place, and so on.
I can now see that my implementation is basically a reformulation of
McCarthys algorithm. Thanks for revealing this! Your implementations is
more clever, though it takes a bit longer to comprehend. Thanks for the
explanation!
5) To avoid wildly unbalanced merges during cleanup, use two
arrays alist[] and blist[] instead of the single array list[].
The "incrementing" process first tries to fill the NULL link in
alist[0], but if it's non-NULL it uses blist[0] instead. If
both are non-NULL, it merges those two lists, deposits the new
sub-list in alist[0] and NULLs blist[0], then "carries" the merge
result one place to the left:
.. snip ..
In my tests, this method is just a hair slower than the original
on lists of length 2^k or 2^k-1, but shows a substantial gain for
lengths like 2^k+1. It is the fastest "general-purpose" list
merge I know.
You don't mention how you modify the final "cleanup" loop. I did it like
that:

for(bits = 0 ; bits <= maxbits ; ++bits) {
if((p = alist[bits]) != NULL)
head = (head == NULL) ? p : merge(p, head, offset, compare);

if((p = blist[bits]) != NULL)
head = (head == NULL) ? p : merge(p, head, offset, compare);
}

Maybe there is another / better way? At least the sorting then works.
But I cannot confirm your tests. Here are my results for three
different lengths. (1) sorts a random list, (2) random list with each
value 10 times, (3) an ordered list, and (4) and invers ordered list.

I compare my implementation against the code you supplied ("McCarthy")
and the improved version ("McCarthy II").

Length | mine | McCarthy | McCarthy II|
--------+---------+-----------+------------+
4*10^6 | | | |
(1) | 7.6 | 7.55 | 8.26 |
(2) | 7.67 | 7.63 | 8.32 |
(3) | 0.64 | 0.63 | 0.73 |
(4) | 0.59 | 0.58 | 0.88 |
--------+---------+-----------+------------+
2^22 | | | |
(1) | 8.02 | 7.97 | 8.27 |
(2) | 8.09 | 8.04 | 8.34 |
(3) | 0.67 | 0.65 | 0.77 |
(4) | 0.63 | 0.63 | 0.81 |
--------+---------+-----------+------------+
2^22+1 | | | |
(1) | 8.66 | 8.63 | 8.25 |
(2) | 9.17 | 9.11 | 8.31 |
(3) | 0.79 | 0.77 | 0.74 |
(4) | 0.62 | 0.64 | 0.79 |
--------+---------+-----------+------------+

As you can see, "McCarthy" and "mine" are basically the same, yours is a
bit faster in all cases.

But the improved one with two lists "alist" and "blist" is slower all
the time except for the 2^n+1 case.
6) All the versions mentioned above are "straight" merges.
"Natural" merges that make use of the order already present in
the input list can beat the pants off them *if* the list is in
very good order already. However, in my tests the pre-existing
order must be very good indeed for natural to beat straight;
Knuth's remark in the answer to exercise 5.2.4-12

"We may conclude that natural merging is preferable to
straight merging when linked allocation is being used,
although it was inferior for sequential allocation."

is not supported by my data. (Hypothesis: Knuth was thinking
about the number of comparisons during the merges themselves,
but forgot about the N-1 comparisons required to discover the
boundaries between the initial runs. On "random" input, a
natural merge uses N-1 comparisons to create approximately N/2
initial runs, while a straight merge uses only N/2 comparisons
to make the same amount of progress.)
I disagree. In my tests, the natural merges are always faster or only
slightly slower. Please note that the additional compares to check for
natural runs is only O(N) which looks affordable to me.

As mentioned before in this thread, the natural merge excels in case the
list is sorted and looses a few percent in the worst case, i. e. invers
ordered list.
8) An interesting (and possibly discouraging) observation:
Efficient linked-list sorting is relatively unimportant! First,
sorting a linked list brings very little "algorithmic advantage"
the way sorting an array does: In a sorted array you can use
things like binary search, but sorting a linked list offers no
similar gain. (Sorting cuts the average time for an unsuccessful
search in half, but if N is large there are better ways to search.)
Second, linked lists are "unfriendly" to the memory implementations
of contemporary systems -- bad locality, cache thrashing, etc. --
so a good programmer usually tries to keep them short; if N is
small, even an inefficient sort will be quick. Yes, there are
circumstances where an efficient linked-list sort is needed, but
IMHO they are (or should be) on the rare side.
Agreed, linked list sorting is a bit questionable, so one shouldn't put
too much effort into it. But I actually have some cases where I deal
with doubly linked lists as a usefull general purpose data structure in
a large application program and also want to provide a means to sort it
(e. g. before generating output of the list). Here I wanted to have an
implementation I shouldn't be ashamed off :)

Regarding locality, the McCarthy algortihm is still the best one can
achieve.

Jan 27 '07 #43
Joerg Schoen wrote:
Eric Sosman wrote:
.... snip ...
>
>8) An interesting (and possibly discouraging) observation:
Efficient linked-list sorting is relatively unimportant! First,
sorting a linked list brings very little "algorithmic advantage"
the way sorting an array does: In a sorted array you can use
things like binary search, but sorting a linked list offers no
similar gain. (Sorting cuts the average time for an unsuccessful
search in half, but if N is large there are better ways to search.)
Second, linked lists are "unfriendly" to the memory implementations
of contemporary systems -- bad locality, cache thrashing, etc. --
so a good programmer usually tries to keep them short; if N is
small, even an inefficient sort will be quick. Yes, there are
circumstances where an efficient linked-list sort is needed, but
IMHO they are (or should be) on the rare side.

Agreed, linked list sorting is a bit questionable, so one shouldn't
put too much effort into it. But I actually have some cases where I
deal with doubly linked lists as a usefull general purpose data
structure in a large application program and also want to provide a
means to sort it (e. g. before generating output of the list). Here
I wanted to have an implementation I shouldn't be ashamed off :)
If you want to search a table of some form, a hashtable is the most
efficient, since O(1) beats O(logN) by a wide margin. However you
may want to dump the table in sorted order, which can be awkward.
The cure is to be able to form a list of the table content, sort
it, and dump from that. That is easily implemented by simply
including a single spare pointer in the table items. See the
demonstration programs for hashlib on my site. The hash table has
to have a means of walking its content. The overall process is
O(NlogN).

As an extension, you can include two spare pointers and form a
tree. This would probably be the best method when you want to
search for a range, rather than a specific item.

This brings up another consideration - worst case timing. While
quicksort is slighly faster than mergesort, it has a horrible
O(N*N) worst case. With mergesort, the average run time and the
worst case are both O(NlogN). This again makes it an optimum
choice for real-time systems (not to mention stability). AFAIK no
in-place sort can do better (the in place eliminates radix sort).

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Jan 27 '07 #44
Joerg Schoen wrote:

[in E-mail and on Usenet; either is fine, but both risks confusion]
[...]
You don't mention how you modify the final "cleanup" loop. I did it like
that:

for(bits = 0 ; bits <= maxbits ; ++bits) {
if((p = alist[bits]) != NULL)
head = (head == NULL) ? p : merge(p, head, offset, compare);

if((p = blist[bits]) != NULL)
head = (head == NULL) ? p : merge(p, head, offset, compare);
}
That will sort, but it ruins stability. Exchange the two
`if' statements and stability is restored. (Stability is not
the most important characteristic of a sort, but when it can be
had for free there's no reason not to preserve it.)
I compare my implementation against the code you supplied ("McCarthy")
and the improved version ("McCarthy II").

Length | mine | McCarthy | McCarthy II|
--------+---------+-----------+------------+
4*10^6 | | | |
(1) | 7.6 | 7.55 | 8.26 |
(2) | 7.67 | 7.63 | 8.32 |
(3) | 0.64 | 0.63 | 0.73 |
(4) | 0.59 | 0.58 | 0.88 |
--------+---------+-----------+------------+
2^22 | | | |
(1) | 8.02 | 7.97 | 8.27 |
(2) | 8.09 | 8.04 | 8.34 |
(3) | 0.67 | 0.65 | 0.77 |
(4) | 0.63 | 0.63 | 0.81 |
--------+---------+-----------+------------+
2^22+1 | | | |
(1) | 8.66 | 8.63 | 8.25 |
(2) | 9.17 | 9.11 | 8.31 |
(3) | 0.79 | 0.77 | 0.74 |
(4) | 0.62 | 0.64 | 0.79 |
--------+---------+-----------+------------+

As you can see, "McCarthy" and "mine" are basically the same, yours is a
bit faster in all cases.

But the improved one with two lists "alist" and "blist" is slower all
the time except for the 2^n+1 case.
My tests concentrated on much shorter lists -- as I mentioned
up-thread, I believe it is usually a bad idea to sort very long
lists (or even to build them). I used twenty-seven assorted list
lengths from 17 to 6144, with keys in unbiased order or biased
toward rising or falling order, with "fast" and "slow" comparison
functions, on five machines covering multiple generations of x86
and UltraSPARC architectures, and with three different compilers.
I also made an effort to scatter the list nodes in memory, trying
to make the links bounce around across different pages as a real
list might (a list whose nodes all come from one big array may
exhibit too much cache locality).

Twenty-seven sort implementations of six algorithms were tested
(as I mentioned earlier, I feel there are a few more that ought to
be tried for completeness' sake). The testing used a "tournament"
format: Try all the implementations on a few test combinations,
eliminate the slowest few, try the survivors on more combinations,
prune again, and so on. Six methods made it to the final round
every time; eight methods never survived the first round.

Modified McCarthy was the overall winner in five of the seven
tested configurations, taking a second and a third place in the
other two. Basic McCarthy won in one configuration, with two
seconds, three thirds, and one did-not-place. Other high finishers
were a Modified McCarthy that insertion-sorted groups of four nodes
prior to merging (one first place, one second, one third) and a
modified natural merge (three second places).

My choice of test data may have given Modified McCarthy an
unfair advantage: nine of the tested lengths were of the form
2^k+1, and another nine looked like 3*2^k (== 2^(k+1) + 2^k).
In my zeal to avoid the unbalanced merges of Basic McCarthy I may
have given too much emphasis to lengths it doesn't like and thus
shown Modified McCarthy in too favorable a light. The choice of
test data is often the weakest point -- or at least the most
debatable -- in a timing investigation.
>Knuth's remark in the answer to exercise 5.2.4-12

"We may conclude that natural merging is preferable to
straight merging when linked allocation is being used,
although it was inferior for sequential allocation."

is not supported by my data. (Hypothesis: Knuth was thinking
about the number of comparisons during the merges themselves,
but forgot about the N-1 comparisons required to discover the
boundaries between the initial runs. On "random" input, a
natural merge uses N-1 comparisons to create approximately N/2
initial runs, while a straight merge uses only N/2 comparisons
to make the same amount of progress.)

I disagree. In my tests, the natural merges are always faster or only
slightly slower. Please note that the additional compares to check for
natural runs is only O(N) which looks affordable to me.
Both N-1 and floor(N/2) are O(N), so Big-Oh doesn't tell enough
of the story.
As mentioned before in this thread, the natural merge excels in case the
list is sorted and looses a few percent in the worst case, i. e. invers
ordered list.
On a reversed list, natural merge makes N-1 comparisons to form
N one-node sub-lists. Thereafter, it follows exactly the same pattern
of merges a straight merge would. In the worst case, then, natural
merge makes N-1 more comparisons than straight merge.

--
Eric Sosman
es*****@acm-dot-org.invalid
Jan 28 '07 #45
Eric Sosman wrote:
[in E-mail and on Usenet; either is fine, but both risks confusion]
wasn't sure you read usenet regularly. I'll promise not doing it again.
That will sort, but it ruins stability. Exchange the two
`if' statements and stability is restored. (Stability is not
the most important characteristic of a sort, but when it can be
had for free there's no reason not to preserve it.)
Thanks for the hint. I didn't think much of it, just tried to get the
program to work.
>As you can see, "McCarthy" and "mine" are basically the same, yours
is a bit faster in all cases.

But the improved one with two lists "alist" and "blist" is slower all
the time except for the 2^n+1 case.

My tests concentrated on much shorter lists -- as I mentioned
up-thread, I believe it is usually a bad idea to sort very long
lists (or even to build them). I used twenty-seven assorted list
lengths from 17 to 6144, with keys in unbiased order or biased
toward rising or falling order, with "fast" and "slow" comparison
functions, on five machines covering multiple generations of x86
and UltraSPARC architectures, and with three different compilers.
I also made an effort to scatter the list nodes in memory, trying
to make the links bounce around across different pages as a real
list might (a list whose nodes all come from one big array may
exhibit too much cache locality).
Agreed that long linked lists and sorting them is odd, but for the sake
of testing the algorithm it sounds OK to me. Moreover, long lists test
locality. Maybe I should also disperse my lists throughout memory to
see the effect. But my understanding of a CPU cache is that the lower
bits of the address are used as cache addresses, so e. g. putting each
node at the start of a page would trash the cache. Having them in an
array should have the same result as wildly dispersing them in memory.
I'll try it and let you know.

Still it looks odd to me that for an "arbitrary" number like 4 * 10^6
the results become worse for the "improved McCarthy".

I have also tested my implementation on a variety of hardware including
AIX, HP-UX, SunOS, etc. If time permits, I will also test yours.
Twenty-seven sort implementations of six algorithms were tested
(as I mentioned earlier, I feel there are a few more that ought to
be tried for completeness' sake). The testing used a "tournament"
format: Try all the implementations on a few test combinations,
eliminate the slowest few, try the survivors on more combinations,
prune again, and so on. Six methods made it to the final round
every time; eight methods never survived the first round.

Modified McCarthy was the overall winner in five of the seven
tested configurations, taking a second and a third place in the
other two. Basic McCarthy won in one configuration, with two
seconds, three thirds, and one did-not-place. Other high finishers
were a Modified McCarthy that insertion-sorted groups of four nodes
prior to merging (one first place, one second, one third) and a
modified natural merge (three second places).

My choice of test data may have given Modified McCarthy an
unfair advantage: nine of the tested lengths were of the form
2^k+1, and another nine looked like 3*2^k (== 2^(k+1) + 2^k).
In my zeal to avoid the unbalanced merges of Basic McCarthy I may
have given too much emphasis to lengths it doesn't like and thus
shown Modified McCarthy in too favorable a light. The choice of
test data is often the weakest point -- or at least the most
debatable -- in a timing investigation.
What about trying all possible lengths in a range from 2^k .. 2^k+1 or
so and produce a graph that compares different algorithms?

I agree that your choice maybe favoured modified McCarthy too much.
>I disagree. In my tests, the natural merges are always faster or only
slightly slower. Please note that the additional compares to check
for natural runs is only O(N) which looks affordable to me.

Both N-1 and floor(N/2) are O(N), so Big-Oh doesn't tell enough
of the story.
I don't quite get this. Mergesort is O(N * log(N)), adding another O(N)
only increases the leading factor but the higher N becomes the more
irrelevant gets this additional contribution.
On a reversed list, natural merge makes N-1 comparisons to form
N one-node sub-lists. Thereafter, it follows exactly the same pattern
of merges a straight merge would. In the worst case, then, natural
merge makes N-1 more comparisons than straight merge.
Yes, but that's neglicible for an O(N * log(N)) algorithm and improves
it in real cases with partial order.

Jan 28 '07 #46
CBFalconer wrote:
If you want to search a table of some form, a hashtable is the most
efficient, since O(1) beats O(logN) by a wide margin. However you
may want to dump the table in sorted order, which can be awkward.
The cure is to be able to form a list of the table content, sort
it, and dump from that. That is easily implemented by simply
including a single spare pointer in the table items. See the
demonstration programs for hashlib on my site. The hash table has
to have a means of walking its content. The overall process is
O(NlogN).
Thanks for pointing this out! The actual situation is that the
(existing) application program I have in mind already uses linked lists
frequently for typical data structures. I now want to provide
an "improved" implementation and also want to offer an easy way of
sorting it prior to output.

I'll take your suggestion like that: I'll have another spare pointer in
the linked list which can be used to generate a hash table from the
list. This becomes handy if the list is to be searched for individual
entries frequently.
As an extension, you can include two spare pointers and form a
tree. This would probably be the best method when you want to
search for a range, rather than a specific item.
I already have two pointers (doubly linked lists), add another for
hashing the whole thing like suggested above. Maybe I offer a function
that converts the doubly-linked list into a balanced tree. Would be a
nice excercise for red-black trees I suppose.
This brings up another consideration - worst case timing. While
quicksort is slighly faster than mergesort, it has a horrible
O(N*N) worst case. With mergesort, the average run time and the
worst case are both O(NlogN). This again makes it an optimum
choice for real-time systems (not to mention stability). AFAIK no
in-place sort can do better (the in place eliminates radix sort).
You are right, but I usually live with the "worst case timing" of
quicksort, believing that this only appears in rare cases. I am not too
paranoid - it's just that everybody's after me :)

Jan 28 '07 #47
Joerg Schoen wrote:
Eric Sosman wrote:
>Joerg Schoen wrote:
>>I disagree. In my tests, the natural merges are always faster or only
slightly slower. Please note that the additional compares to check
for natural runs is only O(N) which looks affordable to me.
Both N-1 and floor(N/2) are O(N), so Big-Oh doesn't tell enough
of the story.

I don't quite get this. Mergesort is O(N * log(N)), adding another O(N)
only increases the leading factor but the higher N becomes the more
irrelevant gets this additional contribution.
For sufficiently large N, only the leading term and its
coefficient are important. But N is not always "sufficiently
large," and the lower-order terms can be significant or even
dominant when N is small. That's why Quicksort implementations
usually resort to insertion sort for short sub-files: even though
insertion sort is O(N*N), its simplicity typically gives a much
smaller coefficient than that of Quicksort's O(N*ln(N)), so it
winds up being faster for small N.

My interest was not in sorting lists of four million nodes,
but in developing a "general-purpose" utility that works well
over a wide range of list lengths and doesn't behave too badly
if the caller-supplied comparison function is sluggish. The
exercise has given me a renewed appreciation of the difficulties
and compromises faced by the developer of "general-purpose"
routines (and has increased my already healthy skepticism about
the practicality of code re-use). It is likely that I'd have
come to different conclusions if I'd started with a different
notion of how a "general-purpose" list-sorter would be used.

In the tests I ran, one variation of natural merge performed
respectably. But three variations of straight merge (all based
on McCarthy) did just a little better. YMMV.
> On a reversed list, natural merge makes N-1 comparisons to form
N one-node sub-lists. Thereafter, it follows exactly the same pattern
of merges a straight merge would. In the worst case, then, natural
merge makes N-1 more comparisons than straight merge.

Yes, but that's neglicible for an O(N * log(N)) algorithm and improves
it in real cases with partial order.
I once tried to calculate the break-even point, the average
run length where natural merge's shallower tree repaid the up-front
investment of N-1 comparisons. IIRC (this was a while ago, and the
details of the calculation elude me at the moment), the average run
length needed to exceed 3.2 or so for natural merge to win over
straight merge. That may not seem like a lot, but it is extremely
rare in "random" input permutations. Only 15111 of the 362880
permutations of nine distinct keys have three or fewer runs and
hence an average run length of three or more; that's a little less
than 4.2%. For ten keys, only 1.3% of the permutations have three
or fewer runs, and the percentages keep diminishing. For even a
moderate N, very few permutations have fewer than N/3.2 runs.

On the other hand, it must be admitted that real input lists
are probably not "truly random." We're back to the difficult
question of developing general-purpose software for "typical"
cases, without any solid notion of what "typical" is. YM, as
I said before, MV.

--
Eric Sosman
es*****@acm-dot-org.invalid

Jan 28 '07 #48
Joerg Schoen wrote:
CBFalconer wrote:
>If you want to search a table of some form, a hashtable is the most
efficient, since O(1) beats O(logN) by a wide margin. However you
may want to dump the table in sorted order, which can be awkward.
The cure is to be able to form a list of the table content, sort
it, and dump from that. That is easily implemented by simply
including a single spare pointer in the table items. See the
demonstration programs for hashlib on my site. The hash table has
to have a means of walking its content. The overall process is
O(NlogN).

Thanks for pointing this out! The actual situation is that the
(existing) application program I have in mind already uses linked
lists frequently for typical data structures. I now want to provide
an "improved" implementation and also want to offer an easy way of
sorting it prior to output.

I'll take your suggestion like that: I'll have another spare pointer
in the linked list which can be used to generate a hash table from
the list. This becomes handy if the list is to be searched for
individual entries frequently.
Take a look at hashlib. All the mechanisms are in there for
forming a list from the table contents. Once you have formed a
list it will remain, ignoring new additions to the table (but
beware deletions if you destroy the deleted entries, which is up to
you).

The above assumes you are able to use GPLd software. If not, I can
always be contacted for another license.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>
Jan 29 '07 #49
Um, no, you miss the essential point. The uniqueness lies in the way in
which he accesses the data, not in the fact that he is using natural
mergesort. In fact he conditionalizes whether it is natural or not.
I saw that. But that was via the pre-processor.
Whatever way he compiles it, its behaviour does not change at run-time.
I will re-check mind you.
It isn't true that you can't do a natural mergesort for arrays.
Okay, I will go with that. I overstated the case. But lets go over that.

To do natural mergesort with arrays requires an auxilary array where item
records
- the start of a run
- how many elements in the run
as there may be no other place to record that information

And there are 2 algorithms with natural mergesort:
- there is the recursive version
- there is the iterative version

I dont see how to do a recursive version with natural mergesort for arrays.
Because consider the first run - it could be the only run or it could be one
of many.
How many function calls are made recursively depends on the number of runs
and that is not known in advance or as your are going along.

On the iterative version of natural mergesort - you can make 1 pass and
record the index and run length.
After that, you merge adjacent entries and keep doing so until you have 1
run.
Technically the auxilary array could be O(N) in size in everything is in
reverse order.
The overhead of having to maintain an auxiliary array is quite substantial
but probably less than standard merge sort.

Stephen Howe
Jan 29 '07 #50

This thread has been closed and replies have been disabled. Please start a new discussion.