
Hi folks!
Everyone knows how to sort arrays (e. g. quicksort, heapsort etc.)
For linked lists, mergesort is the typical choice.
While I was looking for a optimized implementation of mergesort for
linked lists, I couldn't find one. I read something about Mcilroy's
"Optimistic Merge Sort" and studied some implementation, but they
were for arrays. Does anybody know if Mcilroys optimization is applicable to
truly linked lists at all?
Eventually, I came up with my own implementation (see below) which seems to
perform quite well. The algorithm is my own idea  at least I don't know if
anybody else has come up with this before.
The algorithm is localized. It maintains a stack of sorting length and tries
to merge small pieces as soon as possible. This takes advantage of a CPU
caches.
Here are my questions about it:
* Is my implementation new/genuine or has somebody else come up
* *with this before?
* How does it compare with other optimized implementations (provided there
* *are such)? I would like to hear from better implementations!
* When optimizing quicksort, one usually tries to use insertion sort
* *and the like for SMALL parts. I tried to do the same for the mergesort
* *below, i. e. sort small pieces of length ~ 10 with insertion sort.
* *It actually became slower :( I don't quite understand why.
Thanks for any comments!
Joerg
PS: You can find this implementation at
* * * * http://www.dieSchoens.de/prg/testsort.tgz
 snip 
#include <stddef.h/* *for NULL **/
typedef struct linkedlist_t linkedlist_t;
struct linkedlist_t {
* linkedlist_t *Next;
* int Key;
};
#define NATURAL /* *recognize natural runs **/
linkedlist_t *mergesort(linkedlist_t *list)
{
* struct {
* * linkedlist_t *list;
* * unsigned int size;
#ifdef NATURAL
* * unsigned int level;
#endif
* } stack[sizeof(int) * 8];
* int nstack;
* for(nstack = 1 ; ; ) {
* * linkedlist_t *plist, *qlist, *curr;
* * unsigned int psize, qsize;
* * /* *Can we or do we have to merge lists from the stack? **/
* * if(nstack < 1  (
#ifdef NATURAL
* * * * * * * * * * * *stack[nstack  1].level != stack[nstack].level
#else
* * * * * * * * * * * *stack[nstack  1].size != stack[nstack].size
#endif
* * * * * * * * * * * *&& list != NULL)) {
* * * /* ** *Push element(s) on the stack *** */
* * * if(list == NULL)
* * * * /* *This is where we break the loop. We have nstack == 0 now **/
* * * * break;
* * * /* ** *Push lists of length 1 *** */
* * * curr = list;
* * * psize = 1;
* * * list = list>Next;
#ifdef NATURAL
* * * /* *Check for a natural run **/
* * * for(plist = curr ; list != NULL ; plist = list, list = list>Next,
++psize)
* * * * if(list>Key < plist>Key)
* * * * * break;
* * * /* *Properly terminate the run **/
* * * plist>Next = NULL;
#else
* * * /* *Properly terminate list **/
* * * curr>Next = NULL;
#endif
* * * /* ** *Push this run on the stack *** */
* * * ++nstack;
* * * stack[nstack].list = curr;
* * * stack[nstack].size = psize;
#ifdef NATURAL
* * * stack[nstack].level = 0;
#endif
* * * continue;
* * }
* * /* ** *Merge top two entries from stack *** */
* * plist = stack[nstack].list; psize = stack[nstack].size;
* * nstack;
* * qlist = stack[nstack].list; qsize = stack[nstack].size;
* * /* ** *Set up new stack element. This also simplifies the main loop
below ** */
* * if(qlist>Key < plist>Key) {
* * * curr = qlist;
* * * qlist = qlist>Next; qsize;
* * } else {
* * * curr = plist;
* * * plist = plist>Next; psize;
* * }
* * stack[nstack].list = curr;
* * /* *Number of elements is just the sum **/
* * stack[nstack].size += stack[nstack + 1].size;
#ifdef NATURAL
* * stack[nstack].level++;
#endif
* * /* ** *Here is the main merging loop *** */
* * while(psize 0 && qsize 0) {
* * * if(qlist>Key < plist>Key) {
* * * * curr>Next = qlist; curr = qlist;
* * * * qlist = qlist>Next; * qsize;
* * * } else {
* * * * curr>Next = plist; curr = plist;
* * * * plist = plist>Next;; *psize;
* * * }
* * }
* * /* *Add remainder to result list **/
* * if(psize 0) {
* * * curr>Next = plist;
* * } else {
* * * curr>Next = qlist;
* * }
* }
* /* *Here we treat the case of list == NULL **/
* return nstack == 0 ? stack[nstack].list : NULL;
}
 snap   
Share:

Joerg Schoen wrote:
>
Everyone knows how to sort arrays (e. g. quicksort, heapsort etc.)
For linked lists, mergesort is the typical choice.
While I was looking for a optimized implementation of mergesort for
linked lists, I couldn't find one. I read something about Mcilroy's
"Optimistic Merge Sort" and studied some implementation, but they
were for arrays. Does anybody know if Mcilroys optimization is
applicable to truly linked lists at all?
Eventually, I came up with my own implementation (see below) which
seems to perform quite well. The algorithm is my own idea  at least
I don't know if anybody else has come up with this before.
The algorithm is localized. It maintains a stack of sorting length
and tries to merge small pieces as soon as possible. This takes
advantage of a CPU caches.
.... snip code ...
Too long and complicated for me. There is an example in about 60
lines, including comments, in wdfreq.c, which in turn is an example
application in hashlib.zip. See:
<http://cbfalconer.home.att.net/download/>

<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
"A man who is right every time is not likely to do very much."
 Francis Crick, codiscover of DNA
"There is nothing more amazing than stupidity in action."
 Thomas Matthews   
"Joerg Schoen" <jo************@online.dewrote in message
news:ep**********@online.de...
>
 Is my implementation new/genuine or has somebody else come up
with this before?
It is unlikely that your implementation is unique. Somebody somewhere has
probably done it before. Searching and sorting are very well explored
topics, both theoretically and practically.
But even if your actual implementation of a small aspect of it is unique,
nobody will care, and your place in history is not assured.
Generally, people will only care if it solves a practical problem worth a
lot of money, or adds insight or a solution to a theoretical problem. If
you have a reimplementation of an O(n log n) sorting algorithm, nobody will
care.
Now, if you develop a general O(log n) sorting algorithm ... there will be
more listeners.
Merging using linked lists doesn't seem fundamentally different than merging
using arrays.
Related question: We discussed the "bubblesort" in class today. I have
this new sort where I find the last element instead of the firstI think
I'll call it the "rocksort". Will I get the Nobel prize or something?
Sorting has been well explored. Here is what you're up against: http://en.wikipedia.org/wiki/Category:Sort_algorithms

David T. Ashley (dt*@e3ft.com) http://www.e3ft.com (Consulting Home Page) http://www.dtashley.com (Personal Home Page) http://gpl.e3ft.com (GPL Publications and Projects)   
"David T. Ashley" wrote:
>
.... snip ...
>
Now, if you develop a general O(log n) sorting algorithm ... there
will be more listeners.
They exist. See Sedgewick. Need more hardware though.
>
Merging using linked lists doesn't seem fundamentally different
than merging using arrays.
You use it without disturbing/moving/copying the actual data, and
lists are often the most convenient input mechanism for unknown
volume. In addition mergesort has no worst case, and is stable.

<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
"A man who is right every time is not likely to do very much."
 Francis Crick, codiscover of DNA
"There is nothing more amazing than stupidity in action."
 Thomas Matthews   
David T. Ashley wrote:
But even if your actual implementation of a small aspect of it is unique,
nobody will care, and your place in history is not assured.
Wasn't looking for a "place in history", just for comments on my
implementation.
I disagree that nobody cares. I actually searched around to find useful
practical implementations of mergesorts for linked lists. What I found
were
only textbooks type of implemtations  nice but slow. That's why I
posted
this here to ask for comments and maybe find a better implementation.
Generally, people will only care if it solves a practical problem worth a
lot of money, or adds insight or a solution to a theoretical problem. If
you have a reimplementation of an O(n log n) sorting algorithm, nobody will
care.
Theoretically, the whole "mergesort" stuff is well known and boring.
But still the
question remains how to do in practical terms in an efficient way.
You contradict yourself. One time you state that only theoretical
insights might
interest people or if practical problems are solved. But here I attempt
to solve a
practical problem, namely to have a realworld/efficient implementation
of linked
list mergesort, and you state that "no one cares".
Merging using linked lists doesn't seem fundamentally different than merging
using arrays.
Except for the fact that the implementations are different. One of my
original
questions was if the "optimized merge sort" which is an efficient
implementation
of mergesort also works for linked lists. The only implementation I
found were for
arrays.
Related question: We discussed the "bubblesort" in class today. I have
this new sort where I find the last element instead of the firstI think
I'll call it the "rocksort". Will I get the Nobel prize or something?
It seems you are not having a good day and try to be rude all the time.
Please
comment on the topic, i. e. my code and stop this useless comments.   
CBFalconer wrote:
Too long and complicated for me. There is an example in about 60
lines, including comments, in wdfreq.c, which in turn is an example
application in hashlib.zip. See:
<http://cbfalconer.home.att.net/download/>
Thanks for the comment! Your implementation is basically the one from
Sedgewick's book. It has the disadvantage that you have to run through
the linked lists repeatedly to split it, but it is also localized, i.
e. takes
advantage of caches.   
Joerg Schoen wrote:
 When optimizing quicksort, one usually tries to use insertion sort
and the like for SMALL parts. I tried to do the same for the mergesort
below, i. e. sort small pieces of length ~ 10 with insertion sort.
It actually became slower :( I don't quite understand why.
Using wikipedia's expected numberofcomparisons for randomized quick
sort (1.39 n lg2 n), insertion sort (n^2 / 4), and merge sort (n lg2 n
+ 1  1.26 n) says that for ten elements, quicksort would do 46
comparisons, insertion sort 25 comparisons, and merge sort 22
comparisons.
Also, for a large array, the insertion sort "helper" for quicksort can
be done as a single pass over the entire array at the end (each element
is already close to its final position) meaning the insertion sort
overhead is much less than the quicksort overhead for a bunch of very
small partitions.
For mergesort on a list, your insertionsort helper has overhead almost
as large as the mergesort overhead, since you have to do a bunch of
"little" insertion sorts before you can start merging.   
<jo**@online.dewrote in message
news:11**********************@j27g2000cwj.googlegr oups.com...
David T. Ashley wrote:
It seems you are not having a good day and try to be rude all the time.
Please
comment on the topic, i. e. my code and stop this useless comments.
One has to differentiate different types of contributions to the world:
a)Practical "make life" easier contributions: these are contributions that
are convenient, but that could be replicated by many others. A mergesort
implementation that uses linked lists instead of arrays is in this category.
b)Fundamental contributions: these have a lower probability of being
replicated by others. The original development of quicksort and to a lesser
degree mergesort is in this category.
Your contribution is in category (a) rather than (b). It has value, just
not a lot of value. The key issue is whether anyone seeking the solution
would spend more time searching for and porting your solution rather than
implementing it themselves from scratch. It is tossup.
You might investigate www.snippets.org.
My intention was not to be rude.

David T. Ashley (dt*@e3ft.com) http://www.e3ft.com (Consulting Home Page) http://www.dtashley.com (Personal Home Page) http://gpl.e3ft.com (GPL Publications and Projects)   
David T. Ashley schrieb:
One has to differentiate different types of contributions to the world:
a)Practical "make life" easier contributions: these are contributions that
are convenient, but that could be replicated by many others. A mergesort
implementation that uses linked lists instead of arrays is in this category.
b)Fundamental contributions: these have a lower probability of being
replicated by others. The original development of quicksort and to a lesser
degree mergesort is in this category.
Your contribution is in category (a) rather than (b). It has value, just
not a lot of value. The key issue is whether anyone seeking the solution
would spend more time searching for and porting your solution rather than
implementing it themselves from scratch. It is tossup.
Of course I never thought to be in category (b). Being in (a), the
question for
me was: How to do mergesort on linked list efficiently? The sample code
in
Sedgewick and in other sources didn't look too "efficient" to me, so I
was
looking for a better one and eventually came up with my piece of code.
I don't think that my solution requires porting at all. Besides that,
it is more
efficient than the "standard" solution. Of course this only means
"faster by some
percentage"  it is still O(n * log(n)). I actually want to use it in a
real world
application where I am looking for a reasonable implementation of
mergesort.
You might investigate www.snippets.org.
My intention was not to be rude.
Sorry for misunderstanding you. I am not a native speaker  maybe this
is the
problem.    jo**@online.de wrote:
CBFalconer wrote:
>Too long and complicated for me. There is an example in about 60 lines, including comments, in wdfreq.c, which in turn is an example application in hashlib.zip. See:
<http://cbfalconer.home.att.net/download/>
Thanks for the comment! Your implementation is basically the one from
Sedgewick's book. It has the disadvantage that you have to run through
the linked lists repeatedly to split it, but it is also localized, i.e.
takes advantage of caches.
Well, I read Sedgewick long ago :). However I fail to see how you
can do a merge without first splitting.

<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
"A man who is right every time is not likely to do very much."
 Francis Crick, codiscover of DNA
"There is nothing more amazing than stupidity in action."
 Thomas Matthews   
CBFalconer <cb********@yahoo.comwrites:
Too long and complicated for me. There is an example in about 60
lines, including comments, in wdfreq.c, which in turn is an example
application in hashlib.zip. See:
<http://cbfalconer.home.att.net/download/>
As long as we're posting links to linked list merge sorts, my
implementation is at http://cvs.savannah.gnu.org/viewcvs/...pp&view=markup
in function ll_sort().
It's optimized for readability.
CBFalconer wrote in a later message:
Well, I read Sedgewick long ago :). However I fail to see how you
can do a merge without first splitting.
My implementation doesn't do any splitting either. At least, I
don't think it does; you can tell me if it contains something
that you'd interpret to be splitting.

"Welcome to the wonderful world of undefined behavior, where the demons
are nasal and the DeathStation users are nervous." Daniel Fox   
On Mon, 22 Jan 2007 21:26:44 +0100, Joerg Schoen
<jo************@online.dewrote:
>Hi folks!
Everyone knows how to sort arrays (e. g. quicksort, heapsort etc.) For linked lists, mergesort is the typical choice.
While I was looking for a optimized implementation of mergesort for linked lists, I couldn't find one. I read something about Mcilroy's "Optimistic Merge Sort" and studied some implementation, but they were for arrays. Does anybody know if Mcilroys optimization is applicable to truly linked lists at all?
Eventually, I came up with my own implementation (see below) which seems to perform quite well. The algorithm is my own idea  at least I don't know if anybody else has come up with this before.
The algorithm is localized. It maintains a stack of sorting length and tries to merge small pieces as soon as possible. This takes advantage of a CPU caches.
Here are my questions about it:
* Is my implementation new/genuine or has somebody else come up * *with this before?
* How does it compare with other optimized implementations (provided there * *are such)? I would like to hear from better implementations!
* When optimizing quicksort, one usually tries to use insertion sort * *and the like for SMALL parts. I tried to do the same for the mergesort * *below, i. e. sort small pieces of length ~ 10 with insertion sort.
* *It actually became slower :( I don't quite understand why.
Thanks for any comments!
Joerg
[snip code]
Here are some suggestions: First of all, the proper newsgroup probably
should be comp.programming. Secondly, I would much prefer a more
extensive description of the actual algorithm. Your description is
quite vague; seemingly, the only way to understand what you are actually
doing is to read the code. Thirdly, your code is hard to read (in my
opinion, of course.) As a general rule code is easier to read if things
are cleanly indented and well lined up.
Here are some of the things that I find make it difficult to read.
First of all, in code for presentation it reads better if you get rid of
the ifdefs and replace them by normal tests on flags. As it is your
preprocessor tests break into the flow of the code.
Secondly I don't much like interspersing single line comments with code.
I suggest prefixing blocks of code with comment blocks that describe the
intent of the entire block. If do want to comment upon a single line I
prefer to have a sidebar comment. Stream of consciousness commenting is
helpful to the author but not to the reader.
Thirdly, my preference is that each variable be declared on a separate
line with a sidebar comment describing the variable. When someone else
is reading the code they don't have to puzzle out what each name means.
As a strictly C note, experience says do this
* * * if(list == NULL) break; /* Single statement if */
or this
* * * if(list == NULL) { /* If controlling block */
* * * * break;
}
but not this
* * * if(list == NULL) /* This is a maintenance trap */
* * * * break;
As a note on mergesorting linked list data, you don't need to scan the
list to get its length. Instead sort the first two elements. You now
have a sorted list of length 2. Sort the next two elements to get a
second sorted list; merge it with the first. You now have a sorted list
of length 4. Sort the next four elements and merge to get a list of
length 8. Repeat as needed. Season to taste.
I'm not sure how this compares with your algorithm because I haven't sat
down to figure out exactly what you're doing; however your algorithm
looks interesting.    cr*@tiac.net (Richard Harter) writes:
As a note on mergesorting linked list data, you don't need to scan the
list to get its length. Instead sort the first two elements. You now
have a sorted list of length 2. Sort the next two elements to get a
second sorted list; merge it with the first. You now have a sorted list
of length 4. Sort the next four elements and merge to get a list of
length 8. Repeat as needed. Season to taste.
Sure, that'll work, but it's somewhat brute force. I think that
a "natural" merge sort makes more sense: take the first two runs
of ascending values and merge them to produce the first output
run, and repeat that process until you've consumed the input.
Then start over from the top, until there's only a single output
run.

"I hope, some day, to learn to read.
It seems to be even harder than writing."
Richard Heathfield   
On Tue, 23 Jan 2007 12:56:02 0800, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
>cr*@tiac.net (Richard Harter) writes:
>As a note on mergesorting linked list data, you don't need to scan the list to get its length. Instead sort the first two elements. You now have a sorted list of length 2. Sort the next two elements to get a second sorted list; merge it with the first. You now have a sorted list of length 4. Sort the next four elements and merge to get a list of length 8. Repeat as needed. Season to taste.
Sure, that'll work, but it's somewhat brute force. I think that a "natural" merge sort makes more sense: take the first two runs of ascending values and merge them to produce the first output run, and repeat that process until you've consumed the input. Then start over from the top, until there's only a single output run.
A natural merge sort might make more sense but it's not an improvement,
or at least it isn't unless you have something clever up your sleeve.
See the discussion in Knuth on merge sorts. Natural gains when runs are
long and few and loses when runs are short and many. Offhand your
version is going to much more expensive (though still O(n log n))
because you have to rescan the runs.   
Ben Pfaff wrote: cr*@tiac.net (Richard Harter) writes:
>As a note on mergesorting linked list data, you don't need to scan the list to get its length. Instead sort the first two elements. You now have a sorted list of length 2. Sort the next two elements to get a second sorted list; merge it with the first. You now have a sorted list of length 4. Sort the next four elements and merge to get a list of length 8. Repeat as needed. Season to taste.
That's basically what my code is doing, but it also takes advantage of
natural runs in the input! Richard Harter is right in noticing that I
should have described the algorithm more detailed. Sorry! I'll also improve
my indentation next time. Here now is a detailed explanation:
I break down the linked lists into natural runs (only if preprocessor
symbol "NATURAL" is defined). After collecting the first two runs, I merge
them into a bigger run. Then I collect the next two runs, merge them into a
bigger one. On the stack, I now have two bigger runs which I merge again.
Here is how the stack develops (separated letters denote different runs,
adjacent letters means merged runs, the comment at the end of the line
describes what happens before the next line):
# collect run
a # collect run
a b # merge
ab # collect run
ab c # collect run
ab c d # merge
ab cd # merge
abcd # collect run
abcd e # collect run
abcd e f # merge
abcd ef # collect run
abcd ef g # collect run
abcd ef g h # merge
abcd ef gh # merge
abcd efgh # merge
abcdefgh
....
The stack contains the start of the each run which is a NULL terminated
linked list and also the length of it. So there is enough information on
the stack to know the start and length of each run. Compared to other
implementations of mergesort the my implementation doesn't run through the
list a given number of steps just to find the start of the next run.
What I like about the implementation is that there is not much need for
treating special cases like ending lists when merging etc. That sounds
easy, but try to do it yourself and you see that it's a bit tricky!
Sure, that'll work, but it's somewhat brute force. I think that
a "natural" merge sort makes more sense: take the first two runs
of ascending values and merge them to produce the first output
run, and repeat that process until you've consumed the input.
Then start over from the top, until there's only a single output
run.
That is exactly the governing idea in my own implementation: Improve
mergesort by recognizing natural runs in the input.
There are two not so obvious problems with Ben's description:
One thing is that you need to call your comparison function very frequently
just to recognize runs you could already know about otherwise. Assume you
have just merged two runs of length 1000 and 2000 to a run of length 3000.
When you go over the list from the top, you need another 2999 comparisons
just to recognize the run of size 3000.
Note that this is easier in the simpler implementation cited by Richard: The
length of the runs is known to be exactly 2**n in the nth iteration of the
outer loop.
The second problem in Ben's approach is nonlocality: You have to run
repeatedly through the WHOLE input and each time increase the size of the
runs. This will render your CPU cache rather useless. In the test program
on my webpage ( http://www.dieschoens.de/prg/testsort.tgz), I compare the
localized version with a nonlocalized one. The runtime of the localized
one is better by 35%! That's seems relevant to me for real world
application!
Of course, as said before, the algorithm is O(n * log(n)), we are just
talking about minimizing the constant factor in front of this.   
Ben Pfaff wrote:
CBFalconer <cb********@yahoo.comwrites:
.... snip ...
>http://cvs.savannah.gnu.org/viewcvs/...pp&view=markup
in function ll_sort().
>
.... snip ...
>
>Well, I read Sedgewick long ago :). However I fail to see how you can do a merge without first splitting.
My implementation doesn't do any splitting either. At least, I
don't think it does; you can tell me if it contains something
that you'd interpret to be splitting.
To merge something you need two things to merge. You start with
one list. I can imagine a technique where you use the end of a run
to form the end of a split, and then try to merge with the
remainder. I don't think such an algorithm will be nlogn,
especially in the worst case, and preserving stability will require
care. List splitting has a very taut inner loop.
I haven't looked at your implementation so far.

<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
"A man who is right every time is not likely to do very much."
 Francis Crick, codiscover of DNA
"There is nothing more amazing than stupidity in action."
 Thomas Matthews    cr*@tiac.net (Richard Harter) writes:
On Tue, 23 Jan 2007 12:56:02 0800, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
>>Sure, that'll work, but it's somewhat brute force. I think that a "natural" merge sort makes more sense: take the first two runs of ascending values and merge them to produce the first output run, and repeat that process until you've consumed the input. Then start over from the top, until there's only a single output run.
A natural merge sort might make more sense but it's not an improvement,
or at least it isn't unless you have something clever up your sleeve.
See the discussion in Knuth on merge sorts. Natural gains when runs are
long and few and loses when runs are short and many. Offhand your
version is going to much more expensive (though still O(n log n))
because you have to rescan the runs.
I'll keep that in mind as I have the need for optimization in
future, thanks for pointing it out.

"It would be a much better example of undefined behavior
if the behavior were undefined."
Michael Rubenstein   
On Tue, 23 Jan 2007 23:08:27 +0100, Joerg Schoen
<jo************@online.dewrote:
>Ben Pfaff wrote:
>cr*@tiac.net (Richard Harter) writes:
>>As a note on mergesorting linked list data, you don't need to scan the list to get its length. Instead sort the first two elements. You now have a sorted list of length 2. Sort the next two elements to get a second sorted list; merge it with the first. You now have a sorted list of length 4. Sort the next four elements and merge to get a list of length 8. Repeat as needed. Season to taste.
That's basically what my code is doing, but it also takes advantage of natural runs in the input! Richard Harter is right in noticing that I should have described the algorithm more detailed. Sorry! I'll also improve my indentation next time. Here now is a detailed explanation:
I break down the linked lists into natural runs (only if preprocessor symbol "NATURAL" is defined). After collecting the first two runs, I merge them into a bigger run. Then I collect the next two runs, merge them into a bigger one. On the stack, I now have two bigger runs which I merge again.
Here is how the stack develops (separated letters denote different runs, adjacent letters means merged runs, the comment at the end of the line describes what happens before the next line):
# collect run a # collect run a b # merge ab # collect run ab c # collect run ab c d # merge ab cd # merge abcd # collect run abcd e # collect run abcd e f # merge abcd ef # collect run abcd ef g # collect run abcd ef g h # merge abcd ef gh # merge abcd efgh # merge abcdefgh ...
The stack contains the start of the each run which is a NULL terminated linked list and also the length of it. So there is enough information on the stack to know the start and length of each run. Compared to other implementations of mergesort the my implementation doesn't run through the list a given number of steps just to find the start of the next run.
What I like about the implementation is that there is not much need for treating special cases like ending lists when merging etc. That sounds easy, but try to do it yourself and you see that it's a bit tricky!
>Sure, that'll work, but it's somewhat brute force. I think that a "natural" merge sort makes more sense: take the first two runs of ascending values and merge them to produce the first output run, and repeat that process until you've consumed the input. Then start over from the top, until there's only a single output run.
That is exactly the governing idea in my own implementation: Improve mergesort by recognizing natural runs in the input.
There are two not so obvious problems with Ben's description:
One thing is that you need to call your comparison function very frequently just to recognize runs you could already know about otherwise. Assume you have just merged two runs of length 1000 and 2000 to a run of length 3000. When you go over the list from the top, you need another 2999 comparisons just to recognize the run of size 3000.
Note that this is easier in the simpler implementation cited by Richard: The length of the runs is known to be exactly 2**n in the nth iteration of the outer loop.
The second problem in Ben's approach is nonlocality: You have to run repeatedly through the WHOLE input and each time increase the size of the runs. This will render your CPU cache rather useless. In the test program on my webpage (http://www.dieschoens.de/prg/testsort.tgz), I compare the localized version with a nonlocalized one. The runtime of the localized one is better by 35%! That's seems relevant to me for real world application!
Of course, as said before, the algorithm is O(n * log(n)), we are just talking about minimizing the constant factor in front of this.
Well done. Actually the algorithm I described is is closer to your
algorithm than to Ben's. IIANM Ben's is actually O(n^2) worse case if I
am not missing a fine point. An implementation of my algorithm has a
little gotcha  you have to be careful about recognizing the end of
data. I shall have to think about it a bit, but I think both can be
written nicely as recursive routines.    cr*@tiac.net (Richard Harter) writes:
Well done. Actually the algorithm I described is is closer to your
algorithm than to Ben's. IIANM Ben's is actually O(n^2) worse case if I
am not missing a fine point.
I don't think my code has an O(n^2) worst case, even if it is
slower than an "unnatural" merge sort. What do you think could
trigger an O(n^2) case in my code?

"Programmers have the right to be ignorant of many details of your code
and still make reasonable changes."
Kernighan and Plauger, _Software Tools_   
Joerg Schoen wrote:
>
.... snip ...
>
Of course, as said before, the algorithm is O(n * log(n)), we are
just talking about minimizing the constant factor in front of this.
I think that you will find that an initially reverse sorted list
will eat up stack and time. You are doing a lot of copying, which
is totally unnecessary. Remember, records can be big.

"The mere formulation of a problem is far more often essential
than its solution, which may be merely a matter of mathematical
or experimental skill. To raise new questions, new possibilities,
to regard old problems from a new angle requires creative
imagination and and marks real advances in science."
 Albert Einstein   
On Tue, 23 Jan 2007 14:54:09 0800, Ben Pfaff <bl*@cs.stanford.edu>
wrote:
>cr*@tiac.net (Richard Harter) writes:
>Well done. Actually the algorithm I described is is closer to your algorithm than to Ben's. IIANM Ben's is actually O(n^2) worse case if I am not missing a fine point.
I don't think my code has an O(n^2) worst case, even if it is slower than an "unnatural" merge sort. What do you think could trigger an O(n^2) case in my code?
I think I take it back; there's an ambiguity in the wording of your
description. If you eat the runs one at a time the worst case is O(n^2)
with a reversed list. Upon reflection I suspect that what you meant was
to each a pair of runs, then another pair of runs, etc. That would
indeed be O(n log n). You didn't specify what you were doing with your
output runs. If you simply plop them back in place, that's expensive.
However you can use a stack (that's what Joerg Schoen is doing), or with
a bit of cheap cleverness you can do a clean recursive routine.   
On Tue, 23 Jan 2007 18:12:00 0500, CBFalconer <cb********@yahoo.com>
wrote:
>Joerg Schoen wrote:
>>
... snip ...
>> Of course, as said before, the algorithm is O(n * log(n)), we are just talking about minimizing the constant factor in front of this.
I think that you will find that an initially reverse sorted list will eat up stack and time. You are doing a lot of copying, which is totally unnecessary. Remember, records can be big.
Perhaps you've forgotten that he is talking about sorting linked lists.
There is no copying involved here, just changing links. Moreover the
maximum stack depth is O(log n). It is true that a initially reverse
sorted list is worst case for an ascending runs mergesort; however the
penalty lies in excessive comparisons.   
On 24 Jan., 00:12, CBFalconer <cbfalco...@yahoo.comwrote:
will eat up stack and time. You are doing a lot of copying, which
is totally unnecessary. Remember, records can be big.
As Richard pointed out, I am not doing copying at all, just changing
links. In the test program on my page, I am comparing the mergesort
with other implementations. I use (1) random numbers (2) random numbers
with each value appearing multiple times (3) already sorted list (4)
inverse
sorted list.
It turns out that the implementation behaves well in all cases: It
excells in case (3)
since it recognizes the whole input as one big run and is done. Case
(4) is
"worst case" for the one which tries to recognize natural runs. It is
by ~ 1%
or so slower than the one that collects runs of length 1 and then
doubles on each
iteration. The reason is that the recognition of natural runs only is
done ONCE
over the input, so the "added penalty" is O(n) and the factor is only
the cost of
a comparison.
When time permits (during the weekend most likely), I will put Ben's
implementation
and a "standard text book" one for comparison.   
On 24 Jan., 01:17, c...@tiac.net (Richard Harter) wrote:
On Tue, 23 Jan 2007 14:54:09 0800, Ben Pfaff <b...@cs.stanford.edu>
wrote:
However you can use a stack (that's what Joerg Schoen is doing), or with
a bit of cheap cleverness you can do a clean recursive routine.
Triggered by this discussion I realized that I can still improve the
code and
get rid of keeping the lengths of the runs on the stack. So well, here
is my
"improved version". I have also increased the indentation, so it should
be
better readable. Moreover, the preprocessor stuff only appears once
now.
I still would like to know how to do it with a recursive routine,
currently I have
no idea about that.
 snip 
#include <stddef.h/* for NULL */
typedef struct linkedlist_t linkedlist_t;
struct linkedlist_t {
linkedlist_t *Next;
int Key;
};
#define NATURAL /* recognize natural runs */
linkedlist_t *mergesort(linkedlist_t *list)
{
struct {
linkedlist_t *list;
unsigned int level;
} stack[sizeof(int) * 8];
int nstack;
for(nstack = 1 ; ; ) {
linkedlist_t *plist, *qlist, *curr;
/* Can we or do we have to merge lists from the stack? */
if(nstack < 1  (stack[nstack  1].level !=
stack[nstack].level && list != NULL)) {
/* ** Push element(s) on the stack ** */
if(list == NULL)
/* This is where we break the loop. We have nstack ==
0 now */
break;
/* ** Push lists of length 1 ** */
curr = list;
list = list>Next;
#ifdef NATURAL
/* Check for a natural run */
for(plist = curr ; list != NULL ; plist = list, list =
list>Next)
if(list>Key < plist>Key)
break;
/* Properly terminate the run */
plist>Next = NULL;
#else
/* Properly terminate list */
curr>Next = NULL;
#endif
/* ** Push this run on the stack ** */
++nstack;
stack[nstack].list = curr;
stack[nstack].level = 0;
continue;
}
/* ** Merge top two entries from stack ** */
plist = stack[nstack].list;
nstack;
qlist = stack[nstack].list;
/* ** Set up new stack element. This also simplifies the main
loop below ** */
if(qlist>Key < plist>Key) {
curr = qlist; qlist = qlist>Next;
} else {
curr = plist; plist = plist>Next;
}
stack[nstack].list = curr;
stack[nstack].level++;
/* ** Here is the main merging loop ** */
while(plist != NULL && qlist != NULL) {
if(qlist>Key < plist>Key) {
curr>Next = qlist; curr = qlist; qlist = qlist>Next;
} else {
curr>Next = plist; curr = plist; plist = plist>Next;
}
}
/* Add remainder to result list */
curr>Next = plist != NULL ? plist : qlist;
}
/* Here we treat the case of list == NULL */
return nstack == 0 ? stack[nstack].list : NULL;
}
 snap    
Richard Harter wrote:
CBFalconer <cb********@yahoo.comwrote:
>Joerg Schoen wrote:
... snip ...
>>> Of course, as said before, the algorithm is O(n * log(n)), we are just talking about minimizing the constant factor in front of this.
I think that you will find that an initially reverse sorted list will eat up stack and time. You are doing a lot of copying, which is totally unnecessary. Remember, records can be big.
Perhaps you've forgotten that he is talking about sorting linked lists.
There is no copying involved here, just changing links. Moreover the
maximum stack depth is O(log n). It is true that a initially reverse
sorted list is worst case for an ascending runs mergesort; however the
penalty lies in excessive comparisons.
He is talking about building stacks, which involves copying
something somewhere. A normal mergesort is basically iterative,
and only diddles pointers. The stack depth is at most log N. At
least as I recall his description.

<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
"A man who is right every time is not likely to do very much."
 Francis Crick, codiscover of DNA
"There is nothing more amazing than stupidity in action."
 Thomas Matthews   
On 24 Jan., 09:51, CBFalconer <cbfalco...@yahoo.comwrote:
He is talking about building stacks, which involves copying
something somewhere.
Well, if you call setting variables "copying", I agree that I do copy
data.
But then every program does. The stack in my implementation only
contains a pointer to a list and a stack level, nothing more. No actual
"data" is copied.
A normal mergesort is basically iterative,
and only diddles pointers. The stack depth is at most log N. At
least as I recall his description.
There are recursive implementations and iterative ones of mergesort.
As detailed above, the pure iterative ones are repeatedly running
through
the whole list and have the disadvantage of not being localized thus
not taking advantage of a memory cache. Moreover, there is a lot of
"idle running along the list" in it.
The recursive version of mergesort is localized, but again needs some
idle running.   
Ben Pfaff wrote:
>
CBFalconer <cb********@yahoo.comwrites:
Too long and complicated for me. There is an example in about 60
lines, including comments, in wdfreq.c, which in turn is an example
application in hashlib.zip. See:
<http://cbfalconer.home.att.net/download/>
As long as we're posting links to linked list merge sorts, my
implementation is at http://cvs.savannah.gnu.org/viewcvs/...pp&view=markup
in function ll_sort().
It's optimized for readability.
This is what I use:
/* BEGIN list_type.h */
#ifndef H_LIST_TYPE_H
#define H_LIST_TYPE_H
struct list_node {
struct list_node *next;
void *data;
};
typedef struct list_node list_type;
list_type *list_sort(list_type *head,
int (*compar)(const list_type *, const list_type *));
#endif
/* END list_type.h */
/* BEGIN list_type.c */
#include <stddef.h>
#include "list_type.h"
static int list_sorted(list_type *list,
int (*compar)(const list_type *, const list_type *));
static long unsigned node_count(list_type *head);
static list_type *node_sort (list_type *head, long unsigned count,
int (*compar)(const list_type *, const list_type *));
static list_type *list_split(list_type *head, long unsigned count);
static list_type *list_merge(list_type *head, list_type *tail,
int (*compar)(const list_type *, const list_type *));
list_type *list_sort(list_type *head,
int (*compar)(const list_type *, const list_type *))
{
return
!list_sorted(head, compar)
? node_sort(head, node_count(head), compar)
: head;
}
static int list_sorted(list_type *list,
int (*compar)(const list_type *, const list_type *))
{
if (list != NULL) {
while (list next != NULL) {
if (compar(list, list next) 0) {
break;
}
list = list next;
}
}
return list == NULL  list next == NULL;
}
static long unsigned node_count(list_type *head)
{
long unsigned count;
for (count = 0; head != NULL; head = head next) {
++count;
}
return count;
}
static list_type *node_sort(list_type *head, long unsigned count,
int (*compar)(const list_type *, const list_type *))
{
long unsigned half;
list_type *tail;
if (count 1) {
half = count / 2;
tail = list_split(head, half);
tail = node_sort(tail, count  half, compar);
head = node_sort(head, half, compar);
head = list_merge(head, tail, compar);
}
return head;
}
static list_type *list_split(list_type *head, long unsigned count)
{
list_type *tail;
while (count != 0) {
head = head next;
}
tail = head next;
head next = NULL;
return tail;
}
static list_type *list_merge(list_type *head, list_type *tail,
int (*compar)(const list_type *, const list_type *))
{
list_type *list, *sorted, **node;
node = compar(head, tail) 0 ? &tail : &head;
sorted = list = *node;
*node = sorted next;
while (*node != NULL) {
node = compar(head, tail) 0 ? &tail : &head;
sorted next = *node;
sorted = *node;
*node = sorted next;
}
sorted next = head != NULL ? head : tail;
return list;
}
/* END list_type.c */

pete   
Joerg Schoen wrote:
>
Hi folks!
Everyone knows how to sort arrays (e. g. quicksort, heapsort etc.)
For linked lists, mergesort is the typical choice.
While I was looking for a optimized implementation of mergesort for
linked lists, I couldn't find one.
 snip 
#include <stddef.h/* for NULL */
typedef struct linkedlist_t linkedlist_t;
struct linkedlist_t {
linkedlist_t *Next;
int Key;
};
#define NATURAL /* recognize natural runs */
linkedlist_t *mergesort(linkedlist_t *list)
The only lists that I have ever been paid to sort,
were lists of text that needed to be sorted into alphabetical order.

pete    jo**@online.de wrote:
>
.... snip ...
>
Triggered by this discussion I realized that I can still improve
the code and get rid of keeping the lengths of the runs on the
stack. So well, here is my "improved version". I have also
increased the indentation, so it should be better readable.
Moreover, the preprocessor stuff only appears once now.
I still would like to know how to do it with a recursive routine,
currently I have no idea about that.
I suggest you pack this up with your earlier algorithm
documentation and post it somewhere in a zip or tgz file.
Alternatively (and poorer) post them together here in one article.

<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
"A man who is right every time is not likely to do very much."
 Francis Crick, codiscover of DNA
"There is nothing more amazing than stupidity in action."
 Thomas Matthews   
Richard Harter wrote:
See the discussion in Knuth on merge sorts.
Natural gains when runs are
long and few and loses when runs are short and many. Offhand your
version is going to much more expensive (though still O(n log n))
because you have to rescan the runs.
When I've done timed tests and comparison counts
for mergesorting lists,
I've found that for random order lists,
that checking for ordered sublists, is bad.
In my experience, when I have had to sort lists,
the situations have been that
the lists had a high probability of already being sorted,
so, I check them once to see if they're sorted and
then if they're not, I don't look for sorted subsections any more.
On the other hand, when quicksorting arrays of integers,
in timed tests I have found
that checking for ordered subarrays, pays off.

pete   
comp.programming added
On 24 Jan 2007 00:44:09 0800, jo**@online.de wrote:
>On 24 Jan., 01:17, c...@tiac.net (Richard Harter) wrote:
>On Tue, 23 Jan 2007 14:54:09 0800, Ben Pfaff <b...@cs.stanford.edu> wrote: However you can use a stack (that's what Joerg Schoen is doing), or with a bit of cheap cleverness you can do a clean recursive routine.
Triggered by this discussion I realized that I can still improve the code and get rid of keeping the lengths of the runs on the stack. So well, here is my "improved version". I have also increased the indentation, so it should be better readable. Moreover, the preprocessor stuff only appears once now.
I still would like to know how to do it with a recursive routine, currently I have no idea about that.
As a note I don't see this algorithm in Knuth. IMHO it is quite elegant
and has applications beyond mergesorting. I doubt that it is new; it is
too simple. However I've never seen it in an algorithms text. I
propose to call the general idea Schoen's algorithm.
Here is the outline of the iterative/recursive version in pseudocode.
There are two routines, alpha and beta. Alpha is the iterative outer
loop that is executed log n times. Beta is the recursive part; it
implements the mergesort algorithm as outlined by each of us.
/* each call to beta consumes exponentially growing pieces of L,
sorts them. and adds them to the output.
*/
function alpha (list L) returns list
level = 0
out = nil
while (L)
append beta(level,L) to out
level ++
end while
return out
/* At level k 0 beta makes two calls to beta at k1, merges the
result,
and returns the merged list. At level 0 it extracts two runs from L,
merges them, and returns the merged list. At any point if there is a
nil return from the first call/extract the second call/extract is not
made.
*/
function beta (int level, list L) returns list
if (not L) return nil
if (level <= 0)
run_1 = extract(L)
if (not L) return run_1
run_2 = extract(L)
else
run_1 = beta(level1, L)
run_2 = beta(level1, L)
return merge(run_1,run_2)
The extract function could either extract an ascending run or a
single item from L; the extracted items are removed from L. The
merge function merges two sorted lists and returns the merged list.
One important point in this pseudo code is that L is global to the
various functions. How to avoid this is, AFICT, language dependent.
In procedural languages it is probably best to manually simulate
recursion as (in effect) Joerg Schoen did. I'm not sure exactly how to
implement it in a functional language, but it should be easy.   
Eventually, I came up with my own implementation (see below) which seems
to
perform quite well. The algorithm is my own idea  at least I don't know
if
anybody else has come up with this before.
It is known. This is natural mergesort. Robert Sedgewick talked about it in
one of his books.
You can't do a natural mergesort for arrays but it is dead simple for linked
lists.
You exploit any preexisting order anywhere. I would be exceptionally
exceptionally surprised if it is not discussed in Kuth's 3rd volume
somewhere.
Best of all, the sort will be O(N) if it is already sorted and it does well
on data that has partial orders.
The worst case for natural mergesort is if the data is in reverse order to
start with. There are no "runs" then.
I would be tempted to do a hybrid where all runs were at least 4 elements no
matter what.
This would be to guard against the reverse order distribution to start with.
Stephen Howe   
On Thu, 25 Jan 2007 19:38:12 0000, "Stephen Howe"
<sjhoweATdialDOTpipexDOTcomwrote:
>Eventually, I came up with my own implementation (see below) which seems
to
>perform quite well. The algorithm is my own idea  at least I don't know
if
>anybody else has come up with this before.
It is known. This is natural mergesort. Robert Sedgewick talked about it in one of his books. You can't do a natural mergesort for arrays but it is dead simple for linked lists. You exploit any preexisting order anywhere. I would be exceptionally exceptionally surprised if it is not discussed in Kuth's 3rd volume somewhere.
Um, no, you miss the essential point. The uniqueness lies in the way in
which he accesses the data, not in the fact that he is using natural
mergesort. In fact he conditionalizes whether it is natural or not. It
isn't true that you can't do a natural mergesort for arrays.
> Best of all, the sort will be O(N) if it is already sorted and it does well on data that has partial orders. The worst case for natural mergesort is if the data is in reverse order to start with. There are no "runs" then. I would be tempted to do a hybrid where all runs were at least 4 elements no matter what. This would be to guard against the reverse order distribution to start with.
Stephen Howe
  
CBFalconer wrote:
jo**@online.de wrote:
>> Thanks for the comment! Your implementation is basically the one from Sedgewick's book. It has the disadvantage that you have to run through the linked lists repeatedly to split it, but it is also localized, i.e. takes advantage of caches.
Well, I read Sedgewick long ago :). However I fail to see how you
can do a merge without first splitting.
Split and merge as you go. See TAOCP (vol III) exercise
5.2.417, where the idea is credited to John McCarthy. The
basic McCarthy method has peaks of inefficiency when the list
length is one greater than a power of two (more generally, when
the binary representation of the node count has 1bits that are
separated by runs of 0bits), but I've found[*] a simple way to
fix them without losing much speed in the nicer cases. Code
available on request.
[*] I also "found" McCarthy's method for myself before
reading it in Knuth, and for all I know others have found "my"
improvement before I did. The original is a truly pretty
algorithm, a real "Why didn't I think of that?" piece of work.
Highly recommended!

Eric Sosman es*****@acmdotorg.invalid   
Eric Sosman wrote:
CBFalconer wrote:
>jo**@online.de wrote:
>>> Thanks for the comment! Your implementation is basically the one from Sedgewick's book. It has the disadvantage that you have to run through the linked lists repeatedly to split it, but it is also localized, i.e. takes advantage of caches.
Well, I read Sedgewick long ago :). However I fail to see how you can do a merge without first splitting.
Split and merge as you go. See TAOCP (vol III) exercise
5.2.417, where the idea is credited to John McCarthy. The
basic McCarthy method has peaks of inefficiency when the list
length is one greater than a power of two (more generally, when
the binary representation of the node count has 1bits that are
separated by runs of 0bits), but I've found[*] a simple way to
fix them without losing much speed in the nicer cases. Code
available on request. [*] I also "found" McCarthy's method for myself before
reading it in Knuth, and for all I know others have found "my"
improvement before I did. The original is a truly pretty
algorithm, a real "Why didn't I think of that?" piece of work.
Highly recommended!
My Knuth is buried in boxes somewhere. Sounds as if the split
mechanism is something like take alternate entries, and my instinct
tells me that this will disturb the stability of the sort, which in
turn is one of the most attractive features of mergesort.

<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
"A man who is right every time is not likely to do very much."
 Francis Crick, codiscover of DNA
"There is nothing more amazing than stupidity in action."
 Thomas Matthews   
On Fri, 26 Jan 2007 09:42:06 0500, Eric Sosman
<es*****@acmdotorg.invalidwrote:
>CBFalconer wrote:
>jo**@online.de wrote:
>>> Thanks for the comment! Your implementation is basically the one from Sedgewick's book. It has the disadvantage that you have to run through the linked lists repeatedly to split it, but it is also localized, i.e. takes advantage of caches.
Well, I read Sedgewick long ago :). However I fail to see how you can do a merge without first splitting.
Split and merge as you go. See TAOCP (vol III) exercise 5.2.417, where the idea is credited to John McCarthy. The basic McCarthy method has peaks of inefficiency when the list length is one greater than a power of two (more generally, when the binary representation of the node count has 1bits that are separated by runs of 0bits), but I've found[*] a simple way to fix them without losing much speed in the nicer cases. Code available on request. [*] I also "found" McCarthy's method for myself before reading it in Knuth, and for all I know others have found "my" improvement before I did. The original is a truly pretty algorithm, a real "Why didn't I think of that?" piece of work. Highly recommended!
Good catch. I knew it had to be in Knuth somewhere, but some how I
missed it. That is what Schoen is doing, modulo minor variations.   
CBFalconer wrote:
Eric Sosman wrote:
>CBFalconer wrote:
>>jo**@online.de wrote: Thanks for the comment! Your implementation is basically the one from Sedgewick's book. It has the disadvantage that you have to run through the linked lists repeatedly to split it, but it is also localized, i.e. takes advantage of caches. Well, I read Sedgewick long ago :). However I fail to see how you can do a merge without first splitting.
Split and merge as you go. See TAOCP (vol III) exercise 5.2.417, where the idea is credited to John McCarthy. The basic McCarthy method has peaks of inefficiency when the list length is one greater than a power of two (more generally, when the binary representation of the node count has 1bits that are separated by runs of 0bits), but I've found[*] a simple way to fix them without losing much speed in the nicer cases. Code available on request.
[*] I also "found" McCarthy's method for myself before reading it in Knuth, and for all I know others have found "my" improvement before I did. The original is a truly pretty algorithm, a real "Why didn't I think of that?" piece of work. Highly recommended!
My Knuth is buried in boxes somewhere. Sounds as if the split
mechanism is something like take alternate entries, and my instinct
tells me that this will disturb the stability of the sort, which in
turn is one of the most attractive features of mergesort.
Unbury. "Take alternate entries" just changes the pattern
of splitting (unstably), not the number of split passes. It
makes the same number of node visits (= link traversals) as
"split in the middle" does.
The nice thing about McCarthy's "split mechanism" is that it
makes one count them one pass over the full list instead of the
lg(N) passes of "split in the middle" or "alternate entries."
The resulting sort is stable (assuming you don't do anything dumb).
The unnice thing about McCarthy's method is that after
the initial phase of perfectlybalanced poweroftwo merges
you're left with a cleanup phase in which the merges are not
so well balanced. Sorting a 1025node list, for example, uses
perfectlybalanced merges to sort the first 1024 items, then
performs a 1024plus1 merge to insert the final node: 512.9990+
comparisons (average) to choose among just 1026 possible outcomes.
Inserting another "stage" into McCarthy produces
one list each of 512, 256, ..., 4 nodes
two lists of 2 nodes
one list of 1 node
.... just before the cleanup phase. The cleanup merges are then
2+1, 2+3, 4+5, 8+9, ..., 512+513, each as close to balanced as
an odd number of nodes can be.

Eric Sosman es*****@acmdotorg.invalid   
Eric Sosman wrote:
Unbury. "Take alternate entries" just changes the pattern
of splitting (unstably), not the number of split passes. It
makes the same number of node visits (= link traversals) as
"split in the middle" does.
The nice thing about McCarthy's "split mechanism" is that it
makes one count them one pass over the full list instead of the
lg(N) passes of "split in the middle" or "alternate entries."
The resulting sort is stable (assuming you don't do anything dumb).
The unnice thing about McCarthy's method is that after
the initial phase of perfectlybalanced poweroftwo merges
you're left with a cleanup phase in which the merges are not
so well balanced. Sorting a 1025node list, for example, uses
perfectlybalanced merges to sort the first 1024 items, then
performs a 1024plus1 merge to insert the final node: 512.9990+
comparisons (average) to choose among just 1026 possible outcomes.
Inserting another "stage" into McCarthy produces
one list each of 512, 256, ..., 4 nodes
two lists of 2 nodes
one list of 1 node
... just before the cleanup phase. The cleanup merges are then
2+1, 2+3, 4+5, 8+9, ..., 512+513, each as close to balanced as
an odd number of nodes can be.
Sounds interesting, though I don't understand it fully and
I don't have access to any copy of Knuth's books. Code might
be interesting, though. What about posting it or making it
available for download?
If I get it correctly, the algorithm you are talking about
is unstable, works well only for powers of two (except with
some special modification to avoid this) and doesn't
recognize natural runs in the input? I would like to test
it against mine to see how it performs.   
On Fri, 26 Jan 2007 10:51:13 0500, CBFalconer <cb********@yahoo.com>
wrote:
>Eric Sosman wrote:
>CBFalconer wrote:
>>jo**@online.de wrote:
Thanks for the comment! Your implementation is basically the one from Sedgewick's book. It has the disadvantage that you have to run through the linked lists repeatedly to split it, but it is also localized, i.e. takes advantage of caches.
Well, I read Sedgewick long ago :). However I fail to see how you can do a merge without first splitting.
Split and merge as you go. See TAOCP (vol III) exercise 5.2.417, where the idea is credited to John McCarthy. The basic McCarthy method has peaks of inefficiency when the list length is one greater than a power of two (more generally, when the binary representation of the node count has 1bits that are separated by runs of 0bits), but I've found[*] a simple way to fix them without losing much speed in the nicer cases. Code available on request.
[*] I also "found" McCarthy's method for myself before reading it in Knuth, and for all I know others have found "my" improvement before I did. The original is a truly pretty algorithm, a real "Why didn't I think of that?" piece of work. Highly recommended!
My Knuth is buried in boxes somewhere. Sounds as if the split mechanism is something like take alternate entries, and my instinct tells me that this will disturb the stability of the sort, which in turn is one of the most attractive features of mergesort.
You're thinking of his algorithm L. He does split the list using
alternate entries. I think you're right about L not being stable,
though there may be some hidden trick in his description. (I really
hate Knuth's algorithm descriptions.) However the McCarthy algorithm is
stable.   
On Fri, 26 Jan 2007 20:42:45 +0100, Joerg Schoen
<jo************@online.dewrote:
>Eric Sosman wrote:
> Unbury. "Take alternate entries" just changes the pattern of splitting (unstably), not the number of split passes. It makes the same number of node visits (= link traversals) as "split in the middle" does.
The nice thing about McCarthy's "split mechanism" is that it makes one count them one pass over the full list instead of the lg(N) passes of "split in the middle" or "alternate entries." The resulting sort is stable (assuming you don't do anything dumb).
The unnice thing about McCarthy's method is that after the initial phase of perfectlybalanced poweroftwo merges you're left with a cleanup phase in which the merges are not so well balanced. Sorting a 1025node list, for example, uses perfectlybalanced merges to sort the first 1024 items, then performs a 1024plus1 merge to insert the final node: 512.9990+ comparisons (average) to choose among just 1026 possible outcomes. Inserting another "stage" into McCarthy produces
one list each of 512, 256, ..., 4 nodes two lists of 2 nodes one list of 1 node
... just before the cleanup phase. The cleanup merges are then 2+1, 2+3, 4+5, 8+9, ..., 512+513, each as close to balanced as an odd number of nodes can be.
Sounds interesting, though I don't understand it fully and I don't have access to any copy of Knuth's books. Code might be interesting, though. What about posting it or making it available for download?
If I get it correctly, the algorithm you are talking about is unstable, works well only for powers of two (except with some special modification to avoid this) and doesn't recognize natural runs in the input? I would like to test it against mine to see how it performs.
You don't get it correctly. You are implementing McCarthy's algorithm.
He's talking about the same thing that you are doing, except that he is
making a slight modification at the end to avoid some final inefficient
merges. His version is stable. Recognizing natural runs in the input
is irrelevant. The basic unit in McCarthy's algorithm is an ascending
run. It can be a run of length 1 or a natural ascending run.   
On Fri, 26 Jan 2007 13:04:00 0500, Eric Sosman
<es*****@acmdotorg.invalidwrote:
[snip clever cleanup of MCrthy's algorithm]
Very pretty.
Can you describe the general case or is it an exercise for the reader?   
Joerg Schoen wrote:
Eric Sosman wrote:
> Unbury. "Take alternate entries" just changes the pattern of splitting (unstably), not the number of split passes. It makes the same number of node visits (= link traversals) as "split in the middle" does.
The nice thing about McCarthy's "split mechanism" is that it makes one count them one pass over the full list instead of the lg(N) passes of "split in the middle" or "alternate entries." The resulting sort is stable (assuming you don't do anything dumb).
The unnice thing about McCarthy's method is that after the initial phase of perfectlybalanced poweroftwo merges you're left with a cleanup phase in which the merges are not so well balanced. Sorting a 1025node list, for example, uses perfectlybalanced merges to sort the first 1024 items, then performs a 1024plus1 merge to insert the final node: 512.9990+ comparisons (average) to choose among just 1026 possible outcomes. Inserting another "stage" into McCarthy produces
one list each of 512, 256, ..., 4 nodes two lists of 2 nodes one list of 1 node
... just before the cleanup phase. The cleanup merges are then 2+1, 2+3, 4+5, 8+9, ..., 512+513, each as close to balanced as an odd number of nodes can be.
Sounds interesting, though I don't understand it fully and
I don't have access to any copy of Knuth's books. Code might
be interesting, though. What about posting it or making it
available for download?
If I get it correctly, the algorithm you are talking about
is unstable, works well only for powers of two (except with
some special modification to avoid this) and doesn't
recognize natural runs in the input? I would like to test
it against mine to see how it performs.
#include <stddef.h>
#include <limits.h>
typedef struct unknown Node;
#define NEXT(ptr) *(Node**)((char*)(ptr) + offset)
#define MAXBITS (sizeof(Node*) * CHAR_BIT)
static void *
merge(Node *p, Node *q,
size_t offset,
int (*compare)(const void *, const void *))
/*
* Internal routine to merge two nonempty sorted lists. If elements
* of the two lists compare equal, those from the `p' list will precede
* those from the `q' list in the merged output.
*/
{
Node *head, **tail;
tail = &head;
for (;;) {
if (compare(p, q) <= 0) {
*tail = p;
tail = &NEXT(p);
if ( (p = NEXT(p)) == NULL ) {
*tail = q;
break;
}
}
else {
*tail = q;
tail = &NEXT(q);
if ( (q = NEXT(q)) == NULL ) {
*tail = p;
break;
}
}
}
return head;
}
void * /* returns: pointer to new list head */
listsort(
void *head, /* first item in original list */
size_t offset, /* byte offset of link field in each item */
int (*compare) /* item comparison function */
(const void *, const void *))
/*
* listsort() rearranges the links of a singlylinked NULLterminated
* list so they traverse the items in an order defined by a caller
* provided comparison function, and returns a pointer to the first
* item in the rearranged list. The comparison function accepts two
* item pointers and returns a negative, zero, or positive integer to
* indicate that the first item compares less than, equal to, or greater
* than the second. The sort is stable.
*/
{
Node *list[MAXBITS1]; /* sorted sublists of 2,4,8,... items */
Node *p, *q;
int bits, maxbits;
list[ maxbits = 0 ] = NULL;
while ( (p = head) != NULL ) {
if ( (q = NEXT(p)) == NULL )
break;
head = NEXT(q);
if (compare(p, q) <= 0) {
NEXT(q) = NULL;
}
else {
NEXT(q) = p;
NEXT(p) = NULL;
p = q;
}
for (bits = 0; (q = list[bits]) != NULL; ) {
list[bits] = NULL;
p = merge(q, p, offset, compare);
if (++bits maxbits) {
maxbits = bits;
break;
}
}
list[bits] = p;
}
for (bits = 0; bits <= maxbits; ++bits) {
if ( (p = list[bits]) != NULL )
head = (head == NULL) ? p : merge(p, head, offset, compare);
}
return head;
}
Notes:
1) This is packaged as a typeblind "generic" list sort, the
only assumptions being that the links are struct pointers and
that a list's final node has a NULL link. A typeaware version
(with knowledge of the nodes' structure and able to compare nodes
inline) should run faster.
2) The sort doesn't actually need a comparison function that
returns ve/zero/+ve: a Boolean `precedes' function would do as
well and could be faster. In this sort, I decided to stick with
a qsortandbsearch style, mostly because that's what people are
accustomed to.
3) This code implements the basic McCarthy method without
protection against badly unbalanced merges in the cleanup phase.
3a) Well, not the "most basic" McCarthy: It sorts twonode
lists inline instead of merging two onenode lists.
3b) Knuth suggests that "we can sort groups of, say, 16 items
using straight insertion," but in tests I've conducted and on the
machines available to me that doesn't seem to be a good idea. (He
makes the suggestion in connection with merge using arrays instead
of lists, and perhaps things would be different in that setting.)
4) The McCarthy algorithm may be more understandable if you
note the parallel to the process of counting in binary. When N
items (or item pairs, with the 3a optimization) have been consumed,
the elements of list[] correspond to the binary representation of
N: each zero bit corresponds to a NULL in list[], and each one bit
to a nonNULL link to a sorted sublist of N items (2*N with 3a).
Consuming the next item (or pair) "increments" N by working from
right to left: if the loworder bit is zero, the new item lands
in list[0]. Otherwise, it is merged with list[0], which becomes
NULL, and then the new twoitem (twopair) list "carries" to the
list[1] place, and so on.
5) To avoid wildly unbalanced merges during cleanup, use two
arrays alist[] and blist[] instead of the single array list[].
The "incrementing" process first tries to fill the NULL link in
alist[0], but if it's nonNULL it uses blist[0] instead. If
both are nonNULL, it merges those two lists, deposits the new
sublist in alist[0] and NULLs blist[0], then "carries" the merge
result one place to the left:
for (bits = 0; ; p = q) {
if (alist[bits] == NULL) {
alist[bits] = p;
break;
}
if (blist[bits] == NULL) {
blist[bits] = p;
break;
}
q = merge(alist[bits], blist[bits], offset, compare);
alist[bits] = p;
blist[bits] = NULL;
if (++bits maxbits) {
maxbits = bits;
alist[bits] = q;
blist[bits] = NULL;
break;
}
}
In my tests, this method is just a hair slower than the original
on lists of length 2^k or 2^k1, but shows a substantial gain for
lengths like 2^k+1. It is the fastest "generalpurpose" list
merge I know.
6) All the versions mentioned above are "straight" merges.
"Natural" merges that make use of the order already present in
the input list can beat the pants off them *if* the list is in
very good order already. However, in my tests the preexisting
order must be very good indeed for natural to beat straight;
Knuth's remark in the answer to exercise 5.2.412
"We may conclude that natural merging is preferable to
straight merging when linked allocation is being used,
although it was inferior for sequential allocation."
is not supported by my data. (Hypothesis: Knuth was thinking
about the number of comparisons during the merges themselves,
but forgot about the N1 comparisons required to discover the
boundaries between the initial runs. On "random" input, a
natural merge uses N1 comparisons to create approximately N/2
initial runs, while a straight merge uses only N/2 comparisons
to make the same amount of progress.)
7) I've been working on and off  mostly off  on a paper
describing test results for various kinds of linkedlist merges,
but I'm not ready to "publish," however informally. When last
I set it aside, I was having a hard time writing a linkedlist
Quicksort that didn't exhibit pathological behavior. Maybe if
I convince myself it's just not possible, I'll finish the damn'
thing and post it.
8) An interesting (and possibly discouraging) observation:
Efficient linkedlist sorting is relatively unimportant! First,
sorting a linked list brings very little "algorithmic advantage"
the way sorting an array does: In a sorted array you can use
things like binary search, but sorting a linked list offers no
similar gain. (Sorting cuts the average time for an unsuccessful
search in half, but if N is large there are better ways to search.)
Second, linked lists are "unfriendly" to the memory implementations
of contemporary systems  bad locality, cache thrashing, etc. 
so a good programmer usually tries to keep them short; if N is
small, even an inefficient sort will be quick. Yes, there are
circumstances where an efficient linkedlist sort is needed, but
IMHO they are (or should be) on the rare side.

Eric Sosman es*****@acmdotorg.invalid   
Eric Sosman wrote:
3b) Knuth suggests that "we can sort groups of, say, 16 items
using straight insertion," but in tests I've conducted and on the
machines available to me that doesn't seem to be a good idea. (He
makes the suggestion in connection with merge using arrays instead
of lists, and perhaps things would be different in that setting.)
I agree. As mentioned above, I made the same observation. Earlier
in this threadm, wa**@stoner.com gave a useful explanations on this
phenomenon
4) The McCarthy algorithm may be more understandable if you
note the parallel to the process of counting in binary. When N
items (or item pairs, with the 3a optimization) have been consumed,
the elements of list[] correspond to the binary representation of
N: each zero bit corresponds to a NULL in list[], and each one bit
to a nonNULL link to a sorted sublist of N items (2*N with 3a).
Consuming the next item (or pair) "increments" N by working from
right to left: if the loworder bit is zero, the new item lands
in list[0]. Otherwise, it is merged with list[0], which becomes
NULL, and then the new twoitem (twopair) list "carries" to the
list[1] place, and so on.
I can now see that my implementation is basically a reformulation of
McCarthys algorithm. Thanks for revealing this! Your implementations is
more clever, though it takes a bit longer to comprehend. Thanks for the
explanation!
5) To avoid wildly unbalanced merges during cleanup, use two
arrays alist[] and blist[] instead of the single array list[].
The "incrementing" process first tries to fill the NULL link in
alist[0], but if it's nonNULL it uses blist[0] instead. If
both are nonNULL, it merges those two lists, deposits the new
sublist in alist[0] and NULLs blist[0], then "carries" the merge
result one place to the left:
.. snip ..
In my tests, this method is just a hair slower than the original
on lists of length 2^k or 2^k1, but shows a substantial gain for
lengths like 2^k+1. It is the fastest "generalpurpose" list
merge I know.
You don't mention how you modify the final "cleanup" loop. I did it like
that:
for(bits = 0 ; bits <= maxbits ; ++bits) {
if((p = alist[bits]) != NULL)
head = (head == NULL) ? p : merge(p, head, offset, compare);
if((p = blist[bits]) != NULL)
head = (head == NULL) ? p : merge(p, head, offset, compare);
}
Maybe there is another / better way? At least the sorting then works.
But I cannot confirm your tests. Here are my results for three
different lengths. (1) sorts a random list, (2) random list with each
value 10 times, (3) an ordered list, and (4) and invers ordered list.
I compare my implementation against the code you supplied ("McCarthy")
and the improved version ("McCarthy II").
Length  mine  McCarthy  McCarthy II
++++
4*10^6    
(1)  7.6  7.55  8.26 
(2)  7.67  7.63  8.32 
(3)  0.64  0.63  0.73 
(4)  0.59  0.58  0.88 
++++
2^22    
(1)  8.02  7.97  8.27 
(2)  8.09  8.04  8.34 
(3)  0.67  0.65  0.77 
(4)  0.63  0.63  0.81 
++++
2^22+1    
(1)  8.66  8.63  8.25 
(2)  9.17  9.11  8.31 
(3)  0.79  0.77  0.74 
(4)  0.62  0.64  0.79 
++++
As you can see, "McCarthy" and "mine" are basically the same, yours is a
bit faster in all cases.
But the improved one with two lists "alist" and "blist" is slower all
the time except for the 2^n+1 case.
6) All the versions mentioned above are "straight" merges.
"Natural" merges that make use of the order already present in
the input list can beat the pants off them *if* the list is in
very good order already. However, in my tests the preexisting
order must be very good indeed for natural to beat straight;
Knuth's remark in the answer to exercise 5.2.412
"We may conclude that natural merging is preferable to
straight merging when linked allocation is being used,
although it was inferior for sequential allocation."
is not supported by my data. (Hypothesis: Knuth was thinking
about the number of comparisons during the merges themselves,
but forgot about the N1 comparisons required to discover the
boundaries between the initial runs. On "random" input, a
natural merge uses N1 comparisons to create approximately N/2
initial runs, while a straight merge uses only N/2 comparisons
to make the same amount of progress.)
I disagree. In my tests, the natural merges are always faster or only
slightly slower. Please note that the additional compares to check for
natural runs is only O(N) which looks affordable to me.
As mentioned before in this thread, the natural merge excels in case the
list is sorted and looses a few percent in the worst case, i. e. invers
ordered list.
8) An interesting (and possibly discouraging) observation:
Efficient linkedlist sorting is relatively unimportant! First,
sorting a linked list brings very little "algorithmic advantage"
the way sorting an array does: In a sorted array you can use
things like binary search, but sorting a linked list offers no
similar gain. (Sorting cuts the average time for an unsuccessful
search in half, but if N is large there are better ways to search.)
Second, linked lists are "unfriendly" to the memory implementations
of contemporary systems  bad locality, cache thrashing, etc. 
so a good programmer usually tries to keep them short; if N is
small, even an inefficient sort will be quick. Yes, there are
circumstances where an efficient linkedlist sort is needed, but
IMHO they are (or should be) on the rare side.
Agreed, linked list sorting is a bit questionable, so one shouldn't put
too much effort into it. But I actually have some cases where I deal
with doubly linked lists as a usefull general purpose data structure in
a large application program and also want to provide a means to sort it
(e. g. before generating output of the list). Here I wanted to have an
implementation I shouldn't be ashamed off :)
Regarding locality, the McCarthy algortihm is still the best one can
achieve.   
Joerg Schoen wrote:
Eric Sosman wrote:
.... snip ...
>
>8) An interesting (and possibly discouraging) observation: Efficient linkedlist sorting is relatively unimportant! First, sorting a linked list brings very little "algorithmic advantage" the way sorting an array does: In a sorted array you can use things like binary search, but sorting a linked list offers no similar gain. (Sorting cuts the average time for an unsuccessful search in half, but if N is large there are better ways to search.) Second, linked lists are "unfriendly" to the memory implementations of contemporary systems  bad locality, cache thrashing, etc.  so a good programmer usually tries to keep them short; if N is small, even an inefficient sort will be quick. Yes, there are circumstances where an efficient linkedlist sort is needed, but IMHO they are (or should be) on the rare side.
Agreed, linked list sorting is a bit questionable, so one shouldn't
put too much effort into it. But I actually have some cases where I
deal with doubly linked lists as a usefull general purpose data
structure in a large application program and also want to provide a
means to sort it (e. g. before generating output of the list). Here
I wanted to have an implementation I shouldn't be ashamed off :)
If you want to search a table of some form, a hashtable is the most
efficient, since O(1) beats O(logN) by a wide margin. However you
may want to dump the table in sorted order, which can be awkward.
The cure is to be able to form a list of the table content, sort
it, and dump from that. That is easily implemented by simply
including a single spare pointer in the table items. See the
demonstration programs for hashlib on my site. The hash table has
to have a means of walking its content. The overall process is
O(NlogN).
As an extension, you can include two spare pointers and form a
tree. This would probably be the best method when you want to
search for a range, rather than a specific item.
This brings up another consideration  worst case timing. While
quicksort is slighly faster than mergesort, it has a horrible
O(N*N) worst case. With mergesort, the average run time and the
worst case are both O(NlogN). This again makes it an optimum
choice for realtime systems (not to mention stability). AFAIK no
inplace sort can do better (the in place eliminates radix sort).

Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>   
Joerg Schoen wrote:
[in Email and on Usenet; either is fine, but both risks confusion]
[...]
You don't mention how you modify the final "cleanup" loop. I did it like
that:
for(bits = 0 ; bits <= maxbits ; ++bits) {
if((p = alist[bits]) != NULL)
head = (head == NULL) ? p : merge(p, head, offset, compare);
if((p = blist[bits]) != NULL)
head = (head == NULL) ? p : merge(p, head, offset, compare);
}
That will sort, but it ruins stability. Exchange the two
`if' statements and stability is restored. (Stability is not
the most important characteristic of a sort, but when it can be
had for free there's no reason not to preserve it.)
I compare my implementation against the code you supplied ("McCarthy")
and the improved version ("McCarthy II").
Length  mine  McCarthy  McCarthy II
++++
4*10^6    
(1)  7.6  7.55  8.26 
(2)  7.67  7.63  8.32 
(3)  0.64  0.63  0.73 
(4)  0.59  0.58  0.88 
++++
2^22    
(1)  8.02  7.97  8.27 
(2)  8.09  8.04  8.34 
(3)  0.67  0.65  0.77 
(4)  0.63  0.63  0.81 
++++
2^22+1    
(1)  8.66  8.63  8.25 
(2)  9.17  9.11  8.31 
(3)  0.79  0.77  0.74 
(4)  0.62  0.64  0.79 
++++
As you can see, "McCarthy" and "mine" are basically the same, yours is a
bit faster in all cases.
But the improved one with two lists "alist" and "blist" is slower all
the time except for the 2^n+1 case.
My tests concentrated on much shorter lists  as I mentioned
upthread, I believe it is usually a bad idea to sort very long
lists (or even to build them). I used twentyseven assorted list
lengths from 17 to 6144, with keys in unbiased order or biased
toward rising or falling order, with "fast" and "slow" comparison
functions, on five machines covering multiple generations of x86
and UltraSPARC architectures, and with three different compilers.
I also made an effort to scatter the list nodes in memory, trying
to make the links bounce around across different pages as a real
list might (a list whose nodes all come from one big array may
exhibit too much cache locality).
Twentyseven sort implementations of six algorithms were tested
(as I mentioned earlier, I feel there are a few more that ought to
be tried for completeness' sake). The testing used a "tournament"
format: Try all the implementations on a few test combinations,
eliminate the slowest few, try the survivors on more combinations,
prune again, and so on. Six methods made it to the final round
every time; eight methods never survived the first round.
Modified McCarthy was the overall winner in five of the seven
tested configurations, taking a second and a third place in the
other two. Basic McCarthy won in one configuration, with two
seconds, three thirds, and one didnotplace. Other high finishers
were a Modified McCarthy that insertionsorted groups of four nodes
prior to merging (one first place, one second, one third) and a
modified natural merge (three second places).
My choice of test data may have given Modified McCarthy an
unfair advantage: nine of the tested lengths were of the form
2^k+1, and another nine looked like 3*2^k (== 2^(k+1) + 2^k).
In my zeal to avoid the unbalanced merges of Basic McCarthy I may
have given too much emphasis to lengths it doesn't like and thus
shown Modified McCarthy in too favorable a light. The choice of
test data is often the weakest point  or at least the most
debatable  in a timing investigation.
>Knuth's remark in the answer to exercise 5.2.412
"We may conclude that natural merging is preferable to straight merging when linked allocation is being used, although it was inferior for sequential allocation."
is not supported by my data. (Hypothesis: Knuth was thinking about the number of comparisons during the merges themselves, but forgot about the N1 comparisons required to discover the boundaries between the initial runs. On "random" input, a natural merge uses N1 comparisons to create approximately N/2 initial runs, while a straight merge uses only N/2 comparisons to make the same amount of progress.)
I disagree. In my tests, the natural merges are always faster or only
slightly slower. Please note that the additional compares to check for
natural runs is only O(N) which looks affordable to me.
Both N1 and floor(N/2) are O(N), so BigOh doesn't tell enough
of the story.
As mentioned before in this thread, the natural merge excels in case the
list is sorted and looses a few percent in the worst case, i. e. invers
ordered list.
On a reversed list, natural merge makes N1 comparisons to form
N onenode sublists. Thereafter, it follows exactly the same pattern
of merges a straight merge would. In the worst case, then, natural
merge makes N1 more comparisons than straight merge.

Eric Sosman es*****@acmdotorg.invalid   
Eric Sosman wrote:
[in Email and on Usenet; either is fine, but both risks confusion]
wasn't sure you read usenet regularly. I'll promise not doing it again.
That will sort, but it ruins stability. Exchange the two
`if' statements and stability is restored. (Stability is not
the most important characteristic of a sort, but when it can be
had for free there's no reason not to preserve it.)
Thanks for the hint. I didn't think much of it, just tried to get the
program to work.
>As you can see, "McCarthy" and "mine" are basically the same, yours is a bit faster in all cases.
But the improved one with two lists "alist" and "blist" is slower all the time except for the 2^n+1 case.
My tests concentrated on much shorter lists  as I mentioned
upthread, I believe it is usually a bad idea to sort very long
lists (or even to build them). I used twentyseven assorted list
lengths from 17 to 6144, with keys in unbiased order or biased
toward rising or falling order, with "fast" and "slow" comparison
functions, on five machines covering multiple generations of x86
and UltraSPARC architectures, and with three different compilers.
I also made an effort to scatter the list nodes in memory, trying
to make the links bounce around across different pages as a real
list might (a list whose nodes all come from one big array may
exhibit too much cache locality).
Agreed that long linked lists and sorting them is odd, but for the sake
of testing the algorithm it sounds OK to me. Moreover, long lists test
locality. Maybe I should also disperse my lists throughout memory to
see the effect. But my understanding of a CPU cache is that the lower
bits of the address are used as cache addresses, so e. g. putting each
node at the start of a page would trash the cache. Having them in an
array should have the same result as wildly dispersing them in memory.
I'll try it and let you know.
Still it looks odd to me that for an "arbitrary" number like 4 * 10^6
the results become worse for the "improved McCarthy".
I have also tested my implementation on a variety of hardware including
AIX, HPUX, SunOS, etc. If time permits, I will also test yours.
Twentyseven sort implementations of six algorithms were tested
(as I mentioned earlier, I feel there are a few more that ought to
be tried for completeness' sake). The testing used a "tournament"
format: Try all the implementations on a few test combinations,
eliminate the slowest few, try the survivors on more combinations,
prune again, and so on. Six methods made it to the final round
every time; eight methods never survived the first round.
Modified McCarthy was the overall winner in five of the seven
tested configurations, taking a second and a third place in the
other two. Basic McCarthy won in one configuration, with two
seconds, three thirds, and one didnotplace. Other high finishers
were a Modified McCarthy that insertionsorted groups of four nodes
prior to merging (one first place, one second, one third) and a
modified natural merge (three second places).
My choice of test data may have given Modified McCarthy an
unfair advantage: nine of the tested lengths were of the form
2^k+1, and another nine looked like 3*2^k (== 2^(k+1) + 2^k).
In my zeal to avoid the unbalanced merges of Basic McCarthy I may
have given too much emphasis to lengths it doesn't like and thus
shown Modified McCarthy in too favorable a light. The choice of
test data is often the weakest point  or at least the most
debatable  in a timing investigation.
What about trying all possible lengths in a range from 2^k .. 2^k+1 or
so and produce a graph that compares different algorithms?
I agree that your choice maybe favoured modified McCarthy too much.
>I disagree. In my tests, the natural merges are always faster or only slightly slower. Please note that the additional compares to check for natural runs is only O(N) which looks affordable to me.
Both N1 and floor(N/2) are O(N), so BigOh doesn't tell enough
of the story.
I don't quite get this. Mergesort is O(N * log(N)), adding another O(N)
only increases the leading factor but the higher N becomes the more
irrelevant gets this additional contribution.
On a reversed list, natural merge makes N1 comparisons to form
N onenode sublists. Thereafter, it follows exactly the same pattern
of merges a straight merge would. In the worst case, then, natural
merge makes N1 more comparisons than straight merge.
Yes, but that's neglicible for an O(N * log(N)) algorithm and improves
it in real cases with partial order.   
CBFalconer wrote:
If you want to search a table of some form, a hashtable is the most
efficient, since O(1) beats O(logN) by a wide margin. However you
may want to dump the table in sorted order, which can be awkward.
The cure is to be able to form a list of the table content, sort
it, and dump from that. That is easily implemented by simply
including a single spare pointer in the table items. See the
demonstration programs for hashlib on my site. The hash table has
to have a means of walking its content. The overall process is
O(NlogN).
Thanks for pointing this out! The actual situation is that the
(existing) application program I have in mind already uses linked lists
frequently for typical data structures. I now want to provide
an "improved" implementation and also want to offer an easy way of
sorting it prior to output.
I'll take your suggestion like that: I'll have another spare pointer in
the linked list which can be used to generate a hash table from the
list. This becomes handy if the list is to be searched for individual
entries frequently.
As an extension, you can include two spare pointers and form a
tree. This would probably be the best method when you want to
search for a range, rather than a specific item.
I already have two pointers (doubly linked lists), add another for
hashing the whole thing like suggested above. Maybe I offer a function
that converts the doublylinked list into a balanced tree. Would be a
nice excercise for redblack trees I suppose.
This brings up another consideration  worst case timing. While
quicksort is slighly faster than mergesort, it has a horrible
O(N*N) worst case. With mergesort, the average run time and the
worst case are both O(NlogN). This again makes it an optimum
choice for realtime systems (not to mention stability). AFAIK no
inplace sort can do better (the in place eliminates radix sort).
You are right, but I usually live with the "worst case timing" of
quicksort, believing that this only appears in rare cases. I am not too
paranoid  it's just that everybody's after me :)   
Joerg Schoen wrote:
Eric Sosman wrote:
>Joerg Schoen wrote:
>>I disagree. In my tests, the natural merges are always faster or only slightly slower. Please note that the additional compares to check for natural runs is only O(N) which looks affordable to me.
Both N1 and floor(N/2) are O(N), so BigOh doesn't tell enough of the story.
I don't quite get this. Mergesort is O(N * log(N)), adding another O(N)
only increases the leading factor but the higher N becomes the more
irrelevant gets this additional contribution.
For sufficiently large N, only the leading term and its
coefficient are important. But N is not always "sufficiently
large," and the lowerorder terms can be significant or even
dominant when N is small. That's why Quicksort implementations
usually resort to insertion sort for short subfiles: even though
insertion sort is O(N*N), its simplicity typically gives a much
smaller coefficient than that of Quicksort's O(N*ln(N)), so it
winds up being faster for small N.
My interest was not in sorting lists of four million nodes,
but in developing a "generalpurpose" utility that works well
over a wide range of list lengths and doesn't behave too badly
if the callersupplied comparison function is sluggish. The
exercise has given me a renewed appreciation of the difficulties
and compromises faced by the developer of "generalpurpose"
routines (and has increased my already healthy skepticism about
the practicality of code reuse). It is likely that I'd have
come to different conclusions if I'd started with a different
notion of how a "generalpurpose" listsorter would be used.
In the tests I ran, one variation of natural merge performed
respectably. But three variations of straight merge (all based
on McCarthy) did just a little better. YMMV.
> On a reversed list, natural merge makes N1 comparisons to form N onenode sublists. Thereafter, it follows exactly the same pattern of merges a straight merge would. In the worst case, then, natural merge makes N1 more comparisons than straight merge.
Yes, but that's neglicible for an O(N * log(N)) algorithm and improves
it in real cases with partial order.
I once tried to calculate the breakeven point, the average
run length where natural merge's shallower tree repaid the upfront
investment of N1 comparisons. IIRC (this was a while ago, and the
details of the calculation elude me at the moment), the average run
length needed to exceed 3.2 or so for natural merge to win over
straight merge. That may not seem like a lot, but it is extremely
rare in "random" input permutations. Only 15111 of the 362880
permutations of nine distinct keys have three or fewer runs and
hence an average run length of three or more; that's a little less
than 4.2%. For ten keys, only 1.3% of the permutations have three
or fewer runs, and the percentages keep diminishing. For even a
moderate N, very few permutations have fewer than N/3.2 runs.
On the other hand, it must be admitted that real input lists
are probably not "truly random." We're back to the difficult
question of developing generalpurpose software for "typical"
cases, without any solid notion of what "typical" is. YM, as
I said before, MV.

Eric Sosman es*****@acmdotorg.invalid   
Joerg Schoen wrote:
CBFalconer wrote:
>If you want to search a table of some form, a hashtable is the most efficient, since O(1) beats O(logN) by a wide margin. However you may want to dump the table in sorted order, which can be awkward. The cure is to be able to form a list of the table content, sort it, and dump from that. That is easily implemented by simply including a single spare pointer in the table items. See the demonstration programs for hashlib on my site. The hash table has to have a means of walking its content. The overall process is O(NlogN).
Thanks for pointing this out! The actual situation is that the
(existing) application program I have in mind already uses linked
lists frequently for typical data structures. I now want to provide
an "improved" implementation and also want to offer an easy way of
sorting it prior to output.
I'll take your suggestion like that: I'll have another spare pointer
in the linked list which can be used to generate a hash table from
the list. This becomes handy if the list is to be searched for
individual entries frequently.
Take a look at hashlib. All the mechanisms are in there for
forming a list from the table contents. Once you have formed a
list it will remain, ignoring new additions to the table (but
beware deletions if you destroy the deleted entries, which is up to
you).
<http://cbfalconer.home.att.net/download/>
The above assumes you are able to use GPLd software. If not, I can
always be contacted for another license.

Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>   
Um, no, you miss the essential point. The uniqueness lies in the way in
which he accesses the data, not in the fact that he is using natural
mergesort. In fact he conditionalizes whether it is natural or not.
I saw that. But that was via the preprocessor.
Whatever way he compiles it, its behaviour does not change at runtime.
I will recheck mind you.
It isn't true that you can't do a natural mergesort for arrays.
Okay, I will go with that. I overstated the case. But lets go over that.
To do natural mergesort with arrays requires an auxilary array where item
records
 the start of a run
 how many elements in the run
as there may be no other place to record that information
And there are 2 algorithms with natural mergesort:
 there is the recursive version
 there is the iterative version
I dont see how to do a recursive version with natural mergesort for arrays.
Because consider the first run  it could be the only run or it could be one
of many.
How many function calls are made recursively depends on the number of runs
and that is not known in advance or as your are going along.
On the iterative version of natural mergesort  you can make 1 pass and
record the index and run length.
After that, you merge adjacent entries and keep doing so until you have 1
run.
Technically the auxilary array could be O(N) in size in everything is in
reverse order.
The overhead of having to maintain an auxiliary array is quite substantial
but probably less than standard merge sort.
Stephen Howe   This discussion thread is closed Replies have been disabled for this discussion. Similar topics
7 posts
views
Thread by Chris Ritchey 
last post: by

3 posts
views
Thread by s_subbarayan 
last post: by

7 posts
views
Thread by deanfamily 
last post: by

reply
views
Thread by Joerg Schoen 
last post: by
           