Bytes | Developer Community

Benchmarks

The task: Write a program that reads a set of words from standard
input and prints the number of distinct words.

I came across a website that listed a few programs to accomplish this
task: http://unthought.net/c++/c_vs_c++.html (ignore all the language
flaming :-), and thought that all of them did unnecessary operations,
so I wrote my own. But for some reason, my version turned out slower
than ALL of the versions on the website, even though it seems to
perform fewer operations (yes, I benchmarked them on my own computer).

According to the website, the slowest version is:

#include <set>
#include <string>
#include <iostream>

int main(int argc, char **argv)
{
    // Declare and Initialize some variables
    std::string word;
    std::set<std::string> wordcount;
    // Read words and insert in rb-tree
    while (std::cin >> word) wordcount.insert(word);
    // Print the result
    std::cout << "Words: " << wordcount.size() << std::endl;
    return 0;
}

My version is about 12 times slower than that. It uses lower-level
constructs. Here it is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

struct SetNode
{
    char *word;
    struct SetNode *next;
};

// An unorderd set of words
//
static struct SetNode *gSet = 0;
static int gSetSize = 0;

#define kInitWordSize 32

// Returns a word read from stdin. The returned pointer must be
// deallocated with free().
//
static char *
ReadOneWord(void)
{
    int ch = getchar();

    while (ch != EOF && isspace(ch))
        ch = getchar();
    if (ch == EOF)
        return 0;

    char *word = (char *) malloc(kInitWordSize);
    if (!word)
        return 0;

    int size = kInitWordSize;
    int i = 0;

    while (ch != EOF && !isspace(ch)) {
        if (i >= size) {
            size *= 2;

            char *newWord = (char *) realloc(word, size);
            if (!newWord) {
                free(word);
                return 0;
            }
            word = newWord;
        }

        word[i++] = ch;
        ch = getchar();
    }

    if (i >= size) {
        size *= 2;

        char *newWord = (char *) realloc(word, size);
        if (!newWord) {
            free(word);
            return 0;
        }
        word = newWord;
    }
    word[i] = '\0';

    return word;
}

// Inserts a word into the set if it isn't in the set.
// The passed string is expected to have been allocated with
// a memory allocation function, and it should be considered
// lost after passed to this function.
//
static void
InsertWord(char *aWord)
{
    struct SetNode *node;

    for (node = gSet; node; node = node->next) {
        if (strcmp(node->word, aWord) == 0) {
            free(aWord);
            return;
        }
    }

    node = (struct SetNode *) malloc(sizeof(struct SetNode));
    if (!node) {
        free(aWord);
        return;
    }

    node->word = aWord;
    node->next = gSet;
    gSet = node;
    ++gSetSize;
}

static void
DeleteSet(void)
{
    struct SetNode *node = gSet;
    struct SetNode *temp;

    while (node) {
        temp = node;
        node = node->next;
        free(temp->word);
        free(temp);
    }

    gSet = 0;
    gSetSize = 0;
}

int
main(void)
{
    char *word;

    while ((word = ReadOneWord()))
        InsertWord(word);

    printf("Words: %d\n", gSetSize);

    // Skip cleanup for now...
    //DeleteSet();
}

Any ideas as to what causes the big slowdown?

Sebastian

Nov 6 '08
On Nov 11, 5:18 am, Paul Hsieh <websn...@gmail.com> wrote:
On Nov 10, 10:38 am, Juha Nieminen <nos...@thanks.invalid> wrote:
James Kanze wrote:
On Nov 8, 12:27 am, user923005 <dcor...@connx.com> wrote:
On Nov 7, 2:47 pm, Juha Nieminen <nos...@thanks.invalid>
wrote: Sure. If hash(x) == return 1; then bad behavior
is to be expected. The assumption is of maximally
cascade free hash functions like UMAC or Bob Jenkins'
hash.
Do you have any references on those, particularly with an
implementation? All I found for UMAC was a reference to a
book, and Bob Jenkins seems more concerned with
cryptographic hashing than anything else. If these are
standard and widely used algorithms for look-up hashing,
I'd like to add them to my set of hashing benchmarks.
(Note that cryptographic hashing and look-up hashing have
very different constraints, and a good hash function for
one isn't necessarily a good hash function for the other.
Also, in look-up hashing, speed counts. Taking 10 times
more time to get 1% better distribution on the average is
a loss.)
The arena of "practical non-cryptographic hash functions" is
clearly a relatively new field.
Yes and no. The problem has certainly been around for a while,
but you're right that it doesn't seem to have attracted much
interest. The published works I've seen mostly just present a
function, and claim that it is good, with no analysis, and most
of the time with no real comparative benchmarks either.
Outside of crypto-hashes, Bob Jenkin's function and my
function, the quality of everything else out there is truly
pathetic.
As I said, I'm interested, because I've been collecting
benchmarks of various hashing functions. Post a link to the
algorithm, and I'll add it in.

OK, I just saw a link at the end of your posting. So I've got a
model for your implementation. I'll add it to my tests at the
first possible occasion.
Bob Jenkins, for a long time, set the standard with his
lookup2 function (I think he wrote and publicized it in Dr.
Dobb's journal in 1996 or so.)
According to the link on your page, this is one I've already
tested. And found it to be slower than FNV or my own hash
functions. It is, IMHO, a typical example where being overly
complicated to match a particular machine doesn't pay. The
distribution isn't significantly better than FNV or my own, and
in fact it takes more time (on the machines I have access to) to
calculate.
However, in a kind of "first attempt" I was able to design a
hash function myself that was dramatically faster and had
similar quality. (My function passes Bob Jenkins' avalanche
test as well as my own far more stringent bit distribution and
correlation test; Bob's function performs slightly worse on my
test.) So it's clear that there is plenty of room for research
here if anyone cares to take the time or put in the effort.
Bob rewrote a version of his function, which apparently comes
much closer to the performance of my function, but I have not
gone back to check it.
What is certain is that there is a lot of folklore floating
around, and very little real research. I'm not really much into
mathematical analysis of functions myself, so I can't do much on
that end. My basic idea was based on linear congruential random
number generators; intuitively, it seemed to me that whatever
made a good random number generator would also make a good
hashing algorithm. I also took into account that the execution
time of the hashing algorithm must be balanced against its
distribution characteristic; multiplying the execution time by
10 to gain 1% better distribution will result in slower
accesses, on the average. In my case, I was (at the time)
working on a machine (an 8086) with very, very slow
multiplication, so I became intrigued with the idea of using a
Mersenne prime as the multiplier (so that the multiplication
could be converted into a shift and a subtraction). And my
algorithm did beat out the few other examples I had at hand at
the time.

That was all a long time ago, but I've never really lost
interest in the subject. When Peter K. Pearson published his
article "Fast Hashing of Variable-Length Text Strings" in the
CACM, I created a small test harness, and compared it with the
others; my own turned out to be several times faster. Some time
later, someone in fr.comp.lang.c++ mentioned FNV hashing; by
that time, I had my "standard" benchmark harness designed, so I
whipped up a benchmark with that and all of the others I could
find, and wrote up the results in
http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html. Since
then, I've modified my harness to use my own AssocArray class,
rather than the g++ hash_map (so that I could also test with Sun
CC, VC++, etc.), and have added quite a few additional
algorithms. The fundamental results haven't changed, however;
either FNV or my Mersenne prime function are always amongst
the fastest (depending on multiplication speed and probably a
number of other factors I'm not aware of). The Pearson and the
Jenkins functions (at least the variants I'm using) are,
depending on the data, between 20% and 35% slower.

I've since extended the benchmark program to support
instrumentation. A quick check showed very little difference in
the actual distributions, so the difference is due to the fact
that the Mersenne prime function or FNV are faster to calculate.
Bob and I took different approaches. He started with
something cryptographic, and knocked it down to a point where
it was faster, though losing the cryptographic properties that
were no longer required. I instead took some ideas of his,
and built a function from the ground up that has no pedigree
from any cryptographic function. My design took modern CPU
architecture into account as well as trying to get to "the
heart of the matter" for hash function quality requirements.
So I started with a framework which I knew would deliver a
high performance result and injected quality into it.
The key point behind both our functions is that they are not
only good quality, they are objectively faster than all the
other typically used hash functions (CRC, Dan Bernstein's
ASCII hash, FNV hash, etc). The only things that even compete
with our functions are things people already know are
worthless (like just summing up the bytes.)
It's interesting to note that on a modern machine, FNV or my own
functions do not take any more time than just summing the bytes,
but result in a very good distribution.
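For concreteness, here is a minimal sketch of the 32-bit FNV-1a variant being discussed; the constants are the published FNV offset basis and prime, but the function name and interface are my own choices, not code from the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 32-bit: XOR in each byte, then multiply by the FNV prime.
 * The whole inner loop is one XOR and one multiply per byte. */
static uint32_t fnv1a_32(const char *s, size_t len)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (size_t i = 0; i < len; ++i) {
        h ^= (unsigned char) s[i];
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}
```

The multiply is the only non-trivial operation, which is why on machines with fast multiplication it competes with simply summing the bytes.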
I'm pretty sure Jenkins' hashing function (which uses the
same algorithm as his ISAAC random number generator) is
faster than anything you could ever create yourself, having
even a fraction of the same quality.
Is that a commentary on what you think of James Kanze's
abilities, or are you just indirectly praising people like Bob
Jenkins and myself as being some kind of untouchable
uber-programmers? If the latter, I would say it's likely
unwarranted.
I think we both agree that the final word hasn't been written.
I'm curious to see how your code does, however, because on your
web page, you say you find that Bob's function is faster than
FNV, on an Athlon. Whereas I find just the opposite, on a wide
variety of machines (Sun Sparc, some older Intel, and now on
both Intel and AMD based Linux boxes.)

FWIW, I just reran one set of tests on my machine here, an AMD
64 bit machine. Jenkins is almost exactly the same as FNV, and
slightly slower than my Mersenne prime hash using 2^7-1 as
multiplier. This was for a set of 8554 symbols extracted from
my code. A second trial with 1259 URL's (extracted from
somewhere, I forget where), did show Jenkins as slightly better.
So maybe it depends on typical length; the URL's are longer, on
the average, than my program symbols.

At any rate, my recommendation still stands: Mersenne primes
with 2^7-1 as multiplier. That seems to give the best results
over a wide variety of data and hardware. But if your algorithm
is significantly faster than Jenkins, it's definitely worth
looking at. I'll add it to my tests.
>This problem is wide open for anyone with good math/programming
skills with a good rough idea about modern CPU performance.
Specifically: the mantle is up for grabs for anyone to come up
with a good *64 bit* hash function. Ironically, my gut feeling
is that it should be possible to come up with a function nearly
twice as fast as mine if you can make the assumption that your
platform natively supports 64 bit scalars (which modern systems
now do.)
Note that on a modern CPU, I would expect the byte accesses in
FNV or my own algorithms to have very little impact, since the
entire inner loop will be in cache, all intermediate variables
in registers, so the only memory access will be the characters
in the string, and the CPU will find the data in its pipeline
for all of the reads except for the first in each cache line.

So I think that the word length factor is really a red herring.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Nov 11 '08 #101
On Nov 11, 8:02 am, James Kanze <james.ka...@gmail.com> wrote:
On Nov 11, 5:18 am, Paul Hsieh <websn...@gmail.com> wrote:
On Nov 10, 10:38 am, Juha Nieminen <nos...@thanks.invalid> wrote:
James Kanze wrote:
On Nov 8, 12:27 am, user923005 <dcor...@connx.com> wrote:
On Nov 7, 2:47 pm, Juha Nieminen <nos...@thanks.invalid>
wrote: Sure. If hash(x) == return 1; then bad behavior
is to be expected. The assumption is of maximally
cascade free hash functions like UMAC or Bob Jenkins'
hash.
Do you have any references on those, particularly with an
implementation? All I found for UMAC was a reference to a
book, and Bob Jenkins seems more concerned with
cryptographic hashing than anything else. If these are
standard and widely used algorithms for look-up hashing,
I'd like to add them to my set of hashing benchmarks.
(Note that cryptographic hashing and look-up hashing have
very different constraints, and a good hash function for
one isn't necessarily a good hash function for the other.
Also, in look-up hashing, speed counts. Taking 10 times
more time to get 1% better distribution on the average is
a loss.)
The arena of "practical non-cryptographic hash functions" is
clearly a relatively new field.

Yes and no. The problem has certainly been around for a while,
but you're right that it doesn't seem to have attracted much
interest. The published works I've seen mostly just present a
function, and claim that it is good, with no analysis, and most
of the time with no real comparative benchmarks either.
Outside of crypto-hashes, Bob Jenkins' function and my
function, the quality of everything else out there is truly
pathetic.

As I said, I'm interested, because I've been collecting
benchmarks of various hashing functions. Post a link to the
algorithm, and I'll add it in.

OK, I just saw a link at the end of your posting. So I've got a
model for your implementation. I'll add it to my tests at the
first possible occasion.
Bob Jenkins, for a long time, set the standard with his
lookup2 function (I think he wrote and publicized it in Dr.
Dobb's Journal in 1996 or so.)

According to the link on your page, this is one I've already
tested. And found it to be slower than FNV or my own hash
functions. It is, IMHO, a typical example where being overly
complicated to match a particular machine doesn't pay. The
distribution isn't significantly better than FNV or my own, and
in fact it takes more time (on the machines I have access to) to
calculate.
There is an updated version of Bob Jenkin's hash that is faster.
Another excellent hash is this one:
http://www.geocities.com/drone115b/Goulburn06.pdf

If you are hashing big keys, UMAC is marvelous.
http://fastcrypto.org/umac/2000/perf00.html
However, in a kind of "first attempt" I was able to design a
hash function myself that was dramatically faster and had
similar quality. (My function passes Bob Jenkins' avalanche
test as well as my own far more stringent bit distribution and
correlation test; Bob's function performs slightly worse on my
test.) So it's clear that there is plenty of room for research
here if anyone cares to take the time or put in the effort.
Bob rewrote a version of his function, which apparently comes
much closer to the performance of my function, but I have not
gone back to check it.

What is certain is that there is a lot of folklore floating
around, and very little real research. I'm not really much into
mathematical analysis of functions myself, so I can't do much on
that end. My basic idea was based on linear congruential random
number generators; intuitively, it seemed to me that whatever
made a good random number generator would also make a good
hashing algorithm. I also took into account that the execution
time of the hashing algorithm must be balanced against its
distribution characteristic; multiplying the execution time by
10 to gain 1% better distribution will result in slower
accesses, on the average. In my case, I was (at the time)
working on a machine (an 8086) with very, very slow
multiplication, so I became intrigued with the idea of using a
Mersenne prime as the multiplier (so that the multiplication
could be converted into a shift and a subtraction). And my
algorithm did beat out the few other examples I had at hand at
the time.

That was all a long time ago, but I've never really lost
interest in the subject. When Peter K. Pearson published his
article "Fast Hashing of Variable-Length Text Strings" in the
CACM, I created a small test harness, and compared it with the
others; my own turned out to be several times faster. Some time
later, someone in fr.comp.lang.c++ mentioned FNV hashing; by
that time, I had my "standard" benchmark harness designed, so I
whipped up a benchmark with that and all of the others I could
find, and wrote up the results in
http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html. Since
then, I've modified my harness to use my own AssocArray class,
rather than the g++ hash_map (so that I could also test with Sun
CC, VC++, etc.), and have added quite a few additional
algorithms. The fundamental results haven't changed, however;
either FNV or my Mersenne prime function are always amongst
the fastest (depending on multiplication speed and probably a
number of other factors I'm not aware of). The Pearson and the
Jenkins functions (at least the variants I'm using) are,
depending on the data, between 20% and 35% slower.
I use frog.cpp as my testing harness. Is your hash testing harness
code available?
I've since extended the benchmark program to support
instrumentation. A quick check showed very little difference in
the actual distributions, so the difference is due to the fact
that the Mersenne prime function or FNV are faster to calculate.


Bob and I took different approaches. He started with
something cryptographic, and knocked it down to a point where
it was faster, though losing the cryptographic properties that
were no longer required. I instead took some ideas of his,
and built a function from the ground up that has no pedigree
from any cryptographic function. My design took modern CPU
architecture into account as well as trying to get to "the
heart of the matter" for hash function quality requirements.
So I started with a framework which I knew would deliver a
high performance result and injected quality into it.
The key point behind both our functions is that they are not
only good quality, they are objectively faster than all the
other typically used hash functions (CRC, Dan Bernstein's
ASCII hash, FNV hash, etc). The only things that even compete
with our functions are things people already know are
worthless (like just summing up the bytes.)

It's interesting to note that on a modern machine, FNV or my own
functions do not take any more time than just summing the bytes,
but result in a very good distribution.
FNV is not as good as some of the others. It cascades a bit.
I'm pretty sure Jenkins' hashing function (which uses the
same algorithm as his ISAAC random number generator) is
faster than anything you could ever create yourself, having
even a fraction of the same quality.
Is that a commentary on what you think of James Kanze's
abilities, or are you just indirectly praising people like Bob
Jenkins and myself as being some kind of untouchable
uber-programmers? If the latter, I would say it's likely
unwarranted.

I think we both agree that the final word hasn't been written.
I'm curious to see how your code does, however, because on your
web page, you say you find that Bob's function is faster than
FNV, on an Athlon. Whereas I find just the opposite, on a wide
variety of machines (Sun Sparc, some older Intel, and now on
both Intel and AMD based Linux boxes.)

FWIW, I just reran one set of tests on my machine here, an AMD
64 bit machine. Jenkins is almost exactly the same as FNV, and
slightly slower than my Mersenne prime hash using 2^7-1 as
multiplier. This was for a set of 8554 symbols extracted from
my code. A second trial with 1259 URL's (extracted from
somewhere, I forget where), did show Jenkins as slightly better.
So maybe it depends on typical length; the URL's are longer, on
the average, than my program symbols.
Is your Mersenne prime hash based on the Mersenne twister RNG, or
are you just using big Mersenne primes as a large prime for modulus
operations with a very long period?
At any rate, my recommendation still stands: Mersenne primes
with 2^7-1 as multiplier. That seems to give the best results
over a wide variety of data and hardware. But if your algorithm
is significantly faster than Jenkins, it's definitely worth
looking at. I'll add it to my tests.
This problem is wide open for anyone with good math/programming
skills with a good rough idea about modern CPU performance.
Specifically: the mantle is up for grabs for anyone to come up
with a good *64 bit* hash function. Ironically, my gut feeling
is that it should be possible to come up with a function nearly
twice as fast as mine if you can make the assumption that your
platform natively supports 64 bit scalars (which modern systems
now do.)

Note that on a modern CPU, I would expect the byte accesses in
FNV or my own algorithms to have very little impact, since the
entire inner loop will be in cache, all intermediate variables
in registers, so the only memory access will be the characters
in the string, and the CPU will find the data in its pipeline
for all of the reads except for the first in each cache line.

So I think that the word length factor is really a red herring.
I think it is a good idea to test everything, and then later on retest
it all because assumptions are based on models that can change over
time.
Nov 11 '08 #102
Ben Bacarisse wrote:
CBFalconer <cb********@yahoo.com> writes:
.... snip ...
>>
You are very wrong. In C, n^2 is n XOR 2, and O(n^2) is quite
fast. But, if we use reasonable notation and ask about O(n*n)
we find that to be much slower than O(n log n).

Are you just repeating your now rather tired joke? If so, fine,
I'll leave it, but if not, do you really dispute that the time
complexity of merge sort (typically measured as the comparisons
made) is also O(n*n)?
It was no joke. This being c.l.c, we tend to use C notation.

Yes, I do dispute it. The speed of mergesort is O(N log N). You
can test it yourself. Try the wdfreq.c exercise in the hashlib
distribution. That does two mergesorts of the output list, just to
demonstrate the advantage of a stable sort. Time it on various
input files. The process of extracting the data from the file is
O(1) per item, i.e. O(M), where M is the number of words, counting
duplicates. The sort is O(N log N), where N is the number of
items, eliminating duplicates. The numbers are displayed in the
output file. When the input file is n869.txt those numbers are:
143209 words, 3910 entries. (M and N respectively).

Use: "wdfreq <n869.txt >junk.txt" followed by "less <junk.txt".

Find other input files to check the tendency.

O(N log N) is much faster than O(N * N).
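The difference is easy to see by instrumenting a sort to count comparisons; the following is my own small illustration (not the wdfreq.c code from hashlib), a merge sort whose comparison count stays near N log N rather than N*N:

```c
#include <assert.h>
#include <stddef.h>

static size_t comparisons;   /* number of key comparisons performed */

/* Sort a[0..n) using tmp[0..n) as scratch, counting comparisons.
 * Each merge level does at most n comparisons, and there are about
 * log2(n) levels, so the total stays close to n * log2(n). */
static void merge_sort(int *a, int *tmp, size_t n)
{
    if (n < 2)
        return;
    size_t mid = n / 2;
    merge_sort(a, tmp, mid);
    merge_sort(a + mid, tmp, n - mid);

    size_t i = 0, j = mid, k = 0;
    while (i < mid && j < n) {
        ++comparisons;
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < n)   tmp[k++] = a[j++];
    for (k = 0; k < n; ++k) a[k] = tmp[k];
}
```

For n = 4096 the count comes out well under 4096 * 12 = 49152 comparisons, versus roughly 16 million for a quadratic method.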

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Nov 11 '08 #103
On Nov 11, 4:30 pm, c...@tiac.net (Richard Harter) wrote:
On Tue, 11 Nov 2008 06:15:20 -0800 (PST), James Kanze
<james.ka...@gmail.com> wrote:
On Nov 11, 11:54 am, Pete Becker <p...@versatilecoding.com>
wrote:
It's well understood what sort of input causes O(n^2) behavior.
Really? If I post a quicksort implementation here, could you
give me an algorithm which would generate the worst case? (I
guess there should be a smiley here, because I don't think that
that's really what you meant. But just a half of one,
because I'd really like to find such. There are times where
you do want to test worst case.)
There is a published procedure that is very general that is on
the web somewhere that I can't find at the moment. The essence,
though, is that you use an oracle. It works like this: At each
step you tell me how you're going to pick your pivot, i.e., what
O(1) elements you are going to look at to choose the pivot. I
get to choose the values of all of the rest. I choose them
so that they are all bigger (smaller) than your pivot. I don't
actually have to set their values until you look at them; I just
have to place bounds on them.
I'd vaguely thought of something like this. But it doesn't
really help for creating a test case. Especially if you use
rand() to choose the pivot.
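For the common textbook variant that takes the first element as pivot, no oracle is needed: already-sorted input triggers the quadratic case directly. A small sketch (my own illustration, not code from the thread) that counts comparisons:

```c
#include <assert.h>
#include <stddef.h>

static size_t comparisons;   /* number of pivot comparisons performed */

/* Quicksort of a[lo..hi) with the first element as pivot (Lomuto-style
 * partition). On already-sorted input every partition is maximally
 * unbalanced, so the total comparison count is n*(n-1)/2. */
static void quicksort(int *a, size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return;
    int pivot = a[lo];
    size_t i = lo + 1;
    for (size_t j = lo + 1; j < hi; ++j) {
        ++comparisons;
        if (a[j] < pivot) {           /* move smaller elements left */
            int t = a[i]; a[i] = a[j]; a[j] = t;
            ++i;
        }
    }
    int t = a[lo]; a[lo] = a[i - 1]; a[i - 1] = t;  /* place pivot */
    quicksort(a, lo, i - 1);
    quicksort(a, i, hi);
}
```

With n = 100 sorted values, the pivot is the minimum at every step, so the count is 99 + 98 + ... + 1 = 4950, exactly n*(n-1)/2.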

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Nov 11 '08 #104
On Tue, 11 Nov 2008 16:47:49 -0500, CBFalconer wrote:
Ben Bacarisse wrote:
>CBFalconer <cb********@yahoo.com> writes:
>>But, if we use reasonable notation and ask about O(n*n) we find that
to be much slower than O(n log n).

do you really dispute that the time complexity of
merge sort (typically measured as the comparisons made) is also O(n*n)?

Yes, I do dispute it.
Then you should look up what exactly O(n*n) means. It isn't what you think
it is.
Nov 11 '08 #105
On 11 Nov 2008 at 21:47, CBFalconer wrote:
O(N log N) is much faster than O(N * N).
Worthy to be a sig, I think...

Despite the patient explanations of half a dozen different people in
this thread, it's obvious that CBF is never going to grok the point
being made. As in so many other things, he is so pig-headed and
obstinate in his ignorance that he's simply unteachable.

Nov 11 '08 #106
James Kanze wrote:
Pete Becker <p...@versatilecoding.com> wrote:
>It's well understood what sort of input causes O(n^2) behavior.

Really? If I post a quicksort implementation here, could you
give me an algorithm which would generate the worst case?
(I guess there should be a smiley here, because I don't think that
that's really what you meant. But just a half of one, because
I'd really like to find such. There are times where you do want
to test worst case.)
It exists and is available. I thought I had it here, but I can't
find it. I did try it out, and it works. Hopefully someone else
will come up with the right reference.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Nov 11 '08 #107
James Kanze wrote:
Paul Hsieh <websn...@gmail.com> wrote:
.... snip ...
>
>Outside of crypto-hashes, Bob Jenkin's function and my
function, the quality of everything else out there is truly
pathetic.

As I said, I'm interested, because I've been collecting
benchmarks of various hashing functions. Post a link to the
algorithm, and I'll add it in.
One of the original purposes of hashlib was to make testing hash
functions easy. It is much more than just that now, but still
fills the purpose. hashlib includes a couple of very simple, and
fast, hashes for text strings as a convenience for users. All
available on my site.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Nov 11 '08 #108
On Nov 11, 8:16 pm, user923005 <dcor...@connx.com> wrote:
On Nov 11, 8:02 am, James Kanze <james.ka...@gmail.com> wrote:
[...]
According to the link on your page, this is one I've already
tested. And found it to be slower than FNV or my own hash
functions. It is, IMHO, a typical example where being
overly complicated to match a particular machine doesn't
pay. The distribution isn't significantly better than FNV
or my own, and in fact it takes more time (on the machines I
have access to) to calculate.
There is an updated version of Bob Jenkins' hash that is faster.
Now he tells me :-). I've just finished adding Paul's algorithm
(and I'll quote the results and comment on them below).
Another excellent hash is this one:
http://www.geocities.com/drone115b/Goulburn06.pdf
If you are hashing big keys, UMAC is marvelous.
http://fastcrypto.org/umac/2000/perf00.html
Two more to add. Your mention of big keys is significant,
however; see below.

[...]
I use frog.cpp as my testing harness. Is your hash testing
harness code available?
It's at my site, in the Test subsystem. While I won't swear to
it, I don't think anything has changed in it since I put it up.
See http://kanze.james.neuf.fr/code-en.html. (The easiest
solution might be to simply download the lot, and browse around
in it. If all you're interested in is the harness, however, it
is in Test, and it only depends on Global---no other subsystems,
and no other component in test.)

[...]
It's interesting to note that on a modern machine, FNV or my
own functions do not take any more time than just summing
the bytes, but result in a very good distribution.
FNV is not as good as some of the others. It cascades a bit.
I'm not sure. It seems to give very good results with my test
data.

[...]
Is your Mersenne prime hash based on the Mersenne twister RNG,
or are you just using big Mersenne primes as a large prime for
modulus operations with a very long period?
No. It's actually based on linear congruential random number
generation: x[i] = (a * x[i-1] + b) % m, where m is 2^n (not a
good idea for random number generation, but it doesn't seem to
be a problem here), and b is actually the next character, rather
than a constant. (FNV uses more or less the same principle,
except that it xor's in the next character.) The Mersenne prime
part is that a is a Mersenne prime. If you think about it, it's
important for a to be prime with respect to the size of your
table, so a prime number is indicated. And a Mersenne prime has
the advantage that the multiplication can be done with a simple
shift and subtraction if multiplication is expensive.
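A minimal sketch of a hash along those lines, using a = 2^7-1 = 127 and reduction modulo 2^32 via unsigned wrap-around (the function name and the zero initial value are my assumptions for illustration, not details from the post):

```c
#include <assert.h>
#include <stdint.h>

/* h = h * 127 + c for each character. Since 127 = 2^7 - 1 is a
 * Mersenne prime, h * 127 can be written as (h << 7) - h, avoiding
 * the multiply entirely on machines where it is slow. */
static uint32_t mersenne_hash(const char *s)
{
    uint32_t h = 0;
    for (; *s != '\0'; ++s)
        h = (h << 7) - h + (unsigned char) *s;   /* h * 127 + c */
    return h;
}
```

The "% m with m = 2^n" reduction described above happens for free here: uint32_t arithmetic wraps modulo 2^32.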

[...]
So I think that the word length factor is really a red herring.
I think it is a good idea to test everything, and then later
on retest it all because assumptions are based on models that
can change over time.
Several things are clear, and one is that the behavior will
depend on hardware. Which evolves, so yesterday's answer might
not hold today.

OK, for the results of my tests. Two comments to begin with,
though:

1. I did my tests using std::string as a key, which definitely
favors character by character access, though, at least for
clarity. I used std::string::const_iterator throughout,
reading four bytes and assembling them into a uint32_t when
the algorithms called for it. In the case of Paul's and
Bob's algorithms, which are based on reading four bytes at
a time, I also implemented a hacked version (marked (opt),
for optimized), below, which called std::string::data() to
get the pointer, and did reinterpret_cast it to uint32_t (or
uint16_t, in the case of Paul's), to see what difference that
made.

2. Which leads me to the second remark: both Paul's and Bob's
algorithms core dump for certain input on my usual platform
(Sun Sparc), because they don't ensure correct alignment;
incorrect alignment will also cause them to run considerably
slower on an Intel or AMD platform. In my case, the actual
implementations of std::string I use (g++, Sun CC, and once
I get the library back up and running with it, VC++) all
happen (by chance) to ensure that the pointer returned by
data() will be aligned, but I can easily imagine very viable
implementations where this is NOT the case. As such, I
would consider either implementation as unusable except in
certain cases, unless you do as I did above, and reassemble
the bytes.

Anyway, with excuses for the HTML (that's what my scripts
generate), and the fact that it probably won't look that good
(since the generated tables are meant to be incorporated into
pages with a CSS header), here are the complete results, for a
Linux based machine running on an AMD 64 X2 5200+ machine,
compiled with g++ 4.2.1, -O3:

<table border=3>
<tr>
<th>&nbsp;</th>
<th>75URLs</th>
<th>1275URLs</th>
<th>ProgSyms</th>
<th>FrWords</th>
<th>Constructed</th>
<th>Dense</th>
</tr>
<tr>
<td class="tcolHeader">sorted vector</td>
<td class="tFData">0.227</td>
<td class="tFData">0.430</td>
<td class="tFData">0.379</td>
<td class="tFData">0.976</td>
<td class="tFData">2.054</td>
<td class="tFData">0.351</td>
</tr>
<tr>
<td class="tcolHeader">std::map</td>
<td class="tFData">0.249</td>
<td class="tFData">0.467</td>
<td class="tFData">0.511</td>
<td class="tFData">1.319</td>
<td class="tFData">2.784</td>
<td class="tFData">0.465</td>
</tr>
<tr>
<td class="tcolHeader">FNVHash</td>
<td class="tFData">0.150</td>
<td class="tFData">0.163</td>
<td class="tFData">0.147</td>
<td class="tFData">0.296</td>
<td class="tFData">0.339</td>
<td class="tFData">0.106</td>
</tr>
<tr>
<td class="tcolHeader">FNVHash (opt.)</td>
<td class="tFData">0.149</td>
<td class="tFData">0.167</td>
<td class="tFData">0.147</td>
<td class="tFData">0.295</td>
<td class="tFData">0.338</td>
<td class="tFData">0.106</td>
</tr>
<tr>
<td class="tcolHeader">Java hash</td>
<td class="tFData">0.147</td>
<td class="tFData">0.168</td>
<td class="tFData">0.145</td>
<td class="tFData">0.296</td>
<td class="tFData">0.329</td>
<td class="tFData">0.131</td>
</tr>
<tr>
<td class="tcolHeader">Mersenne 2^7-1</td>
<td class="tFData">0.147</td>
<td class="tFData">0.167</td>
<td class="tFData">0.150</td>
<td class="tFData">0.297</td>
<td class="tFData">0.375</td>
<td class="tFData">0.106</td>
</tr>
<tr>
<td class="tcolHeader">Mersenne 2^7-1 (opt.)</td>
<td class="tFData">0.147</td>
<td class="tFData">0.166</td>
<td class="tFData">0.147</td>
<td class="tFData">0.296</td>
<td class="tFData">0.375</td>
<td class="tFData">0.105</td>
</tr>
<tr>
<td class="tcolHeader">Mersenne 2^9-1</td>
<td class="tFData">0.147</td>
<td class="tFData">0.167</td>
<td class="tFData">0.142</td>
<td class="tFData">0.296</td>
<td class="tFData">0.471</td>
<td class="tFData">0.107</td>
</tr>
<tr>
<td class="tcolHeader">Mersenne 2^11-1</td>
<td class="tFData">0.147</td>
<td class="tFData">0.167</td>
<td class="tFData">0.143</td>
<td class="tFData">0.297</td>
<td class="tFData">3.613</td>
<td class="tFData">0.106</td>
</tr>
<tr>
<td class="tcolHeader">Sedgwick</td>
<td class="tFData">0.181</td>
<td class="tFData">0.182</td>
<td class="tFData">0.152</td>
<td class="tFData">0.299</td>
<td class="tFData">0.346</td>
<td class="tFData">0.107</td>
</tr>
<tr>
<td class="tcolHeader">Sobel</td>
<td class="tFData">0.163</td>
<td class="tFData">0.182</td>
<td class="tFData">0.151</td>
<td class="tFData">0.300</td>
<td class="tFData">0.360</td>
<td class="tFData">0.128</td>
</tr>
<tr>
<td class="tcolHeader">Weinberger</td>
<td class="tFData">0.246</td>
<td class="tFData">0.262</td>
<td class="tFData">0.165</td>
<td class="tFData">0.313</td>
<td class="tFData">0.372</td>
<td class="tFData">0.184</td>
</tr>
<tr>
<td class="tcolHeader">K&R</td>
<td class="tFData">0.178</td>
<td class="tFData">0.195</td>
<td class="tFData">0.151</td>
<td class="tFData">0.300</td>
<td class="tFData">0.329</td>
<td class="tFData">0.105</td>
</tr>
<tr>
<td class="tcolHeader">Open SDBM</td>
<td class="tFData">0.165</td>
<td class="tFData">0.182</td>
<td class="tFData">0.152</td>
<td class="tFData">0.301</td>
<td class="tFData">0.368</td>
<td class="tFData">0.107</td>
</tr>
<tr>
<td class="tcolHeader">Bernstein</td>
<td class="tFData">0.147</td>
<td class="tFData">0.166</td>
<td class="tFData">0.147</td>
<td class="tFData">0.296</td>
<td class="tFData">0.366</td>
<td class="tFData">0.129</td>
</tr>
<tr>
<td class="tcolHeader">Knuth</td>
<td class="tFData">0.134</td>
<td class="tFData">0.153</td>
<td class="tFData">0.148</td>
<td class="tFData">0.296</td>
<td class="tFData">0.361</td>
<td class="tFData">0.131</td>
</tr>
<tr>
<td class="tcolHeader">Partow</td>
<td class="tFData">0.181</td>
<td class="tFData">0.197</td>
<td class="tFData">0.155</td>
<td class="tFData">0.303</td>
<td class="tFData">0.360</td>
<td class="tFData">0.110</td>
</tr>
<tr>
<td class="tcolHeader">Pearson</td>
<td class="tFData">0.434</td>
<td class="tFData">0.456</td>
<td class="tFData">0.224</td>
<td class="tFData">0.384</td>
<td class="tFData">0.413</td>
<td class="tFData">0.153</td>
</tr>
<tr>
<td class="tcolHeader">Jenkins</td>
<td class="tFData">0.149</td>
<td class="tFData">0.168</td>
<td class="tFData">0.157</td>
<td class="tFData">0.309</td>
<td class="tFData">0.362</td>
<td class="tFData">0.147</td>
</tr>
<tr>
<td class="tcolHeader">Jenkins (opt.)</td>
<td class="tFData">0.135</td>
<td class="tFData">0.153</td>
<td class="tFData">0.156</td>
<td class="tFData">0.307</td>
<td class="tFData">0.359</td>
<td class="tFData">0.147</td>
</tr>
<tr>
<td class="tcolHeader">Hsieh</td>
<td class="tFData">0.129</td>
<td class="tFData">0.150</td>
<td class="tFData">0.156</td>
<td class="tFData">0.299</td>
<td class="tFData">0.870</td>
<td class="tFData">0.140</td>
</tr>
<tr>
<td class="tcolHeader">Hsieh (opt).</td>
<td class="tFData">0.121</td>
<td class="tFData">0.143</td>
<td class="tFData">0.157</td>
<td class="tFData">0.296</td>
<td class="tFData">0.868</td>
<td class="tFData">0.138</td>
</tr>
<tr>
<td class="tcolHeader">CRC</td>
<td class="tFData">0.249</td>
<td class="tFData">0.270</td>
<td class="tFData">0.181</td>
<td class="tFData">0.325</td>
<td class="tFData">0.381</td>
<td class="tFData">0.143</td>
</tr>
<tr>
<td class="tcolHeader">CRC (opt.)</td>
<td class="tFData">0.250</td>
<td class="tFData">0.270</td>
<td class="tFData">0.178</td>
<td class="tFData">0.325</td>
<td class="tFData">0.383</td>
<td class="tFData">0.142</td>
</tr>
</table>

The times are in microseconds per access.

There's a lot more that could be said, but at the least, it's
important to describe the data sets:

75URLs: 73 URL's, taken from the history file of my browser on
one particular occasion.

1275URLs: 1259 URL's, from all of the files in my browser's
cache, on the same occasion.

(The numbers in the names of
these two represent the original number of entries, before I
realized that there were duplicates, and eliminated them.)

ProgSyms: The set of all symbols and numeric literals in my code
base (the one posted at my site, I think), 8554 entries.

FrWords: A set of French words, extracted from the sources for
the French ispell tables, 47451 entries. (Note that this is
the only table with anything other than standard ASCII; it's
in ISO 8859-1, with accented characters.)

Constructed: An artificially constructed set of all of the words
from x000000 to x999999 (for exactly one million entries---I
wanted something big, with a lot of close matches).

Dense: Another artificially constructed set, with all two
character strings consisting of printable (non-space) ASCII
characters, so 94*94 = 8836 entries. (This one was
constructed intentionally to display a theoretical weakness
of the Java hash code, which is in fact my Mersenne prime
algorithm using 2^5-1.)

About the only comment I'll make on the results is that with the
exception of the two sets of URL's, most of my strings are
fairly short; I'll write up an AWK script to calculate the
average, median, max and min lengths (although for the last two,
they're trivial), but off hand, ProgSym is the only other one
which contains a significant number of entries with more than,
say 10 characters. (Some of the URL's are also pretty short,
however.) This alone could account for the differences in my
measurements and Paul's; obviously, for very short strings,
there can be nothing gained by reading four characters at a
time. At any rate, it's precisely the two sets with the URL's
in which Paul's and Bob Jenken's algorithms come out best.

Which brings me back to what is probably the most important
point: which hash code is best will depend on your data sets,
the underlying hardware and the compiler, and (one thing we've
not mentioned) how the hash table works. (The above
measurements were done using my own AssocArrayOf class, also
present at my site. A class using a different growth strategy
or collision handling may perform differently.) IMHO, the
Mersenne 2^7-1 algorithm has the advantage of being among the
best pretty much everywhere, and is dead simple to implement, so
I'd stick with it in general. But if your profiler says that
hash table accesses are an issue, it's definitely worth trying
some of the others.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Nov 11 '08 #109
On Nov 11, 11:27 pm, CBFalconer <cbfalco...@yahoo.com> wrote:
James Kanze wrote:
Paul Hsieh <websn...@gmail.com> wrote:
... snip ...
Outside of crypto-hashes, Bob Jenkin's function and my
function, the quality of everything else out there is truly
pathetic.
As I said, I'm interested, because I've been collecting
benchmarks of various hashing functions. Post a link to the
algorithm, and I'll add it in.
One of the original purposes of hashlib was to make testing hash
functions easy. It is much more than just that now, but still
fills the purpose. hashlib includes a couple of very simple, and
fast, hashes for text strings as a convenience for users. All
available on my site.
If they're already widely known hashing algorithms, I've
probably already gotten them. But it might be interesting to
compare our results---I'll just make a guess (:-)) that your
library is written in C, so it will definitely have a different
implementation from mine.

(Just curious about one thing: how do you benchmark. I use
virtual functions to impose a firewall on optimization; do you
use pointers to functions, or some other technique?)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Nov 11 '08 #110
On Tue, 11 Nov 2008 13:52:58 -0800 (PST), James Kanze
<ja*********@gmail.com> wrote:
>On Nov 11, 4:30 pm, c...@tiac.net (Richard Harter) wrote:
>On Tue, 11 Nov 2008 06:15:20 -0800 (PST), James Kanze
><james.ka...@gmail.com> wrote:
>On Nov 11, 11:54 am, Pete Becker <p...@versatilecoding.com>
wrote:
>It's well understood what sort of input causes O(n^2) behavior.
>Really? If I post a quicksort implementation here, could you
give me an algorithm which would generate the worst case? (I
guess there should be a smiley here, because I don't think that
that's really what you meant. But just a half of one,
because I'd really like to find such. There are times where
you do want to test worst case.)
>There is a published procedure that is very general that is on
the web somewhere that I can't find at the moment. The essence,
though, is that you use an oracle. It works like this: At each
step you tell me how you're going to pick your pivot, i.e., what
O(1) elements you are going to look at to choose the pivot. I
get to choose the values of all of the rest. I choose them
so that they are all bigger (smaller) than your pivot. I don't
actually have to set their values until you look at them; I just
have to place bounds on them.

I'd vaguely thought of something like this. But it doesn't
really help for creating a test case. Especially if you use
rand() to choose the pivot.
Generate the random sequence in advance; then the oracle will
know what decisions you are going to make.
Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
Nov 11 '08 #111
On Nov 11, 8:15 am, James Kanze <james.ka...@gmail.com> wrote:
On Nov 11, 11:54 am, Pete Becker <p...@versatilecoding.com> wrote:
It's well understood what sort of input causes O(n^2) behavior.

Really? If I post a quicksort implementation here, could you
give me an algorithm which would generate the worst case?
(I guess there should be a smiley here, because I don't think
that that's really what you meant. But just a half of one,
because I'd really like to find such. There are times where you
do want to test worst case.)

David Musser's original paper on Introsort ("Introspective Sorting
and Selection Algorithms") provides a surprisingly simple algorithm
that produces worst case (or at least quadratic) behavior in
median-of-three Quicksort. The paper is worth reading just for that.

http://www.cs.rpi.edu/~musser/gp/introsort.ps
Nov 11 '08 #112
CBFalconer <cb********@yahoo.com> writes:
Ben Bacarisse wrote:
>CBFalconer <cb********@yahoo.com> writes:
... snip ...
>>>
You are very wrong. In C, n^2 is n XOR 2, and O(n^2) is quite
fast. But, if we use reasonable notation and ask about O(n*n)
we find that to be much slower than O(n log n).

Are you just repeating your now rather tired joke? If so, fine,
I'll leave it, but if not, do you really dispute that the time
complexity of merge sort (typically measured as the comparisons
made) is also O(n*n)?

It was no joke. This being c.l.c, we tend to use C notation.
It was a joke. You were called on it. You didn't man up to it.
Bad show.

(Or you've flipped, that's the alternative.)
Yes, I do dispute it.
In that case, you're misinformed.
The speed of mergesort is O(N log N).
Correct! Hoorah, a nugget of correctness in a swamp of stuff
that could be otherwise described.

That of course means it's also O(N^2), O(N**2), O(pow(N,2)), O(N*N),
and of course O(exp(2*log(N))), or, heaven help us, O(N<sup>2</sup>).
You
can test it yourself. Try the wdfreq.c exercise in the hashlib
distribution.
That library's becoming a broken record. (Should I have used the
past tense rather than the present progressive, I dunno?)
O(N log N) is much faster than O(N * N).
Nope, O(N log N) is contained in O(N*N). If you wish to dabble in
the field of mathematics, then at least be mathematical.

If you can do one thing before you next post, please may it be the
difference between O() and Theta().

Phil
--
I tried the Vista speech recognition by running the tutorial. I was
amazed, it was awesome, recognised every word I said. Then I said the
wrong word ... and it typed the right one. It was actually just
detecting a sound and printing the expected word! -- pbhj on /.
Nov 12 '08 #113
James Kanze wrote:
>
.... snip ...
>
If they're already widely known hashing algorithms, I've
probably already gotten them. But it might be interesting to
compare our results---I'll just make a guess (:-)) that your
library is written in C, so it will definitely have a different
implementation from mine.

(Just curious about one thing: how do you benchmark. I use
virtual functions to impose a firewall on optimization; do you
use pointers to functions, or some other technique?)
I suspect you do have them. However, the installation is dead
simple. Just open the hashtable with pointers to the hash
functions (and a few other things). The documentation explains
all.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Nov 12 '08 #114
CBFalconer <cb********@yahoo.com> writes:
Ben Bacarisse wrote:
>CBFalconer <cb********@yahoo.com> writes:
... snip ...
>>>
You are very wrong. In C, n^2 is n XOR 2, and O(n^2) is quite
fast. But, if we use reasonable notation and ask about O(n*n)
we find that to be much slower than O(n log n).

Are you just repeating your now rather tired joke? If so, fine,
I'll leave it, but if not, do you really dispute that the time
complexity of merge sort (typically measured as the comparisons
made) is also O(n*n)?

It was no joke. This being c.l.c, we tend to use C notation.
OK.
Yes, I do dispute it. The speed of mergesort is O(N log N).
Are any of these identifiers macros? If not, then this is not C
notation either. Maybe you meant O(N * log(N))?
You
can test it yourself. Try the wdfreq.c exercise in the hashlib
distribution. That does two mergesorts of the output list, just to
demonstrate the advantage of a stable sort. Time it on various
input files.
Tests tell you what happens for that particular data set on some
particular system. You can only guess at the asymptotic bounds of an
algorithm from such tests. However, based on analysis of merge sort I
am as sure as you are that its time complexity is O(N * log(N)). That
means it is also O(N * N) as I said. If you still dispute it, maybe
you could say why rather than just stating that I am wrong (I often
am, but I am pretty sure about this).

<snip>
O(N log N) is much faster than O(N * N).
This is so informal as to be virtually meaningless. By the way, is
this all not off-topic? I am more than happy to drop it (or move it
to comp.programming or comp.theory).

--
Ben.
Nov 12 '08 #115
Juha Nieminen wrote:
However, apparently the "in average" in this particular context means
something slightly different. More precisely, the function f(n) is
defined as:

f(n) = the average amount of steps the algorithm performs for an input
of size n (with all possible different inputs of that size)

In this case the f(n) function can indeed have a smaller upper bound
than the worst case for the algorithm in question.

I think the "in average" is a bit confusing.
That is one purpose of education: to formalize and normalize
vocabulary and understanding. It exposes students to common
conventions that enable them to communicate on a common ground.
Unfortunately many internet folk mistakenly believe wikipeducation
is a substitute for education and that vociferous noise is a
substitute for listening and critical thinking.

They defend their misinformed ignorant view ad nauseam. They claim
vast experience when in truth they have none. In the end if some
modicum of intelligence finally motivates them to capitulate, their
pride does not allow them to admit "I was wrong. I learned
something. Thank you!". No. Instead they attribute their
"arguments" to a language barrier or to "confusing" conventions,
etc.

KHD

Nov 12 '08 #116
CBFalconer said:
Ben Bacarisse wrote:
<snip>
>>
Are you just repeating your now rather tired joke? If so, fine,
I'll leave it, but if not, do you really dispute that the time
complexity of merge sort (typically measured as the comparisons
made) is also O(n*n)?

It was no joke. This being c.l.c, we tend to use C notation.

Yes, I do dispute it. The speed of mergesort is O(N log N).
No, it isn't. C has no standard function named O (and you didn't provide
the source for O). It does, however, have a function named log, and if
you're going to use its name in that context, I'd be very interested to
see the #define for N that makes it legal.

Note, finally, that the C Standard has nothing to say about the speed of
mergesort.

Note, finally finally, that if you're going to play such games with such a
straight face, you ought at least to play them properly.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Nov 12 '08 #117
On Nov 7, 7:00 am, s0s...@gmail.com wrote:
But, as others have pointed out, it's clear that the reason for the
difference in performance is the searching mechanism.
Another reason for slowdown may be that you
malloc for every single word read. If the C++
program is using the small string optimization
then it won't need to malloc anything except
the new set node, for short words.

I suspect you could improve performance
(and memory consumption!) a lot with improved
allocation strategies too. Two mallocs per word
is a lot of work.
Nov 12 '08 #118
Jerry Coffin wrote:
In article <49***********************@read.cnntp.org>,
jk********@gmx.net says...

[ ... ]
>Leaving out the scanning forward and backward part (swapping offending
pairs), you are missing the most important point about quicksort, namely
the one that explains the speed. In terms of number of comparisons,
quicksort is not doing better than, say, heapsort. However, when it comes
to moving elements around in memory, quicksort has a huge advantage.
Experimentally, the number of moves is one place quicksort beats the
other algorithms whereas the number of comparisons is not. Declaring the
very feature of quicksort that makes this happen to be a minor issue
seems weird.

Of course, that only applies to sorting sequences. For linked structures,
the number of pointer operations would be relevant.

I'll tell you what: if you're really convinced that scanning forward and
backward, and swapping elements in the list will reduce the number of
pointer operations, why don't you implement both and find out?
[snip]

I think I did not write that. I am sorry if the second paragraph was
taken to say that statements which pertain to quicksort transfer to
sorting algorithms for linked lists. This was not intended.
Best

Kai-Uwe Bux
Nov 13 '08 #119
Juha Nieminen wrote:
Kai-Uwe Bux wrote:
>My linguistic intuitions differ: Quicksort _is_ the algorithm published
as Algorithm 64 (and 63) by C.A.R. Hoare in the Communications of the
ACM. That algorithm sorts a sequence in place and is not stable. It makes
2n*log(n) comparisons on average and (1/3)n*log(n) exchanges on average.
Anything that is stable or deals with linked lists is an implementation
of a _different_ algorithm (and probably has different average
complexities).

Why? Although it's a surprisingly common misconception that quicksort
requires random access, it doesn't. Quicksort is implemented in terms of
purely linear traversals of the array. Thus it's perfectly possible to
implement the *exact* same algorithm for a doubly-linked list (which can
be traversed forwards and backwards).
Right. I missed the case of a doubly linked list. I was thinking singly
linked lists where there is an obvious partition-recurse scheme that would
not be quicksort. (Although it would be quick on average and quadratic in
the worst case).
Best

Kai-Uwe Bux
Nov 13 '08 #120
James Kanze wrote:
On Nov 10, 3:48 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
>Jerry Coffin wrote:
In article <49163374.B0380...@yahoo.com>, cbfalco...@yahoo.com says...
[ ... ]
>The result is that for most cases quicksort will be the
fastest sort. Right behind it are other O(nLOGn) methods.
I prefer mergesort and data input in lists, because then I
don't have to worry about the size to be sorted. In
addition, mergesort is stable (quicksort is not).
The most common implementation of Quicksort for arrays is
unstable -- but a Quicksort on an array _can_ be written to
be stable if you want to badly enough, and for a linked
list, it's quite easy to make it stable.
>My linguistic intuitions differ: Quicksort _is_ the algorithm
published as Algorithm 64 (and 63) by C.A.R. Hoare in the
Communications of the ACM. That algorithm sorts a sequence in
place and is not stable. It makes 2n*log(n) comparisons on
average and (1/3)n*log(n) exchanges on average. Anything that
is stable or deals with linked lists is an implementation of a
_different_ algorithm (and probably has different average
complexities).

The algorithm presented by Hoare in Algorithm 64 is very
general; it only refers to algorithm 63, and presumably,
anything that achieves a partition with similar properties would
be acceptable.
>I am not saying that there are no refinements or other
algorithms based on similar ideas. Some of these algorithms
are stable and some apply to linked lists. Also, I am not a
native speaker of English. Nonetheless, I feel that algorithms
have a certain identity (which makes it meaningful to say
that a certain sorting algorithm is stable); and that
implementations of an algorithm must preserve that identity
and the associated characteristics to count as implementations
_of_ that algorithm.

So you're saying that the implementation proposed by Jon Bentley
(section 10.2 of _Programming_ _Pearls_, which is just a reprint
of one of his columns in the CACM) isn't a quick sort, although
he claims it to be one. (I think his algorithm is actually
stable, although I haven't verified; it may have problems if
there are objects equal to the pivot. But I think it could
easily be fixed.)
I don't have access to this one, so I cannot tell. I will hit a library
shortly and hunt it down -- it sounds interesting: I have a hard time
seeing a stable and fast version of the partition-recurse idea.

My understanding is that the term quick sort is applied to all
algorithms that partition, then sort each partition recursively.
I only gave my linguistic intuitions, and they differ. The way I see
it, the literature is much more specific about quicksort, and
analyses of that algorithm have been put forth which do not apply to
other algorithms that also exploit the partition-recurse idea.

Of course, sometimes algorithms are so closely related that you would call
one a variation of the other. In particular, sometimes the analysis for one
algorithm can be used to start the analysis of another. Nonetheless, I feel
that saying some sorting procedure is stable rules out the existence of
unstable implementations and vice versa.

Of course, it could be that most people have a wider notion of quicksort
and, as you pointed out, might consider any partition-recurse algorithm a
variant of quicksort.

(Also, of course, it's trivial to implement quick sort for a
list; just implement [] in terms of std::advance. Of course,
the adjective "quick" might not be so appropriate then, but the
algorithm stands.)
Nonetheless, the underlying data structure does matter. For instance, the
median-of-three refinement is hard to do on lists. Also, choosing the pivot
randomly is hard on lists, essentially making the average runtime
quadratic. So, at least performance characteristics of a sorting routine
can vary with underlying data structure.
Best

Kai-Uwe Bux
Nov 13 '08 #121
