attempt at qsort implementation

buda

I had some spare time and decided to try to implement the standard library
qsort. This is a pretty simpleminded attempt, but it seems to be working. I
have to point out that efficiency is not at all an issue for me, and this is
purely a toy, so to speak. I'm quite aware that this is not the quickest way
to implement the quicksort algorithm, but I basically just wanted to try to
make a "generic function". I tested it with a few arrays of structures
containing all sorts of types (this, ofcourse, doesn't mean that it's
correct, even by a long shot :). I would appreciate any and all constructive
comments. The code below is a snippet. All the necessary headers are
included and uchar is a typedef for unsigned char.
Thank you.

void do_swap_(void *a, void *b, size_t sz)
{
uchar t;
uchar *i = a;
uchar *j = b;

while (sz) {
t = *i;
*i++ = *j;
*j++ = t;
--sz;
}
}
void quick_sort(void *v, size_t count, size_t sz, int (*cmp)(const void *,
const void *))
{
uchar *t = v;
uchar *x, *y;
uchar *r;

if (count) {
x = y = t;
r = t+(count-1)*sz;
for(; y<r; y += sz)
if (cmp(y, r)<0) {
do_swap_(x, y, sz);
x += sz;
}
do_swap_(x, r, sz);
quick_sort(t, (x-t)/sz, sz, cmp);
quick_sort(x+sz, count-(x-t)/sz - 1, sz, cmp);
}
}

Nov 14 '05 #1

Subscribe Post Reply

2582

pete

buda wrote:

I had some spare time and decided to try to implement the standard library
qsort. This is a pretty simpleminded attempt, but it seems to be working. I
have to point out that efficiency is not at all an issue for me, and this is
purely a toy, so to speak. I'm quite aware that this is not the quickest way
to implement the quicksort algorithm, but I basically just wanted to try to
make a "generic function". I tested it with a few arrays of structures
containing all sorts of types (this, ofcourse, doesn't mean that it's
correct, even by a long shot :).
I would appreciate any and all constructive
comments. The code below is a snippet. All the necessary headers are
included and uchar is a typedef for unsigned char.
Thank you.
This implementation will probably run out of memory
on large already ordered arrays.
If the array is in order prior to sorting,
then (x-t)/sz will be equal to count - 1
and you'll wind up with about count - 1, levels of recursion.
If count is large, the function could crash the program.
void do_swap_(void *a, void *b, size_t sz)
{
uchar t;
uchar *i = a;
uchar *j = b;

while (sz) {
t = *i;
*i++ = *j;
*j++ = t;
--sz;
}
}
void quick_sort(void *v, size_t count,
size_t sz, int (*cmp)(const void *,
const void *))
{
uchar *t = v;
uchar *x, *y;
uchar *r;

if (count) {
x = y = t;
r = t+(count-1)*sz;
for(; y<r; y += sz)
if (cmp(y, r)<0) {
do_swap_(x, y, sz);
x += sz;
}
do_swap_(x, r, sz);
quick_sort(t, (x-t)/sz, sz, cmp);
quick_sort(x+sz, count-(x-t)/sz - 1, sz, cmp);
}
}

--
pete

Nov 14 '05 #2

Gordon Burditt

>I had some spare time and decided to try to implement the standard library

qsort. This is a pretty simpleminded attempt, but it seems to be working. I
have to point out that efficiency is not at all an issue for me, and this is
purely a toy, so to speak. I'm quite aware that this is not the quickest way
to implement the quicksort algorithm, but I basically just wanted to try to
make a "generic function".
qsort() is not required to implement any particular sort algorithm.
It's just supposed to sort. It's not specified HOW. A bubble sort
is not an invalid implementation. A recursive bogosort[*] (which
operates in time equivalent to O(e**infinity)) is probably an invalid
implementation since it NEVER finishes.
I tested it with a few arrays of structures
containing all sorts of types (this, ofcourse, doesn't mean that it's
correct, even by a long shot :). I would appreciate any and all constructive
comments. The code below is a snippet. All the necessary headers are
included and uchar is a typedef for unsigned char.
Thank you.

If count == 1, the algorithm does a compare anyway. I know you
weren't going for real efficiency here, but this seems a bit
strange.

If sz == 0, various wierd and obnoxious things will happen, but
the caller IS asking for it.

It seems like this algorithm could end up recursing a lot.
This could be a problem with large amounts of data.

One thing worth checking here is how well your algorithm will operate
with a bogus compare routine (e.g. one which always returns -1, or
one for which cmp(a,b) is not guaranteed to equal -cmp(b,a) ). Some
implementations end up going outside the passed-in array in such
circumstances.
[*] Technical note: a bogosort operates by creating all possible
orders of the sorted array, determining how many elements are out
of order, and returns the order with the least number of elements
out of order by sorting them and returning the first one. In other
words, it turns the problem of sorting N elements into one of sorting
N! elements.

A recursive bogosort uses bogosort to sort the possible orders by
number of elements out of order, thereby turning the problem of
sorting N elements into one of sorting N!!!!!!!!!!!!!!!!!!!!!!!!!!....
elements. Bogosorts typically include other performance-wasting
code such as:
while (malloc(INT_MAX)) fork();
since they will never finish anyway.

Gordon L. Burditt

Nov 14 '05 #3

pete

Gordon Burditt wrote:

I had some spare time and decided to try to
implement the standard library qsort.
qsort() is not required to implement any particular sort algorithm.
It's just supposed to sort. One thing worth checking here is how well your algorithm will operate
with a bogus compare routine (e.g. one which always returns -1, or
one for which cmp(a,b) is not guaranteed to equal -cmp(b,a) ).
Some implementations end up going outside the passed-in
array in such circumstances.

The behavior of qsort with a bogus compare routine, is undefined,
so what's the point?

--
pete

Nov 14 '05 #4

Michael Mair

>>>I had some spare time and decided to try to

implement the standard library qsort.

qsort() is not required to implement any particular sort algorithm.
It's just supposed to sort.

One thing worth checking here is how well your algorithm will operate
with a bogus compare routine (e.g. one which always returns -1, or
one for which cmp(a,b) is not guaranteed to equal -cmp(b,a) ).
Some implementations end up going outside the passed-in
array in such circumstances.

The behavior of qsort with a bogus compare routine, is undefined,
so what's the point?

Testing. You do not want to get a segfault (or its undetected friends)
when a slightly faulty cmp() is provided.

If my first concern is being sure that everything works, I usually do
that with unit tests. There, I have usually conditional code I can
switch on for "do all checks" vs. assuming the "input is alright".

If I do not care for speed, I go for the former and try to handle
faulty input as well and provide test values (or in this case test
functions) which behave extreme or wrong or are, umh, don't know
the right word, "limit" cases, where the behaviour of the function
processing the input should change radically.

The bogus cmp() helps you to find such problems which could also
arise in some complicated and thus opaque real world applications.
One nice example is always people believing that floating point
variables could be treated as real numbers -- forgetting to account
for rounding errors in the operations as well as in the
representation of a number. Then they apply some operations, still
assuming the trichotomy of order or some such and wonder why their
results really do not fit, as they did not account for seemingly
impossible cases or the arbitraryness of some decisions based on
rounded values.
Cheers
Michael

Nov 14 '05 #5

buda

"pete" <pf*****@mindspring.com> wrote in message
news:41***********@mindspring.com...

This implementation will probably run out of memory
on large already ordered arrays.
If the array is in order prior to sorting,
then (x-t)/sz will be equal to count - 1
and you'll wind up with about count - 1, levels of recursion.
If count is large, the function could crash the program.

As I said, trying to make this a useable qsort was not my intention in any
way. I'm aware of ways to improve worst case performance of the quicksort
algorithm (median of three/five, randomization...), but the sole purpose of
this attempt was to make a generic function, since I haven't done this yet
:)

Nov 14 '05 #6

Michael Wojcik

In article <cj**********@infosun2.rus.uni-stuttgart.de>, Michael Mair <ma********************@ians.uni-stuttgart.de> writes:

[Quoting pete <pf*****@mindspring.com> in message
<41**********@mindspring.com>. Please fix your newsreader to insert
proper attributions.]
The behavior of qsort with a bogus compare routine, is undefined,
so what's the point?

Testing. You do not want to get a segfault (or its undetected friends)
when a slightly faulty cmp() is provided.

There is a QoI issue here, but it's not clear that handling broken
comparison routines is the proper answer. That could have a
significant adverse effect on performance, which is probably one
of the reasons why the Standard makes a broken comparison routine
UB, rather than requiring implementations to, say, signal an error
in some fashion.

I've seen reports of commercial qsort implementations crashing if
given a broken comparison routine, and the response they've elicited
from other developers has generally been that the comparison routine
should be fixed, not that the qsort implementation should be made
more tolerant.

The C standard library generally favors efficiency over safety. We
may approve or disapprove, but trying to fix that in one particular
case is probably not worth any great effort. Someone who passes a
bad comparison routine to qsort will make a similar error with some
other library function elsewhere.

--
Michael Wojcik mi************@microfocus.com

"We are facing a dire shortage of clowns," said Erickson, also known as
Jingles.

Nov 14 '05 #7

Gordon Burditt

>> > The behavior of qsort with a bogus compare routine, is undefined,

> so what's the point?
Testing. You do not want to get a segfault (or its undetected friends)
when a slightly faulty cmp() is provided.

There is a QoI issue here, but it's not clear that handling broken
comparison routines is the proper answer. That could have a
significant adverse effect on performance, which is probably one
of the reasons why the Standard makes a broken comparison routine
UB, rather than requiring implementations to, say, signal an error
in some fashion.

Reliably detecting a broken comparison routine would probably mean
a big performance hit (e.g. running cmp(a,b) and cmp(b,a) and
checking that the results are negatives of each other, and even
that's not a thorough test). I'm not asking for that, I'm just
suggesting that arranging to not segfault would be a good idea.
And I'm not sure it means a big (or even any) performance hit to do that.

Consider the tech support issues. If it segfaulted in qsort, it's
obviously qsort's fault, right? How many times have we seen posters
claim to have found a bug in malloc() or free() when a debugger
indicates the segfault happened while executing one of those
functions?
I've seen reports of commercial qsort implementations crashing if
given a broken comparison routine, and the response they've elicited
from other developers has generally been that the comparison routine
should be fixed, not that the qsort implementation should be made
more tolerant.
But the commercial outfit's tech support line still takes a $$ hit,
doesn't it?
The C standard library generally favors efficiency over safety. We
may approve or disapprove, but trying to fix that in one particular
case is probably not worth any great effort. Someone who passes a
bad comparison routine to qsort will make a similar error with some
other library function elsewhere.

I think of this code more as user code, not something to replace a routine
in the standard library. User code should be able to handle bad input
without invoking UB.

Gordon L. Burditt

Nov 14 '05 #8

pete

buda wrote:

"pete" <pf*****@mindspring.com> wrote in message
news:41***********@mindspring.com...
This implementation will probably run out of memory
on large already ordered arrays.
If the array is in order prior to sorting,
then (x-t)/sz will be equal to count - 1
and you'll wind up with about count - 1, levels of recursion.
If count is large, the function could crash the program.
As I said, trying to make this a useable qsort
was not my intention in any way.
I'm aware of ways to improve worst case performance of the quicksort
algorithm (median of three/five, randomization...),

Those are methods of improving speed and reducing the probability
of having the sort go quadratic.
but the sole purpose of this attempt was to make a
generic function, since I haven't done this yet
:)

What you have, is a function that can crash with valid input,
and by that, I mean abnormally terminate the program.

There is a way to rewrite your function
without improving it's speed in any way,
which will prevent it from recursing nmemb levels deep.

What you have to do, is find out which partition is smaller
and recurse that one.
The larger partitions should be handled
by making the function body into a loop.

Here's an example of the variation:
q0sort will crash for large ordered arrays,
q_sort won't.

void q0sort(void *base, size_t nmemb, size_t size,
int (*compar)(const void*, const void*))
{
unsigned char *middle, *left, *last, *right;
unsigned char swap, *p1, *p2, *end;

if (nmemb > 1) {
last = middle = left = base;
right = (nmemb - 1) * size + last;
do {
middle += size;
if (compar(left, middle) > 0) {
last += size;
BYTE_SWAP(middle, last);
}
} while (middle != right);
BYTE_SWAP(left, last);
q0sort(last + size, (right - last) / size, size, compar);
q0sort(left , (last - left) / size, size, compar);
}
}

void q_sort(void *base, size_t nmemb, size_t size,
int (*compar)(const void*, const void*))
{
unsigned char *middle, *left, *last, *right;
size_t nmemb_right;
unsigned char swap, *p1, *p2, *end;

left = base;
while (nmemb > 1) {
last = middle = left;
right = (nmemb - 1) * size + last;
do {
middle += size;
if (compar(left, middle) > 0) {
last += size;
BYTE_SWAP(middle, last);
}
} while (middle != right);
BYTE_SWAP(left, last);
nmemb = (last - left) / size;
nmemb_right = (right - last) / size;
if (nmemb_right > nmemb) {
q_sort(left, nmemb, size, compar);
left = last + size;
nmemb = nmemb_right;
} else {
q_sort(last + size, nmemb_right, size, compar);
}
}
}

--
pete

Nov 14 '05 #9

pete

Michael Mair wrote:

I had some spare time and decided to try to
implement the standard library qsort.

qsort() is not required to implement any particular sort algorithm.
It's just supposed to sort.

One thing worth checking here is how well your algorithm will operate
with a bogus compare routine (e.g. one which always returns -1, or
one for which cmp(a,b) is not guaranteed to equal -cmp(b,a) ).
Some implementations end up going outside the passed-in
array in such circumstances.

The behavior of qsort with a bogus compare routine, is undefined,
so what's the point?

Testing. You do not want to get a segfault (or its undetected friends)
when a slightly faulty cmp() is provided.

I disagree. You want the segfault.
It's better to wrong than vague.

--
pete

Nov 14 '05 #10

Michael Mair

>>>The behavior of qsort with a bogus compare routine, is undefined,

so what's the point?
Testing. You do not want to get a segfault (or its undetected friends)
when a slightly faulty cmp() is provided.

Would have liked you quoting some more context; however, ...
There is a QoI issue here, but it's not clear that handling broken
comparison routines is the proper answer. That could have a
significant adverse effect on performance, which is probably one
of the reasons why the Standard makes a broken comparison routine
UB, rather than requiring implementations to, say, signal an error
in some fashion.

I've seen reports of commercial qsort implementations crashing if
given a broken comparison routine, and the response they've elicited
from other developers has generally been that the comparison routine
should be fixed, not that the qsort implementation should be made
more tolerant.

The C standard library generally favors efficiency over safety. We
may approve or disapprove, but trying to fix that in one particular
case is probably not worth any great effort. Someone who passes a
bad comparison routine to qsort will make a similar error with some
other library function elsewhere.

You are of course right in that but I was talking about code I am
producing. And I personally would be happy to have a qsort() which
does not segfault.
Perhaps I made it not clear enough in the part you snipped but
I am not demanding that to hold for the most efficient variant of
the code. Nonetheless, I like to be able to switch on sensible or
even exhaustive checks and asserting everything if I find that
something goes wrong. Personal paranoia, I guess.
In addition to that, if by testing the undesirable and unlikely I
find a way to make my code handle even that without performance
losses, then in my opinion it was worth it.
The example with the "slightly faulty" FP cmp() is not too far from
what I have already seen or even produced myself, so I would rather
be sure that some modules and functions are safe to my best
knowledge so that I can start looking for errors in the other code.
This saves time in the long run.
Cheers
Michael

Nov 14 '05 #11

Michael Mair

Cheerio pete,

The behavior of qsort with a bogus compare routine, is undefined,
so what's the point?

Testing. You do not want to get a segfault (or its undetected friends)
when a slightly faulty cmp() is provided.

I disagree. You want the segfault.
It's better to wrong than vague.

Read the stuff in the parentheses, please.
Of course you'd rather get hit on the nose by the error than
having a function doing strange stuff. But I even more do not
want to have a segfault at all when trying to do some productive
work with the code I have written.
It is downright nasty to have to adjust your code for every
new application because it exposes a new fault in your code.
This is the reason why you want to check your code even with
faulty input after having looked at all sensible checks you
could conceive.

I think this and my original post answer your original question,
so please read it in context. If we are discussing about fine
points or possible error sources, nitpicking is okay and
sometimes even important, but I do not think it is here.
--Michael

Nov 14 '05 #12

buda

"pete" <pf*****@mindspring.com> wrote in message
news:41**********@mindspring.com...

buda wrote:

"pete" <pf*****@mindspring.com> wrote in message
news:41***********@mindspring.com...
This implementation will probably run out of memory
on large already ordered arrays.
If the array is in order prior to sorting,
then (x-t)/sz will be equal to count - 1
and you'll wind up with about count - 1, levels of recursion.
If count is large, the function could crash the program.

As I said, trying to make this a useable qsort
was not my intention in any way.
I'm aware of ways to improve worst case performance of the quicksort
algorithm (median of three/five, randomization...),

Those are methods of improving speed and reducing the probability
of having the sort go quadratic.

They will also make partitioning better ('on average'), thus reducing the
size of each subarray to be sorted recursively...

but the sole purpose of this attempt was to make a
generic function, since I haven't done this yet
:)

What you have, is a function that can crash with valid input,
and by that, I mean abnormally terminate the program.

There is a way to rewrite your function
without improving it's speed in any way,
which will prevent it from recursing nmemb levels deep.

What you have to do, is find out which partition is smaller
and recurse that one.
The larger partitions should be handled
by making the function body into a loop.

[code deleted...]

This is all very true, but in light of my initial post completely
irrelevant. I suppose I asked for it by mentioning "qsort implementation",
when what I really wanted to do is implement the simplest possible quicksort
algorithm that would work for all data types. I simply wanted to avoid
excessive explanations about function parameters and such...

On a side note, your code will still inevitably crass eventually, for a big
enough array... :)

Nov 14 '05 #13

pete

buda wrote:

On a side note, your code will still inevitably crass eventually,
for a big enough array... :)

.... only if (sizeof(size_t) * CHAR_BIT)
is greater than the level of recursion available.

--
pete

Nov 14 '05 #14

attempt at qsort implementation

Similar topics