449,411 Members | 1,022 Online
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 449,411 IT Pros & Developers. It's quick & easy.

# attempt at qsort implementation

 P: n/a I had some spare time and decided to try to implement the standard library qsort. This is a pretty simpleminded attempt, but it seems to be working. I have to point out that efficiency is not at all an issue for me, and this is purely a toy, so to speak. I'm quite aware that this is not the quickest way to implement the quicksort algorithm, but I basically just wanted to try to make a "generic function". I tested it with a few arrays of structures containing all sorts of types (this, ofcourse, doesn't mean that it's correct, even by a long shot :). I would appreciate any and all constructive comments. The code below is a snippet. All the necessary headers are included and uchar is a typedef for unsigned char. Thank you. void do_swap_(void *a, void *b, size_t sz) { uchar t; uchar *i = a; uchar *j = b; while (sz) { t = *i; *i++ = *j; *j++ = t; --sz; } } void quick_sort(void *v, size_t count, size_t sz, int (*cmp)(const void *, const void *)) { uchar *t = v; uchar *x, *y; uchar *r; if (count) { x = y = t; r = t+(count-1)*sz; for(; y
13 Replies

 P: n/a buda wrote: I had some spare time and decided to try to implement the standard library qsort. This is a pretty simpleminded attempt, but it seems to be working. I have to point out that efficiency is not at all an issue for me, and this is purely a toy, so to speak. I'm quite aware that this is not the quickest way to implement the quicksort algorithm, but I basically just wanted to try to make a "generic function". I tested it with a few arrays of structures containing all sorts of types (this, ofcourse, doesn't mean that it's correct, even by a long shot :). I would appreciate any and all constructive comments. The code below is a snippet. All the necessary headers are included and uchar is a typedef for unsigned char. Thank you. This implementation will probably run out of memory on large already ordered arrays. If the array is in order prior to sorting, then (x-t)/sz will be equal to count - 1 and you'll wind up with about count - 1, levels of recursion. If count is large, the function could crash the program. void do_swap_(void *a, void *b, size_t sz) { uchar t; uchar *i = a; uchar *j = b; while (sz) { t = *i; *i++ = *j; *j++ = t; --sz; } } void quick_sort(void *v, size_t count, size_t sz, int (*cmp)(const void *, const void *)) { uchar *t = v; uchar *x, *y; uchar *r; if (count) { x = y = t; r = t+(count-1)*sz; for(; y

 P: n/a >I had some spare time and decided to try to implement the standard libraryqsort. This is a pretty simpleminded attempt, but it seems to be working. Ihave to point out that efficiency is not at all an issue for me, and this ispurely a toy, so to speak. I'm quite aware that this is not the quickest wayto implement the quicksort algorithm, but I basically just wanted to try tomake a "generic function". qsort() is not required to implement any particular sort algorithm. It's just supposed to sort. It's not specified HOW. A bubble sort is not an invalid implementation. A recursive bogosort[*] (which operates in time equivalent to O(e**infinity)) is probably an invalid implementation since it NEVER finishes. I tested it with a few arrays of structurescontaining all sorts of types (this, ofcourse, doesn't mean that it'scorrect, even by a long shot :). I would appreciate any and all constructivecomments. The code below is a snippet. All the necessary headers areincluded and uchar is a typedef for unsigned char.Thank you. If count == 1, the algorithm does a compare anyway. I know you weren't going for real efficiency here, but this seems a bit strange. If sz == 0, various wierd and obnoxious things will happen, but the caller IS asking for it. It seems like this algorithm could end up recursing a lot. This could be a problem with large amounts of data. One thing worth checking here is how well your algorithm will operate with a bogus compare routine (e.g. one which always returns -1, or one for which cmp(a,b) is not guaranteed to equal -cmp(b,a) ). Some implementations end up going outside the passed-in array in such circumstances. [*] Technical note: a bogosort operates by creating all possible orders of the sorted array, determining how many elements are out of order, and returns the order with the least number of elements out of order by sorting them and returning the first one. In other words, it turns the problem of sorting N elements into one of sorting N! elements. A recursive bogosort uses bogosort to sort the possible orders by number of elements out of order, thereby turning the problem of sorting N elements into one of sorting N!!!!!!!!!!!!!!!!!!!!!!!!!!.... elements. Bogosorts typically include other performance-wasting code such as: while (malloc(INT_MAX)) fork(); since they will never finish anyway. Gordon L. Burditt Nov 14 '05 #3

 P: n/a Gordon Burditt wrote:I had some spare time and decided to try to implement the standard library qsort. qsort() is not required to implement any particular sort algorithm. It's just supposed to sort. One thing worth checking here is how well your algorithm will operate with a bogus compare routine (e.g. one which always returns -1, or one for which cmp(a,b) is not guaranteed to equal -cmp(b,a) ). Some implementations end up going outside the passed-in array in such circumstances. The behavior of qsort with a bogus compare routine, is undefined, so what's the point? -- pete Nov 14 '05 #4

 P: n/a >>>I had some spare time and decided to try toimplement the standard library qsort.qsort() is not required to implement any particular sort algorithm.It's just supposed to sort.One thing worth checking here is how well your algorithm will operatewith a bogus compare routine (e.g. one which always returns -1, orone for which cmp(a,b) is not guaranteed to equal -cmp(b,a) ). Some implementations end up going outside the passed-inarray in such circumstances. The behavior of qsort with a bogus compare routine, is undefined, so what's the point? Testing. You do not want to get a segfault (or its undetected friends) when a slightly faulty cmp() is provided. If my first concern is being sure that everything works, I usually do that with unit tests. There, I have usually conditional code I can switch on for "do all checks" vs. assuming the "input is alright". If I do not care for speed, I go for the former and try to handle faulty input as well and provide test values (or in this case test functions) which behave extreme or wrong or are, umh, don't know the right word, "limit" cases, where the behaviour of the function processing the input should change radically. The bogus cmp() helps you to find such problems which could also arise in some complicated and thus opaque real world applications. One nice example is always people believing that floating point variables could be treated as real numbers -- forgetting to account for rounding errors in the operations as well as in the representation of a number. Then they apply some operations, still assuming the trichotomy of order or some such and wonder why their results really do not fit, as they did not account for seemingly impossible cases or the arbitraryness of some decisions based on rounded values. Cheers Michael Nov 14 '05 #5

 P: n/a "pete" wrote in message news:41***********@mindspring.com... This implementation will probably run out of memory on large already ordered arrays. If the array is in order prior to sorting, then (x-t)/sz will be equal to count - 1 and you'll wind up with about count - 1, levels of recursion. If count is large, the function could crash the program. As I said, trying to make this a useable qsort was not my intention in any way. I'm aware of ways to improve worst case performance of the quicksort algorithm (median of three/five, randomization...), but the sole purpose of this attempt was to make a generic function, since I haven't done this yet :) Nov 14 '05 #6

 P: n/a In article , Michael Mair writes: [Quoting pete in message <41**********@mindspring.com>. Please fix your newsreader to insert proper attributions.] The behavior of qsort with a bogus compare routine, is undefined, so what's the point? Testing. You do not want to get a segfault (or its undetected friends) when a slightly faulty cmp() is provided. There is a QoI issue here, but it's not clear that handling broken comparison routines is the proper answer. That could have a significant adverse effect on performance, which is probably one of the reasons why the Standard makes a broken comparison routine UB, rather than requiring implementations to, say, signal an error in some fashion. I've seen reports of commercial qsort implementations crashing if given a broken comparison routine, and the response they've elicited from other developers has generally been that the comparison routine should be fixed, not that the qsort implementation should be made more tolerant. The C standard library generally favors efficiency over safety. We may approve or disapprove, but trying to fix that in one particular case is probably not worth any great effort. Someone who passes a bad comparison routine to qsort will make a similar error with some other library function elsewhere. -- Michael Wojcik mi************@microfocus.com "We are facing a dire shortage of clowns," said Erickson, also known as Jingles. Nov 14 '05 #7

 P: n/a >> > The behavior of qsort with a bogus compare routine, is undefined, > so what's the point? Testing. You do not want to get a segfault (or its undetected friends) when a slightly faulty cmp() is provided.There is a QoI issue here, but it's not clear that handling brokencomparison routines is the proper answer. That could have asignificant adverse effect on performance, which is probably oneof the reasons why the Standard makes a broken comparison routineUB, rather than requiring implementations to, say, signal an errorin some fashion. Reliably detecting a broken comparison routine would probably mean a big performance hit (e.g. running cmp(a,b) and cmp(b,a) and checking that the results are negatives of each other, and even that's not a thorough test). I'm not asking for that, I'm just suggesting that arranging to not segfault would be a good idea. And I'm not sure it means a big (or even any) performance hit to do that. Consider the tech support issues. If it segfaulted in qsort, it's obviously qsort's fault, right? How many times have we seen posters claim to have found a bug in malloc() or free() when a debugger indicates the segfault happened while executing one of those functions? I've seen reports of commercial qsort implementations crashing ifgiven a broken comparison routine, and the response they've elicitedfrom other developers has generally been that the comparison routineshould be fixed, not that the qsort implementation should be mademore tolerant. But the commercial outfit's tech support line still takes a \$\$ hit, doesn't it? The C standard library generally favors efficiency over safety. Wemay approve or disapprove, but trying to fix that in one particularcase is probably not worth any great effort. Someone who passes abad comparison routine to qsort will make a similar error with someother library function elsewhere. I think of this code more as user code, not something to replace a routine in the standard library. User code should be able to handle bad input without invoking UB. Gordon L. Burditt Nov 14 '05 #8

 P: n/a buda wrote: "pete" wrote in message news:41***********@mindspring.com... This implementation will probably run out of memory on large already ordered arrays. If the array is in order prior to sorting, then (x-t)/sz will be equal to count - 1 and you'll wind up with about count - 1, levels of recursion. If count is large, the function could crash the program. As I said, trying to make this a useable qsort was not my intention in any way. I'm aware of ways to improve worst case performance of the quicksort algorithm (median of three/five, randomization...), Those are methods of improving speed and reducing the probability of having the sort go quadratic. but the sole purpose of this attempt was to make a generic function, since I haven't done this yet :) What you have, is a function that can crash with valid input, and by that, I mean abnormally terminate the program. There is a way to rewrite your function without improving it's speed in any way, which will prevent it from recursing nmemb levels deep. What you have to do, is find out which partition is smaller and recurse that one. The larger partitions should be handled by making the function body into a loop. Here's an example of the variation: q0sort will crash for large ordered arrays, q_sort won't. void q0sort(void *base, size_t nmemb, size_t size, int (*compar)(const void*, const void*)) { unsigned char *middle, *left, *last, *right; unsigned char swap, *p1, *p2, *end; if (nmemb > 1) { last = middle = left = base; right = (nmemb - 1) * size + last; do { middle += size; if (compar(left, middle) > 0) { last += size; BYTE_SWAP(middle, last); } } while (middle != right); BYTE_SWAP(left, last); q0sort(last + size, (right - last) / size, size, compar); q0sort(left , (last - left) / size, size, compar); } } void q_sort(void *base, size_t nmemb, size_t size, int (*compar)(const void*, const void*)) { unsigned char *middle, *left, *last, *right; size_t nmemb_right; unsigned char swap, *p1, *p2, *end; left = base; while (nmemb > 1) { last = middle = left; right = (nmemb - 1) * size + last; do { middle += size; if (compar(left, middle) > 0) { last += size; BYTE_SWAP(middle, last); } } while (middle != right); BYTE_SWAP(left, last); nmemb = (last - left) / size; nmemb_right = (right - last) / size; if (nmemb_right > nmemb) { q_sort(left, nmemb, size, compar); left = last + size; nmemb = nmemb_right; } else { q_sort(last + size, nmemb_right, size, compar); } } } -- pete Nov 14 '05 #9

 P: n/a Michael Mair wrote:I had some spare time and decided to try toimplement the standard library qsort.qsort() is not required to implement any particular sort algorithm.It's just supposed to sort.One thing worth checking here is how well your algorithm will operatewith a bogus compare routine (e.g. one which always returns -1, orone for which cmp(a,b) is not guaranteed to equal -cmp(b,a) ). Some implementations end up going outside the passed-inarray in such circumstances. The behavior of qsort with a bogus compare routine, is undefined, so what's the point? Testing. You do not want to get a segfault (or its undetected friends) when a slightly faulty cmp() is provided. I disagree. You want the segfault. It's better to wrong than vague. -- pete Nov 14 '05 #10

 P: n/a >>>The behavior of qsort with a bogus compare routine, is undefined,so what's the point?Testing. You do not want to get a segfault (or its undetected friends)when a slightly faulty cmp() is provided. Would have liked you quoting some more context; however, ... There is a QoI issue here, but it's not clear that handling broken comparison routines is the proper answer. That could have a significant adverse effect on performance, which is probably one of the reasons why the Standard makes a broken comparison routine UB, rather than requiring implementations to, say, signal an error in some fashion. I've seen reports of commercial qsort implementations crashing if given a broken comparison routine, and the response they've elicited from other developers has generally been that the comparison routine should be fixed, not that the qsort implementation should be made more tolerant. The C standard library generally favors efficiency over safety. We may approve or disapprove, but trying to fix that in one particular case is probably not worth any great effort. Someone who passes a bad comparison routine to qsort will make a similar error with some other library function elsewhere. You are of course right in that but I was talking about code I am producing. And I personally would be happy to have a qsort() which does not segfault. Perhaps I made it not clear enough in the part you snipped but I am not demanding that to hold for the most efficient variant of the code. Nonetheless, I like to be able to switch on sensible or even exhaustive checks and asserting everything if I find that something goes wrong. Personal paranoia, I guess. In addition to that, if by testing the undesirable and unlikely I find a way to make my code handle even that without performance losses, then in my opinion it was worth it. The example with the "slightly faulty" FP cmp() is not too far from what I have already seen or even produced myself, so I would rather be sure that some modules and functions are safe to my best knowledge so that I can start looking for errors in the other code. This saves time in the long run. Cheers Michael Nov 14 '05 #11