446,233 Members | 1,789 Online
Need help? Post your question and get tips & solutions from a community of 446,233 IT Pros & Developers. It's quick & easy.

# Where's the bug?

 P: n/a As a programming exercise, I was playing with the quicksort algorithm. The commented line in the "partition" function speeds things up a bit. However, when used on (my machine) arrays larger than about 5,000,000 elements, the sort appears to freeze. I don't understand this behavior. With this line commented as shown, the sort works fine up to available memory (circa 50,000,000 elements, (200MB) on my 256MB machine). I'm guessing the error has to do with "val" taking a minimum or maximum value, but I can't fugure it out. If this Q is off-topic to c++ and more appropriate on an "algorithm-oriented" newsgroup, I apologize in advance. Thanks, Joe #include const int kSMALL_ENOUGH = 16; //when to call selection-sort using namespace std; void fillRND(unsigned int* a, int size); void showit(unsigned int* a, int size); void quicksort(unsigned int* a, int left, int right); int partition(unsigned int* a, int left, int right); void selectionsort(unsigned int* a, int left, int right); int verify(unsigned int* a, int size); int main(){ int clo; cout << "Enter the number of elements to be sorted. "; int sz; cin >> sz; unsigned int* a = new unsigned int[sz]; cout << "generating data..."; fillRND(a, sz); cout << "done\nchecking data..."; verify(a, sz); //showit(a, sz); clo=clock(); cout << "sorting data..."; quicksort(a, 0, sz-1); cout << "done in " << clock() - clo << " (ms)\nchecking data..."; verify(a, sz); //showit(a, sz); delete [] a; system("pause"); return 0; } void quicksort(unsigned int* a, int left, int right){ if(left + kSMALL_ENOUGH < right) { int split_pt = partition(a,left, right); quicksort(a, left, split_pt); quicksort(a, split_pt+1, right); }else selectionsort(a, left, right); } int partition(unsigned int* a, int left, int right){ unsigned int val = a[left]///2 + a[right]/2 ;//<---note end of line here! int lm = left-1; int rm = right+1; for(;;){ while(a[--rm] > val); while(a[++lm] < val); if(lm < rm){ unsigned int temp = a[rm]; a[rm] = a[lm]; a[lm] = temp; }else return rm; } } void selectionsort(unsigned int* data, int left, int right){ for(int i = left; i < right; ++i) { int min = i; for(int j=i+1; j <= right; ++j) if(data[j] < data[min]) min = j; unsigned int temp = data[min]; data[min] = data[i]; data[i] = temp; } } int verify(unsigned int* a, int size){ for(int i = 0; i a[i+1]){ cout << "numbers not sorted\n"; return 1; } } cout << "numbers are sorted\n"; return 0; } void fillRND(unsigned int* a, int size){ srand(time(0)); int bits = (CHAR_BIT * sizeof(int))/2; for(int i = 0; i < size; ++i) a[i] = rand() + (rand() << bits); //to fill more than 15 bits... } void showit(unsigned int* a, int size){ for(int i = 0; i < size; ++i) cout << i << ' ' << a[i] << endl; cout << endl; } Jul 22 '05 #1
7 Replies

 P: n/a "J. Campbell" wrote: int partition(unsigned int* a, int left, int right){ unsigned int val = a[left]///2 + a[right]/2 ;//<---note end of line here! int lm = left-1; int rm = right+1; for(;;){ while(a[--rm] > val); while(a[++lm] < val); if(lm < rm){ unsigned int temp = a[rm]; a[rm] = a[lm]; a[lm] = temp; }else return rm; } } consider your array to sort consists of just 2 numbers 5 5 what will be the value of val? a[left] / 2 -> 5 / 2 -> 2 a[right] / 2 -> 5 / 2 -> 2 val therefore will get a value of 4 now look at the while loops while( a[--rm] > val); that loop will sure run out of bounds while( a[++lm] < val); again: out of bounds Out of bounds may mean in this case: outside the range left-right. The partition function then swaps some array elements it shouldn't, which will influence the whole algorithm. The problem is, that with the calculation you do, you may end up with a number for val which is not even in the boundaries of the numbers you try to sort (Sorry: I am not a native english speaker so the above may sound confusing, but I don't know how to express it better). One technique I learned for comming up with a value for that trashold value is something like this: take a[left], a[right], a[(left+right)/2] and discard the smallest and highest value (sort them and take the second in sorting order) -- Karl Heinz Buchegger kb******@gascad.at Jul 22 '05 #2

 P: n/a "J. Campbell" wrote... As a programming exercise, I was playing with the quicksort algorithm. The commented line in the "partition" function speeds things up a bit. However, when used on (my machine) arrays larger than about 5,000,000 elements, the sort appears to freeze. I don't understand this behavior. With this line commented as shown, the sort works fine up to available memory (circa 50,000,000 elements, (200MB) on my 256MB machine). I'm guessing the error has to do with "val" taking a minimum or maximum value, but I can't fugure it out. If this Q is off-topic to c++ and more appropriate on an "algorithm-oriented" newsgroup, I apologize in advance. It's not related to language, nor it is related to algorithms, I am afraid. It's completely compiler- and system-specific. The secret is that your program is likely causing stack overflow or some such. Quicksort is recursive. The depth of its recursion depends on data (quantity and contents) and difficult to predict. Since it uses [stack] memory for recursion, you can expect it to behave differently if you allocate more for stack, but that's beyond the scope of the language. See your compiler options. Victor Jul 22 '05 #3

 P: n/a "Karl Heinz Buchegger" wrote in message news:3F***************@gascad.at... "J. Campbell" wrote: int partition(unsigned int* a, int left, int right){ unsigned int val = a[left]///2 + a[right]/2 ;//<---note end of line here! int lm = left-1; int rm = right+1; for(;;){ while(a[--rm] > val); while(a[++lm] < val); if(lm < rm){ unsigned int temp = a[rm]; a[rm] = a[lm]; a[lm] = temp; }else return rm; } } consider your array to sort consists of just 2 numbers Karl, Thanks for the reply. I need to consider your suggestions more closely. However, please note that in my implementation, the array never has fewer than 16 elements (else the selection sort is used). I need to examine if it has to do with overrunning the array, as you suggest, or with overrunning the memory, as suggested by victor. (Or another problem altogether). Thanks again. 5 5 what will be the value of val? a[left] / 2 -> 5 / 2 -> 2 a[right] / 2 -> 5 / 2 -> 2 val therefore will get a value of 4 now look at the while loops while( a[--rm] > val); that loop will sure run out of bounds while( a[++lm] < val); again: out of bounds Out of bounds may mean in this case: outside the range left-right. The partition function then swaps some array elements it shouldn't, which will influence the whole algorithm. The problem is, that with the calculation you do, you may end up with a number for val which is not even in the boundaries of the numbers you try to sort (Sorry: I am not a native english speaker so the above may sound confusing, but I don't know how to express it better). One technique I learned for comming up with a value for that trashold value is something like this: take a[left], a[right], a[(left+right)/2] and discard the smallest and highest value (sort them and take the second in sorting order) -- Karl Heinz Buchegger kb******@gascad.at Jul 22 '05 #4

 P: n/a "Karl Heinz Buchegger" wrote in message news:3F***************@gascad.at... consider your array to sort consists of just 2 numbers 5 5 what will be the value of val? a[left] / 2 -> 5 / 2 -> 2 a[right] / 2 -> 5 / 2 -> 2 val therefore will get a value of 4 now look at the while loops while( a[--rm] > val); that loop will sure run out of bounds while( a[++lm] < val); again: out of bounds Out of bounds may mean in this case: outside the range left-right. The partition function then swaps some array elements it shouldn't, which will influence the whole algorithm. The problem is, that with the calculation you do, you may end up with a number for val which is not even in the boundaries of the numbers you try to sort (Sorry: I am not a native english speaker so the above may sound confusing, but I don't know how to express it better). One technique I learned for comming up with a value for that trashold value is something like this: take a[left], a[right], a[(left+right)/2] and discard the smallest and highest value (sort them and take the second in sorting order) -- Karl Heinz Buchegger kb******@gascad.at Karl...I understand your point, and I believe you are correct...thanks for the help. Joe Jul 22 '05 #5

 P: n/a "Karl Heinz Buchegger" wrote in message news:3F***************@gascad.at... consider your array to sort consists of just 2 numbers 5 5 what will be the value of val? a[left] / 2 -> 5 / 2 -> 2 a[right] / 2 -> 5 / 2 -> 2 val therefore will get a value of 4 now look at the while loops while( a[--rm] > val); that loop will sure run out of bounds while( a[++lm] < val); again: out of bounds Out of bounds may mean in this case: outside the range left-right. The partition function then swaps some array elements it shouldn't, which will influence the whole algorithm. The problem is, that with the calculation you do, you may end up with a number for val which is not even in the boundaries of the numbers you try to sort (Sorry: I am not a native english speaker so the above may sound confusing, but I don't know how to express it better). One technique I learned for comming up with a value for that trashold value is something like this: take a[left], a[right], a[(left+right)/2] and discard the smallest and highest value (sort them and take the second in sorting order) -- Karl Heinz Buchegger kb******@gascad.at Thanks Karl...this modification fixes the bug....and maintains the speed boost ;-) good job to us both!!!! #include const int kSMALL_ENOUGH = 16; //when to call selection-sort using namespace std; void fillRND(unsigned int* a, int size); void showit(unsigned int* a, int size); void quicksort(unsigned int* a, int left, int right); int partition(unsigned int* a, int left, int right); void selectionsort(unsigned int* a, int left, int right); int verify(unsigned int* a, int size); //unsigned int triplet(unsigned int* a, int left, int right); int main(){ int clo; cout << "Enter the number of elements to be sorted. "; int sz; cin >> sz; unsigned int* a = new unsigned int[sz]; cout << "generating data..."; fillRND(a, sz); cout << "done\nchecking data..."; verify(a, sz); //showit(a, sz); clo=clock(); cout << "sorting data..."; quicksort(a, 0, sz-1); cout << "done in " << clock() - clo << " (ms)\nchecking data..."; verify(a, sz); //showit(a, sz); delete [] a; system("pause"); return 0; } void quicksort(unsigned int* a, int left, int right){ if(left + kSMALL_ENOUGH < right) { int split_pt = partition(a,left, right); quicksort(a, left, split_pt); quicksort(a, split_pt+1, right); }else selectionsort(a, left, right); } int partition(unsigned int* a, int left, int right){ unsigned int val = a[left]/2 + a[right]/2;// = triplet(unsigned int* a, int left, int right); unsigned int x = a[left]; unsigned int z = a[right]; if(x < val) if(val < z); //val = y; //x < y < z else if(x < z) val = z; //x < z < y else val = x; //z < x < y else if(z < val); //val = y; //z < y < x else if(z < x) val = z; //y < z < x else val = x; //y < x < z int lm = left-1; int rm = right+1; for(;;){ while(a[--rm] > val); while(a[++lm] < val); if(lm < rm){ unsigned int temp = a[rm]; a[rm] = a[lm]; a[lm] = temp; }else return rm; } } void selectionsort(unsigned int* data, int left, int right){ for(int i = left; i < right; ++i) { int min = i; for(int j=i+1; j <= right; ++j) if(data[j] < data[min]) min = j; unsigned int temp = data[min]; data[min] = data[i]; data[i] = temp; } } int verify(unsigned int* a, int size){ for(int i = 0; i a[i+1]){ cout << "numbers not sorted\n"; return 1; } } cout << "numbers are sorted\n"; return 0; } void fillRND(unsigned int* a, int size){ srand(time(0)); int bits = (CHAR_BIT * sizeof(int))/2; for(int i = 0; i < size; ++i) a[i] = rand() + (rand() << bits); //to fill more than 15 bits... } void showit(unsigned int* a, int size){ for(int i = 0; i < size; ++i) cout << i << ' ' << a[i] << endl; cout << endl; } Jul 22 '05 #6

 P: n/a Joe C wrote: consider your array to sort consists of just 2 numbers Karl, Thanks for the reply. I need to consider your suggestions more closely. However, please note that in my implementation, the array never has fewer than 16 elements (else the selection sort is used). Well. With such a large array of ints, it's quite possible that you have a sequence of 16 identical numbers which qualify for that problem. I need to examine if it has to do with overrunning the array, as you suggest, or with overrunning the memory, as suggested by victor. (Or another problem altogether). Thanks again. You have an 'endless loop' in partition(). That's OK, since there is a return in it, thus it isn't realy endless. But I would start with setting a high bound in this for loop and see if this bound is ever reached: int lm = left - 1; int rm = right + 1; for( int tmp = 0; tmp < 500000; ++tmp ) { while( a[--rm] > val ); while( a[++lm] < val ); ... } /* should never get here. */ assert( 0 ); I would also insert some assertion into the while loop while( a[--rm] > val ) assert( rm > left ); while( a[++rm] < val ) assert( lm < right ); If things are the way I think they are (which is hard to diagnose with a dataset of 50,000,000 items) one of the assertions will fire. You then have a foot in the door and your debugger will tell you more. -- Karl Heinz Buchegger kb******@gascad.at Jul 22 '05 #7

 P: n/a Joe C wrote: Thanks Karl...this modification fixes the bug....and maintains the speed boost ;-) good job to us both!!!! Glad to hear that. It was some shot in the dark. When seeing the partition function and recognizing what val is used for I thought by myself: That's strange, I have never seen it done that way (and I haven't seen a quicksort since years). int partition(unsigned int* a, int left, int right){ unsigned int val = a[left]/2 + a[right]/2;// = triplet(unsigned int* a, int left, int right); unsigned int x = a[left]; unsigned int z = a[right]; if(x < val) if(val < z); //val = y; //x < y < z else if(x < z) val = z; //x < z < y else val = x; //z < x < y else if(z < val); //val = y; //z < y < x else if(z < x) val = z; //y < z < x else val = x; //y < x < z Yep. That looks like the thing I can remember from my algorithms class (nearly 20 years ago). The idea is: Ideally you want the median of all the numbers. But the median is expensive to compute. On the other hand you definitly want to avoid to take the lowest or the highest value, since no partitioning would take place then. So what to do: take 3 samples of the numbers (it doesn't matter which ones) and choose the middle. That's good enough in most cases. -- Karl Heinz Buchegger kb******@gascad.at Jul 22 '05 #8

### This discussion thread is closed

Replies have been disabled for this discussion.