By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
464,553 Members | 907 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 464,553 IT Pros & Developers. It's quick & easy.

Best multi-thread sort for use with C++11 ?

SwissProgrammer
100+
P: 133
If I have 2 cores, or 200 cores, etc. that I can use for my program and I have a huge amount of data to be sorted that is not being used via Access or any other common database, and if I want to sort it fast, I was thinking of maybe using threads for partial sorts, or something like that. Maybe sort 10% of the data with each of 10 different cpu cores, then step over in the data 5% of the total of the data and sort from there in 10% sections, maybe. Do this until no changes are detected. If I have so much data that to do each of these individual 10% sorts takes a few minutes in each thread using its own cpu core, then maybe there is a better way. I am looking for usable opinions on this.

Specifically, what is the best type of sort to use for this? Someone else's pre-made software is not what I am looking for with this.

In C++ up to and including C++11.

Thank you.
1 Week Ago #1

✓ answered by Banfa

This paper appears to specifically address use of parallelism (threads) in sorting algorithms and I think the conclusions could be summerised as
  1. Make sure you use the right sorting algorithm because parallelism makes little difference to some of them
  2. Don't both if you have < 2^12 samples
  3. While a few threads is better than 1 thread; lots of threads is not that much better than a few.

The paper considers Rank Sort, Merge Sort, Quicksort, Bitonic Sort, Bucket Sort and Radix Sort so there may be other algorithms out there to try (but don't bother with Bozo Sort) but this should be a good place to start.

If you don't want to read the paper then try Bitonic Sort.

Share this Question
Share on Google+
2 Replies

Banfa
Expert Mod 5K+
P: 8,996
This paper appears to specifically address use of parallelism (threads) in sorting algorithms and I think the conclusions could be summerised as
  1. Make sure you use the right sorting algorithm because parallelism makes little difference to some of them
  2. Don't both if you have < 2^12 samples
  3. While a few threads is better than 1 thread; lots of threads is not that much better than a few.

The paper considers Rank Sort, Merge Sort, Quicksort, Bitonic Sort, Bucket Sort and Radix Sort so there may be other algorithms out there to try (but don't bother with Bozo Sort) but this should be a good place to start.

If you don't want to read the paper then try Bitonic Sort.
1 Week Ago #2

SwissProgrammer
100+
P: 133
Banfa,

For over 20,000,000 short data sets (example: Fred Flintstone 4321 Rock Creak)
It looks like a multi-core (multi-physical-thread) Radix Sort
should work.

For over 20,000,000 longer data sets (example: Communication Avenue: Phone/Land Line, Time Stamp: 123456789, Location Origination: USA/Maryland, VISA #1234567890123456, Order #0987654321)
It looks like a multi-core (multi-physical-thread) combination of starting with Radix Sort first (maybe 20% of data length), then Bitonic Sort to complete the process.
Thank you for that information.

For a very fast changing data load (example: a "massively multiplayer" game as defined by the originator of that term back in the early 1990's as being a minimum of 1,000 concurrent users all in the same game and all playing against each other at the same time, not sectioned off. Full interaction one to 1,000.) Especially if the data sets are long (more than 150 characters each): I think that the sort which I described in my question could be the best.
Example: sort 10% of the data with each of 10 different physical cpu cores, then step over in the data 5% of the total of the data and sort from there in 10% sections, maybe. The actual sorts could be Radix for short data and Bitonic for longer data or a combination as previously described. Do this non-stop as long as the data is very fast changing.
An addition could be to compartmentalize with multiple servers as needed.




etcetera:

In case someone is interested in my observations of the paper reported by Banfa, see the following:

"The paper concludes that an implementation of radix sort is the fastest for GPUs, by a great margin." [pdf page 4].
But, they note that this can vary based upon randomness and presortedness of the data at the time of the sort. [pdf page 5+] That makes sense.

In that paper it says,
"As the data sets grows larger, the speedup increases pending on the number of threads" [pdf page 11].
That was expected.
"Results . Multithreading does improve performance, with two threads in average providing a speedup of up to 2x and four threads up to 3x. However, the potential speedup is bound to the available physical threads of the CPU and dependent of balancing the workload." [pdf page 3].
The 2x was expected. The 3x: I reject as inconclusive due to insufficient data size. I am guessing that the load was not sufficient on the four threads to achieve results approaching 4x. But, the 2x for two threads gave me an answer.
"On a single-core machine, there is little improvement due to multithreading, since it only has one core and can only run one thread at the time." [pdf page 10]
I did not notice where they specifically stated that a multi-threaded single core was to be considered only via the actual core count. They seem to completely discount the "multi-thread" ability. It seems to be a premise. They use the term "physical threads", thus I think that it was a conditional that they are using. It is exactly a part of what I based my question upon. Thank you Banfa for finding this and reporting it.

I have not gotten far in reading this report, and now I see a potential error in their testing logic. I wanted to know about multi-core threading. They reported on what seemed to be multi-core threading, but if I read the report correctly, it is not [specifically not] multi-core threading as I was looking for. For multi-core threading, I would have used cores reported plus one. The plus one would have been for the operating system and the program being executed, without the threads being testing. Then the threads being tested would have been on further separate cores. For a two core test: use a minimum of a three core system. Operate on thread one. Test on thread two and thread three. With the constraints of keeping all other use of those threads, two and three, from being used for other purposes.

I now have finished the report and I think that I agree with them and you (both) conditionally about the sort type to use based upon the randomness of the data and the average individual data set size.

Thank you Banfa.
1 Week Ago #3

Post your reply

Sign in to post your reply or Sign up for a free account.