473,405 Members | 2,373 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,405 software developers and data experts.

Best multi-thread sort for use with C++11 ?

SwissProgrammer
220 128KB
If I have 2 cores, or 200 cores, etc. that I can use for my program and I have a huge amount of data to be sorted that is not being used via Access or any other common database, and if I want to sort it fast, I was thinking of maybe using threads for partial sorts, or something like that. Maybe sort 10% of the data with each of 10 different cpu cores, then step over in the data 5% of the total of the data and sort from there in 10% sections, maybe. Do this until no changes are detected. If I have so much data that to do each of these individual 10% sorts takes a few minutes in each thread using its own cpu core, then maybe there is a better way. I am looking for usable opinions on this.

Specifically, what is the best type of sort to use for this? Someone else's pre-made software is not what I am looking for with this.

In C++ up to and including C++11.

Thank you.
Nov 17 '20 #1

✓ answered by Banfa

This paper appears to specifically address use of parallelism (threads) in sorting algorithms and I think the conclusions could be summerised as
  1. Make sure you use the right sorting algorithm because parallelism makes little difference to some of them
  2. Don't both if you have < 2^12 samples
  3. While a few threads is better than 1 thread; lots of threads is not that much better than a few.

The paper considers Rank Sort, Merge Sort, Quicksort, Bitonic Sort, Bucket Sort and Radix Sort so there may be other algorithms out there to try (but don't bother with Bozo Sort) but this should be a good place to start.

If you don't want to read the paper then try Bitonic Sort.

2 2389
Banfa
9,065 Expert Mod 8TB
This paper appears to specifically address use of parallelism (threads) in sorting algorithms and I think the conclusions could be summerised as
  1. Make sure you use the right sorting algorithm because parallelism makes little difference to some of them
  2. Don't both if you have < 2^12 samples
  3. While a few threads is better than 1 thread; lots of threads is not that much better than a few.

The paper considers Rank Sort, Merge Sort, Quicksort, Bitonic Sort, Bucket Sort and Radix Sort so there may be other algorithms out there to try (but don't bother with Bozo Sort) but this should be a good place to start.

If you don't want to read the paper then try Bitonic Sort.
Nov 17 '20 #2
SwissProgrammer
220 128KB
Banfa,

For over 20,000,000 short data sets (example: Fred Flintstone 4321 Rock Creak)
It looks like a multi-core (multi-physical-thread) Radix Sort
should work.

For over 20,000,000 longer data sets (example: Communication Avenue: Phone/Land Line, Time Stamp: 123456789, Location Origination: USA/Maryland, VISA #1234567890123456, Order #0987654321)
It looks like a multi-core (multi-physical-thread) combination of starting with Radix Sort first (maybe 20% of data length), then Bitonic Sort to complete the process.
Thank you for that information.

For a very fast changing data load (example: a "massively multiplayer" game as defined by the originator of that term back in the early 1990's as being a minimum of 1,000 concurrent users all in the same game and all playing against each other at the same time, not sectioned off. Full interaction one to 1,000.) Especially if the data sets are long (more than 150 characters each): I think that the sort which I described in my question could be the best.
Example: sort 10% of the data with each of 10 different physical cpu cores, then step over in the data 5% of the total of the data and sort from there in 10% sections, maybe. The actual sorts could be Radix for short data and Bitonic for longer data or a combination as previously described. Do this non-stop as long as the data is very fast changing.
An addition could be to compartmentalize with multiple servers as needed.




etcetera:

In case someone is interested in my observations of the paper reported by Banfa, see the following:

"The paper concludes that an implementation of radix sort is the fastest for GPUs, by a great margin." [pdf page 4].
But, they note that this can vary based upon randomness and presortedness of the data at the time of the sort. [pdf page 5+] That makes sense.

In that paper it says,
"As the data sets grows larger, the speedup increases pending on the number of threads" [pdf page 11].
That was expected.
"Results . Multithreading does improve performance, with two threads in average providing a speedup of up to 2x and four threads up to 3x. However, the potential speedup is bound to the available physical threads of the CPU and dependent of balancing the workload." [pdf page 3].
The 2x was expected. The 3x: I reject as inconclusive due to insufficient data size. I am guessing that the load was not sufficient on the four threads to achieve results approaching 4x. But, the 2x for two threads gave me an answer.
"On a single-core machine, there is little improvement due to multithreading, since it only has one core and can only run one thread at the time." [pdf page 10]
I did not notice where they specifically stated that a multi-threaded single core was to be considered only via the actual core count. They seem to completely discount the "multi-thread" ability. It seems to be a premise. They use the term "physical threads", thus I think that it was a conditional that they are using. It is exactly a part of what I based my question upon. Thank you Banfa for finding this and reporting it.

I have not gotten far in reading this report, and now I see a potential error in their testing logic. I wanted to know about multi-core threading. They reported on what seemed to be multi-core threading, but if I read the report correctly, it is not [specifically not] multi-core threading as I was looking for. For multi-core threading, I would have used cores reported plus one. The plus one would have been for the operating system and the program being executed, without the threads being testing. Then the threads being tested would have been on further separate cores. For a two core test: use a minimum of a three core system. Operate on thread one. Test on thread two and thread three. With the constraints of keeping all other use of those threads, two and three, from being used for other purposes.

I now have finished the report and I think that I agree with them and you (both) conditionally about the sort type to use based upon the randomness of the data and the average individual data set size.

Thank you Banfa.
Nov 18 '20 #3

Sign in to post your reply or Sign up for a free account.

Similar topics

0
by: r_obert | last post by:
Hello, I'm trying to create a worker thread for my VC++ program, and was wondering whether I should be linking with the Multithread /MT or Multithread DLL /MD option? I'm not quite sure, in...
0
by: Stefan Turalski \(stic\) | last post by:
Hi, What I do is looking for holes in my solution to problem which is: 1. taking something from database by this sth ID, by some application 2. process it to this multithread enviroment of some...
4
by: zbcong | last post by:
Hello: I write a multithread c# socket server,it is a winform application,there is a richtextbox control and button,when the button is click,the server begin to listen the socket port,waiting for a...
2
by: zhebincong | last post by:
Hello: I write a multithread c# socket server,it is a winform application,there is a richtextbox control and button,when the button is click,the server begin to listen the socket port,waiting...
0
by: fred | last post by:
I need some help in trying to understand how to make myCollection (inherited from CollectionBase) multithread safe. Taking my implementation of the Add Sub and a readonly property Item. Public...
5
by: mrkbrndck | last post by:
Please see the code below as I am trying to use multithreading for copying files to a new location in a way that improves performance of the client windows application. The problem occurs when 2...
6
by: jmartin | last post by:
Hi, I have made a multithread version of a program (load a file into database), and with two processors I get the double of time in the multithread than in the process (unithread) version. I...
2
by: tikcireviva | last post by:
Hi Guys, I've done a mulithread queue implementation on stl<queue>, my developement environment is on VC6 as well as FC3. Let's talks about the win32 side. The suspected memory leak is find...
4
by: RB | last post by:
Hi there, I'm pretty new to ASP.NET, and have a "best practice" sort-of question. I'm programming for .NET 2.0, using VB.Net in Visual Studio 2002. Basically, I've got a web-page which is...
5
by: halooti | last post by:
Hello all I created a button with excel and assigned a macro to it which I recorded with excel Macro Record.. The button sorts column C ascending, this is the code Sub Macro1() ...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.