473,388 Members | 1,408 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,388 software developers and data experts.

Re: random sorting a collection

On Wed, 28 May 2008 06:23:35 -0700, Adam Sandler <co****@excite.comwrote:
[...]
The problem is, the sorted array can now contain duplicates!
No, it can't. Not unless it had duplicates to start with.

Personally, I wouldn't bother with such a contrived loop. Something
simpler is fine:

for (int i = 0; i < a.Length; i++)
{
int j = r.Next(a.Length);
String temp = a[i];

a[i] = a[j];
a[j] = temp;
}

Then you avoid all the -1 and +1 and off-by-one. For the purposes of
shuffling the input, it doesn't matter that you might shuffle an item that
you'd already shuffled. It's still random.

That whole thread you referenced is a bit sketchy, if you ask me.

Pete
Jun 27 '08 #1
2 2218
On May 28, 3:39 pm, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:

<snip>
Then you avoid all the -1 and +1 and off-by-one. For the purposes of
shuffling the input, it doesn't matter that you might shuffle an item that
you'd already shuffled. It's still random.
It ends up being less random though. It's less likely that something
will end up in its initial position than with a true shuffle, IIRC. It
takes a *little* more work to get rid of the off-by-one errors, but
it's not that bad.

I'd personally write it as:

// i=position we're going to replace (possibly with same element)
// Everything less than i is already shuffled
for (int i=0; i < a.Length; i++)
{
// target is in range [i, a.Length)
int target = i+rng.Next(a.Length-i);

String temp = a[i];
a[i] = a[target];
a[target] = temp;
}

Jon
Jun 27 '08 #2
On Wed, 28 May 2008 08:30:18 -0700, Jon Skeet [C# MVP] <sk***@pobox.com>
wrote:
On May 28, 3:39 pm, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:

<snip>
>Then you avoid all the -1 and +1 and off-by-one. For the purposes of
shuffling the input, it doesn't matter that you might shuffle an item
that
you'd already shuffled. It's still random.

It ends up being less random though. It's less likely that something
will end up in its initial position than with a true shuffle, IIRC.
By the way, I did finally get a chance to try to look up a reference that
would refute or verify this. I found in Wikipedia an argument that points
out that for a shuffle, there are N! permutations, but that selecting a
random swap from every possible index on each iteration produces N^N
possible outcomes for the random selections. It then points out that N^N
is never divisible by N! and uses that to claim that some permutations
must occur more often (basically, there are duplicates in the possible
random outcomes, but only for some permutations).

Anyway, that's a long way of saying I think you're right.

As long as I'm commenting, I'll point out though that if we're going to
fix the algorithm so that it produces correct uniform distribution, we
might as well take advantage of the chance and reduce our iteration count
by one. So instead of:
for (int i = 0; i < a.Length; i++)
we can have:
for (int i = 0; i < a.Length - 1; i++)
Not nearly as good a nitpick as the one you came up with, but it's all
I've got. :)

There's another optimization with respect to not swapping with something
winds up in the same place, but that involves a much more (relatively
speaking) complex change to the basic algorithm, and so more easily
qualifies as a premature optimization. The above I think makes sense due
to its simplicity, even as I would ignore the potentially more beneficial
optimization. I find this whole thought process somewhat
counter-intuitive and ironic. :)

Pete
Jun 27 '08 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
by: Jean-Francois.Doyon | last post by:
Hello, I'm trying to retrieve a limited number of random rows, and order them by a column, and am not having any luck with that last part: SELECT * FROM tablename ORDER BY random(), id LIMIT...
22
by: Nhmiller | last post by:
Is there a way to do this? Thanks. Neil Cat Paintings At Carol Wilson Gallery http://www.carolwilsongallery.com
18
by: Scott | last post by:
I have a collection where the items in the collection are dates. I want to iterate over the collection and build a value list string for the rowsource of a listbox. The dates in the collection are...
7
by: Pete Davis | last post by:
A different question this time. I have a DataGrid bound to a collection. Is there any way for me to allow sorting? The DataGrid.AllowSorting=true doesn't work, but that's probably because it can't...
1
by: Patrick | last post by:
Hi, This post is the 'sequel' ;) of the "Data Oriented vs Object Oriented Design" post, but it can be read and treated apart from that one. I will just quote the beginning of my previous message...
19
by: Boris Borcic | last post by:
does x.sort(cmp = lambda x,y : cmp(random.random(),0.5)) pick a random shuffle of x with uniform distribution ? Intuitively, assuming list.sort() does a minimal number of comparisons to ...
16
by: Claudio Grondi | last post by:
I have a 250 Gbyte file (occupies the whole hard drive space) and want to change only eight bytes in this file at a given offset of appr. 200 Gbyte (all other data in that file should remain...
10
by: Sjaakie | last post by:
Hi, I'm, what it turns out to be, fooling around with 3-tier design. At several websites people get really enthusiastic about using custom dataobjects instead of datasets/-tables. While trying to...
13
by: Bruza | last post by:
I need to implement a "random selection" algorithm which takes a list of as input. Each of the (obj, prob) represents how likely an object, "obj", should be selected based on its probability of...
0
by: =?ISO-8859-1?Q?Arne_Vajh=F8j?= | last post by:
Adam Sandler wrote: The following code does both array and collection. It is not the most efficient way of doing it, but the algorithm is easy to understand. public class Util<T> {
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.