
# python & mathematical methods of picking numbers at random

 P: n/a I am using method 'a' below to pick 25 names from a pool of 225. A co-worker is using method 'b' by running it 25 times and throwing out the winning name (names are associated with numbers) after each run and then re-counting the list and doing it all over again.

My boss thinks that 'b' is somehow less fair than 'a', but the only thing I see wrong with it is that it is really inefficient and ugly as it's doing manually what 'a' does automatically, other than that I think the outcome of both methods (25 unique winners from a pool of 225) are the same. Anyone disagree with that and if so, please demonstrate how 'b' isn't as fair as 'a'

    count = len(list_of_potential_winners)
    a = random.sample(range(count), 25)
    b = random.sample(range(count), 1)

Thanks! Bart

Jul 18 '05 #1
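For readers who want to try the two approaches side by side, here is a minimal sketch of both. The function names and the stand-in pool of names are hypothetical, and method 'b' is written the way the post describes it (draw one, remove it, draw again), which may differ in detail from what the co-worker actually runs.

```python
import random

def draw_all_at_once(names, k, rng=random):
    """Method 'a': a single call to sample picks k distinct winners."""
    return rng.sample(names, k)

def draw_one_at_a_time(names, k, rng=random):
    """Method 'b': pick one winner, remove it from the pool, repeat k times."""
    pool = list(names)  # copy, so the caller's list is left untouched
    winners = []
    for _ in range(k):
        index = rng.randrange(len(pool))  # the pool shrinks by one each round
        winners.append(pool.pop(index))
    return winners

# Hypothetical stand-in for list_of_potential_winners.
pool = ["name%03d" % i for i in range(225)]
a = draw_all_at_once(pool, 25)
b = draw_one_at_a_time(pool, 25)
print(len(set(a)), len(set(b)))  # both methods yield 25 distinct winners
```

Both functions return 25 unique names; the difference is only in how much bookkeeping is done by hand.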
9 Replies

 P: n/a It sounds like your co-worker has re-written sample. random.sample(l, 1) is the same as random.choice(l), so that's another source of inefficiency. But why are *you* using random.sample(range(len(x)), 25) instead of random.sample(x, 25) ? Jeff Jul 18 '05 #2
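A short sketch of the two points Jeff raises, assuming a hypothetical list of names: sampling index numbers and then looking the names up gives the same kind of result as handing the list straight to sample, and a one-item sample differs from choice only in that it comes back wrapped in a list.

```python
import random

names = ["name%03d" % i for i in range(225)]  # hypothetical pool

# Indirect: sample index numbers, then look each name up by hand.
indexes = random.sample(range(len(names)), 25)
winners_via_indexes = [names[i] for i in indexes]

# Direct: let sample do the indexing and extraction itself.
winners_direct = random.sample(names, 25)

# Drawing a single item: sample returns a one-element list,
# choice returns the element itself.
one_as_list = random.sample(names, 1)
one_as_item = random.choice(names)
```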

 P: n/a Jeff Epler wrote:

> It sounds like your co-worker has re-written sample. random.sample(l, 1) is the same as random.choice(l), so that's another source of inefficiency.
>
> But why are *you* using random.sample(range(len(x)), 25) instead of random.sample(x, 25) ?
>
> Jeff

Because it works and it's fast and len(count) changes every drawing.

Jul 18 '05 #3

 P: n/a Bart Nessux writes:

> I am using method 'a' below to pick 25 names from a pool of 225. A co-worker is using method 'b' by running it 25 times and throwing out the winning name (names are associated with numbers) after each run and then re-counting the list and doing it all over again. My boss thinks that 'b' is somehow less fair than 'a',

Both are the same, as you can see by calculating the probability of any given name being selected.

What is the application, and the computer environment? You may also need to worry about correlations in the underlying Mersenne Twister PRNG. If the application is something where randomness is very important (you're picking winners for a big lottery or something) then you should use a better RNG.

Jul 18 '05 #4
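The probability calculation mentioned above can be sketched with exact fractions. This assumes that a single sample call picks every 25-name subset with equal likelihood, and that each single draw in method 'b' is uniform over the names still in the pool; the function names are made up for illustration. It checks only the chance of one given name winning, not the full distribution over sets of winners.

```python
from fractions import Fraction

def chance_with_one_sample_call(n, k):
    """Method 'a': a given name is in a uniformly chosen k-subset with probability k/n."""
    return Fraction(k, n)

def chance_with_repeated_single_draws(n, k):
    """Method 'b': 1 minus the chance of being passed over in all k draws."""
    missed_every_time = Fraction(1)
    for already_drawn in range(k):
        remaining = n - already_drawn
        # One of the `remaining` names wins this round; ours is passed over
        # with probability (remaining - 1) / remaining.
        missed_every_time *= Fraction(remaining - 1, remaining)
    return 1 - missed_every_time

print(chance_with_one_sample_call(225, 25))        # 1/9
print(chance_with_repeated_single_draws(225, 25))  # 1/9
```

The product in method 'b' telescopes: (224/225)(223/224)...(200/201) = 200/225, so the chance of winning is 25/225 either way.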

 P: n/a Paul Rubin wrote:

> Bart Nessux writes:
>
>> I am using method 'a' below to pick 25 names from a pool of 225. A co-worker is using method 'b' by running it 25 times and throwing out the winning name (names are associated with numbers) after each run and then re-counting the list and doing it all over again. My boss thinks that 'b' is somehow less fair than 'a',
>
> Both are the same, as you can see by calculating the probability of any given name being selected.
>
> What is the application, and the computer environment? You may also need to worry about correlations in the underlying Mersenne Twister PRNG. If the application is something where randomness is very important (you're picking winners for a big lottery or something) then you should use a better RNG.

We're raffling off crock-pots... that's why I think this is OK for our purposes.

Jul 18 '05 #5
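For a draw where the quality of the randomness did matter, one option in current Python versions is random.SystemRandom, which offers the same sample method but takes its randomness from the operating system instead of the Mersenne Twister. This is only a sketch with a hypothetical entrant list; whether it is "better" enough for a given lottery is a judgment the thread does not settle, and such draws cannot be reproduced from a seed.

```python
import random

# SystemRandom pulls from the OS randomness source rather than the
# Mersenne Twister, so it cannot be seeded or replayed.
secure_rng = random.SystemRandom()

entrants = ["entrant%03d" % i for i in range(225)]  # hypothetical pool
winners = secure_rng.sample(entrants, 25)
```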

 P: n/a "Bart Nessux" wrote in message news:bu**********@solaris.cc.vt.edu...

> Jeff Epler wrote:
>
>> But why are *you* using random.sample(range(len(x)), 25) instead of random.sample(x, 25) ?
>
> Because it works and it's fast and len(count) changes every drawing.

I think you missed Jeff's point, which is that you are repeating part of the work that sample tries to do for you. From the Lib Ref:

" sample(sequence, k): Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement. New in version 2.3. Returns a new list containing elements from the population while leaving the original population unchanged. The resulting list is in selection order so that all sub-slices will also be valid random samples. This allows raffle winners (the sample) to be partitioned into grand prize and second place winners (the subslices). "

When you get the sample from range(n), you have to use them as indexes into x to get the actual list of names. But the indexing and extraction is what sample would do if you gave it x instead of range(len(x))!

Terry J. Reedy

Jul 18 '05 #6
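The sub-slice property in the quoted documentation can be sketched as follows. The entrant list and the split into one grand prize, four second prizes and twenty remaining prizes are invented for illustration.

```python
import random

entrants = ["entrant%03d" % i for i in range(225)]  # hypothetical pool

# One call draws all 25 winners, returned in selection order.
winners = random.sample(entrants, 25)

# Because the list is in selection order, each slice is itself a valid
# random sample, so the prizes can simply be dealt out by position.
grand_prize = winners[:1]
second_place = winners[1:5]
remaining_prizes = winners[5:]
```

No name can land in two prize groups, because the slices do not overlap and sample never repeats an element.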

 P: n/a Bart Nessux wrote:

> Paul Rubin wrote:
>
>> Bart Nessux writes:
>>
>>> I am using method 'a' below to pick 25 names from a pool of 225. A co-worker is using method 'b' by running it 25 times and throwing out the winning name (names are associated with numbers) after each run and then re-counting the list and doing it all over again. My boss thinks that 'b' is somehow less fair than 'a',
>>
>> Both are the same, as you can see by calculating the probability of any given name being selected.
>>
>> What is the application, and the computer environment? You may also need to worry about correlations in the underlying Mersenne Twister PRNG. If the application is something where randomness is very important (you're picking winners for a big lottery or something) then you should use a better RNG.
>
> We're raffling off crock-pots... that's why I think this is OK for our purposes.

Some will claim you cooked the numbers, even if it is a crock. Let 'em blow off some steam, but don't chicken out. If you let them stew for a day, they'll soften up and you'll eventually reach a cord.

Jul 18 '05 #7

 P: n/a | Bart Nessux said |

> I am using method 'a' below to pick 25 names from a pool of 225. A co-worker is using method 'b' by running it 25 times and throwing out the winning name (names are associated with numbers) after each run and then re-counting the list and doing it all over again.
>
> My boss thinks that 'b' is somehow less fair than 'a', but the only thing I see wrong with it is that it is really inefficient and ugly as it's doing manually what 'a' does automatically, other than that I think the outcome of both methods (25 unique winners from a pool of 225) are the same. Anyone disagree with that and if so, please demonstrate how 'b' isn't as fair as 'a'
>
>     count = len(list_of_potential_winners)
>     a = random.sample(range(count), 25)
>     b = random.sample(range(count), 1)
>
> Thanks! Bart

I looked at the code for random.sample, and found out that the two methods are probabilistically equivalent. Neither is more or less fair than the other. You can, however, poke fun at your cow-orker for using random.sample(range(count), 1) when random.randint(1,count) would have done the exact same thing with the way he used random.sample.

HTH

Sam Walters.

P.S. The code for sample in random.py is very simple and fairly straightforward. You should take a peek at it. The basic algorithm is to make a list of winners, choose a random number, then if the winner is not already in the list, add them. If the winner is already in the list, retry until a new winner comes up. Repeat until you have the desired number of winners.

--
Never forget the halloween documents.
http://www.opensource.org/halloween/
""" Where will Microsoft try to drag you today? Do you really want to go there?"""

Jul 18 '05 #8
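The algorithm described in the P.S. can be sketched like this. It follows the post's description and is not a copy of random.py; the function name pick_winners is made up, and the real module may use a different strategy depending on the sizes involved.

```python
import random

def pick_winners(population, k, rng=random):
    """Pick k distinct items by drawing positions and retrying on repeats."""
    n = len(population)
    if k > n:
        raise ValueError("cannot pick more winners than there are entrants")
    chosen_positions = set()   # positions that have already won
    winners = []
    while len(winners) < k:
        j = rng.randrange(n)   # choose a random position in the pool
        if j in chosen_positions:
            continue           # already a winner: retry
        chosen_positions.add(j)
        winners.append(population[j])
    return winners

print(len(set(pick_winners(list(range(225)), 25))))  # 25
```

Retries are rare when k is small next to the pool size, as with 25 out of 225, which is why this approach is cheap in the raffle's case.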

 P: n/a Terry Reedy wrote:

> "Bart Nessux" wrote in message news:bu**********@solaris.cc.vt.edu...
>
>> Jeff Epler wrote:
>>
>>> But why are *you* using random.sample(range(len(x)), 25) instead of random.sample(x, 25)?
>>
>> Because it works and it's fast and len(count) changes every drawing.
>
> I think you missed Jeff's point, which is that you are repeating part of the work that sample tries to do for you. From the Lib Ref:
>
> " sample(sequence, k): Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement. New in version 2.3. Returns a new list containing elements from the population while leaving the original population unchanged. The resulting list is in selection order so that all sub-slices will also be valid random samples. This allows raffle winners (the sample) to be partitioned into grand prize and second place winners (the subslices). "
>
> When you get the sample from range(n), you have to use them as indexes into x to get the actual list of names. But the indexing and extraction is what sample would do if you gave it x instead of range(len(x))!

Ahh, I see what you mean.

Jul 18 '05 #9

 P: n/a Terry Reedy wrote:

> "Bart Nessux" wrote in message news:bu**********@solaris.cc.vt.edu...
>
>> Jeff Epler wrote:
>>
>>> But why are *you* using random.sample(range(len(x)), 25) instead of random.sample(x, 25)?
>>
>> Because it works and it's fast and len(count) changes every drawing.
>
> I think you missed Jeff's point, which is that you are repeating part of the work that sample tries to do for you. From the Lib Ref:
>
> " sample(sequence, k): Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement. New in version 2.3. Returns a new list containing elements from the population while leaving the original population unchanged. The resulting list is in selection order so that all sub-slices will also be valid random samples. This allows raffle winners (the sample) to be partitioned into grand prize and second place winners (the subslices). "
>
> When you get the sample from range(n), you have to use them as indexes into x to get the actual list of names. But the indexing and extraction is what sample would do if you gave it x instead of range(len(x))!
>
> Terry J. Reedy

Also, the below statement should be removed from random's documentation... it's where I got the idea to do random.sample(range(len(x)), 25) instead of random.sample(x, 25):

"To choose a sample from a range of integers, use xrange as an argument. This is especially fast and space efficient for sampling from a large population: sample(xrange(10000000), 60)."

http://www.python.org/doc/current/li...le-random.html

Jul 18 '05 #10
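The documentation passage quoted above is about the case where the integers themselves are the population, not index numbers into another list. A minimal sketch of that case in current Python, where range is lazy in the way xrange was in Python 2:

```python
import random

# range(10000000) does not build a ten-million-element list; sample only
# needs its length and the ability to look up individual positions.
picks = random.sample(range(10000000), 60)
```

When the population is an existing list of names, passing the list itself remains the simpler call.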
