Python & mathematical methods of picking numbers at random

Bart Nessux

I am using method 'a' below to pick 25 names from a pool of 225. A
co-worker is using method 'b' by running it 25 times and throwing out
the winning name (names are associated with numbers) after each run and
then re-counting the list and doing it all over again.

My boss thinks that 'b' is somehow less fair than 'a', but the only
thing I see wrong with it is that it is really inefficient and ugly, as
it's doing manually what 'a' does automatically. Other than that, I think
the outcome of both methods (25 unique winners from a pool of 225) is
the same. Does anyone disagree with that? If so, please demonstrate how
'b' isn't as fair as 'a'.

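import random

count = len(list_of_potential_winners)

# Method 'a': all 25 winning indexes in one call.
a = random.sample(range(count), 25)

# Method 'b': one winning index per run, repeated 25 times.
b = random.sample(range(count), 1)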

Thanks!
Bart

Jul 18 '05 #1
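
For reference, the two methods described above could be written out roughly as follows. This is only a sketch: the pool of names is made up, draw_all_at_once and draw_one_at_a_time are hypothetical helper names rather than anything from the original scripts, and method 'a' samples the names directly instead of their indexes.

```python
import random

# Hypothetical pool of 225 entrants; the real list is not shown in the thread.
list_of_potential_winners = [f"name_{i:03d}" for i in range(225)]

def draw_all_at_once(pool, k):
    """Method 'a': one call to random.sample picks k unique winners."""
    return random.sample(pool, k)

def draw_one_at_a_time(pool, k):
    """Method 'b': pick one winner, remove it from the pool, repeat k times."""
    remaining = list(pool)  # work on a copy so the caller's list is untouched
    winners = []
    for _ in range(k):
        index = random.randrange(len(remaining))  # re-count the shrinking pool
        winners.append(remaining.pop(index))      # throw out the winning name
    return winners

a = draw_all_at_once(list_of_potential_winners, 25)
b = draw_one_at_a_time(list_of_potential_winners, 25)
print(len(set(a)), len(set(b)))
```

Both calls return 25 distinct names and leave the original list unchanged.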

Jeff Epler

It sounds like your co-worker has re-written sample. random.sample(l, 1)
is the same as random.choice(l), so that's another source of inefficiency.

But why are *you* using
random.sample(range(len(x)), 25)
instead of
random.sample(x, 25)
?

Jeff

Jul 18 '05 #2
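
A small illustration of the point about random.sample(l, 1) and random.choice(l), using a made-up list of names. The two calls do the same job, but the shapes of their results differ: sample returns a one-element list, while choice returns the element itself.

```python
import random

names = ["Ann", "Bob", "Cid", "Dee"]   # made-up pool for illustration

one_as_list = random.sample(names, 1)  # a list holding a single name
one_as_item = random.choice(names)     # the name itself, no list around it

print(one_as_list, one_as_item)
```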

Bart Nessux

Jeff Epler wrote:

It sounds like your co-worker has re-written sample. random.sample(l, 1)
is the same as random.choice(l), so that's another source of inefficiency.

But why are *you* using
random.sample(range(len(x)), 25)
instead of
random.sample(x, 25)
?

Jeff

Because it works, it's fast, and count (the length of the list) changes every drawing.

Jul 18 '05 #3

Paul Rubin

Bart Nessux <ba*********@hotmail.com> writes:

I am using method 'a' below to pick 25 names from a pool of 225. A
co-worker is using method 'b' by running it 25 times and throwing out
the winning name (names are associated with numbers) after each run
and then re-counting the list and doing it all over again.

My boss thinks that 'b' is somehow less fair than 'a',

Both are the same, as you can see by calculating the probability of
any given name being selected. What is the application, and the
computer environment? You may also need to worry about correlations
in the underlying Mersenne Twister PRNG. If the application is
something where randomness is very important (you're picking winners
for a big lottery or something) then you should use a better RNG.

Jul 18 '05 #4
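
The probability calculation mentioned above can be checked with exact fractions. This is a sketch that assumes every remaining name is equally likely on each run of method 'b', and that random.sample treats all names alike in method 'a'.

```python
from fractions import Fraction

n, k = 225, 25   # pool size and number of winners, as in the original post

# Method 'b': a given name survives run i (0-based) with probability
# (n - 1 - i) / (n - i), because one of the n - i remaining names is drawn.
p_never_drawn = Fraction(1)
for i in range(k):
    p_never_drawn *= Fraction(n - 1 - i, n - i)

p_selected_b = 1 - p_never_drawn
p_selected_a = Fraction(k, n)   # method 'a': k winners among n equally likely names

print(p_selected_a, p_selected_b)   # both print as 1/9
```

The product telescopes to (n - k) / n, so any given name has a 25/225 chance of winning under either method.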

Bart Nessux

Paul Rubin wrote:

Bart Nessux <ba*********@hotmail.com> writes:
I am using method 'a' below to pick 25 names from a pool of 225. A
co-worker is using method 'b' by running it 25 times and throwing out
the winning name (names are associated with numbers) after each run
and then re-counting the list and doing it all over again.

My boss thinks that 'b' is somehow less fair than 'a',

Both are the same, as you can see by calculating the probability of
any given name being selected. What is the application, and the
computer environment? You may also need to worry about correlations
in the underlying Mersenne Twister PRNG. If the application is
something where randomness is very important (you're picking winners
for a big lottery or something) then you should use a better RNG.

We're raffling off crock-pots... that's why I think this is OK for our
purposes.

Jul 18 '05 #5
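
For a draw with higher stakes than crock-pots, one possible reading of "a better RNG" is the standard library's random.SystemRandom, which offers the same sample method but takes its randomness from the operating system. A minimal sketch with a made-up pool of tickets:

```python
import random

# SystemRandom draws from the operating system's entropy source (os.urandom)
# instead of the Mersenne Twister, so it cannot be seeded or replayed.
secure_rng = random.SystemRandom()

entrants = [f"ticket_{i}" for i in range(225)]   # made-up pool
winners = secure_rng.sample(entrants, 25)
print(winners[:3])
```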

Terry Reedy

"Bart Nessux" <ba*********@hotmail.com> wrote in message
news:bu**********@solaris.cc.vt.edu...

Jeff Epler wrote:
But why are *you* using
random.sample(range(len(x)), 25)
instead of
random.sample(x, 25)
?
Because it works, it's fast, and count (the length of the list) changes every drawing.

I think you missed Jeff's point, which is that you are repeating part of
the work that sample tries to do for you. From the Lib Ref:
"
sample(sequence, k): Return a k length list of unique elements chosen from
the population sequence. Used for random sampling without replacement. New
in version 2.3.

Returns a new list containing elements from the population while leaving
the original population unchanged. The resulting list is in selection order
so that all sub-slices will also be valid random samples. This allows
raffle winners (the sample) to be partitioned into grand prize and second
place winners (the subslices).
"
When you get the sample from range(n), you have to use them as indexes into
x to get the actual list of names. But the indexing and extraction is what
sample would do if you gave it x instead of range(len(x))!

Terry J. Reedy

Jul 18 '05 #6
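
To make the difference concrete, here is a sketch with a made-up list x showing the index-based approach next to sampling the names directly, followed by the sub-slice partitioning that the Lib Ref excerpt mentions. The split into one grand prize and 24 second-place winners is only an example.

```python
import random

x = [f"name_{i:03d}" for i in range(225)]   # made-up list of names

# Index-based: sample positions, then look the names up by hand.
indexes = random.sample(range(len(x)), 25)
winners_via_indexes = [x[i] for i in indexes]

# Direct: sample does the lookup itself.
winners = random.sample(x, 25)

# The result is in selection order, so slices are themselves random samples.
grand_prize = winners[:1]
second_place = winners[1:]
print(grand_prize, len(second_place))
```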

Mark Borgerding

Bart Nessux wrote:

Paul Rubin wrote:
Bart Nessux <ba*********@hotmail.com> writes:
I am using method 'a' below to pick 25 names from a pool of 225. A
co-worker is using method 'b' by running it 25 times and throwing out
the winning name (names are associated with numbers) after each run
and then re-counting the list and doing it all over again.

My boss thinks that 'b' is somehow less fair than 'a',

Both are the same, as you can see by calculating the probability of
any given name being selected. What is the application, and the
computer environment? You may also need to worry about correlations
in the underlying Mersenne Twister PRNG. If the application is
something where randomness is very important (you're picking winners
for a big lottery or something) then you should use a better RNG.

We're raffling off crock-pots... that's why I think this is OK for our
purposes.

Some will claim you cooked the numbers, even if it is a crock.
Let 'em blow off some steam, but don't chicken out. If you let them stew
for a day, they'll soften up and you'll eventually reach a cord.

Jul 18 '05 #7

Samuel Walters

| Bart Nessux said |

I am using method 'a' below to pick 25 names from a pool of 225. A
co-worker is using method 'b' by running it 25 times and throwing out the
winning name (names are associated with numbers) after each run and then
re-counting the list and doing it all over again.

My boss thinks that 'b' is somehow less fair than 'a', but the only thing
I see wrong with it is that it is really inefficient and ugly, as it's
doing manually what 'a' does automatically. Other than that, I think the
outcome of both methods (25 unique winners from a pool of 225) is the
same. Does anyone disagree with that? If so, please demonstrate how 'b'
isn't as fair as 'a'.

count = len(list_of_potential_winners)

a = random.sample(range(count), 25)

b = random.sample(range(count), 1)

Thanks!
Bart

I looked at the code for random.sample, and found out that the two methods
are probabilistically equivalent. Neither is more or less fair than the
other.

You can, however, poke fun at your cow-orker for using
random.sample(range(count), 1) when random.randrange(count) would have done
the same job, given the way he used random.sample.

HTH

Sam Walters.

P.S. The code for sample in random.py is very simple and fairly
straightforward. You should take a peek at it. The basic algorithm is to
make a list of winners, choose a random number, then if the winner is not
already in the list, add them. If the winner is already in the list,
retry until a new winner comes up. Repeat until you have the desired
number of winners.

--
Never forget the halloween documents.
http://www.opensource.org/halloween/
""" Where will Microsoft try to drag you today?
Do you really want to go there?"""

Jul 18 '05 #8
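
The retry approach described in the P.S. can be sketched as below. This is a simplified illustration, not the actual source of random.py: sample_by_retry is a hypothetical name, and CPython's implementation also has a second branch that copies the pool and draws from the copy, with the choice between the two depending on the Python version and on the sizes involved.

```python
import random

def sample_by_retry(population, k):
    """Pick k unique elements by redrawing whenever a position repeats."""
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population or negative")
    chosen_positions = set()
    winners = []
    while len(winners) < k:
        j = random.randrange(n)        # choose a random position
        if j in chosen_positions:      # already a winner: retry
            continue
        chosen_positions.add(j)
        winners.append(population[j])
    return winners

print(sample_by_retry(list(range(225)), 25))
```

Retries are cheap while k is small relative to the pool, which is the case for 25 winners out of 225.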

Bart Nessux

Terry Reedy wrote:

"Bart Nessux" <ba*********@hotmail.com> wrote in message
news:bu**********@solaris.cc.vt.edu...
Jeff Epler wrote:
But why are *you* using
random.sample(range(len(x)), 25)
instead of
random.sample(x, 25)
?

Because it works, it's fast, and count (the length of the list) changes every drawing.

I think you missed Jeff's point, which is that you are repeating part of
the work that sample tries to do for you. From the Lib Ref:
"
sample(sequence, k): Return a k length list of unique elements chosen from
the population sequence. Used for random sampling without replacement. New
in version 2.3.

Returns a new list containing elements from the population while leaving
the original population unchanged. The resulting list is in selection order
so that all sub-slices will also be valid random samples. This allows
raffle winners (the sample) to be partitioned into grand prize and second
place winners (the subslices).
"
When you get the sample from range(n), you have to use them as indexes into
x to get the actual list of names. But the indexing and extraction is what
sample would do if you gave it x instead of range(len(x))!

Ahh, I see what you mean.

Jul 18 '05 #9

Bart Nessux

Terry Reedy wrote:

"Bart Nessux" <ba*********@hotmail.com> wrote in message
news:bu**********@solaris.cc.vt.edu...
Jeff Epler wrote:
But why are *you* using
random.sample(range(len(x)), 25)
instead of
random.sample(x, 25)
?

Because it works, it's fast, and count (the length of the list) changes every drawing.

I think you missed Jeff's point, which is that you are repeating part of
the work that sample tries to do for you. From the Lib Ref:
"
sample(sequence, k): Return a k length list of unique elements chosen from
the population sequence. Used for random sampling without replacement. New
in version 2.3.

Returns a new list containing elements from the population while leaving
the original population unchanged. The resulting list is in selection order
so that all sub-slices will also be valid random samples. This allows
raffle winners (the sample) to be partitioned into grand prize and second
place winners (the subslices).
"
When you get the sample from range(n), you have to use them as indexes into
x to get the actual list of names. But the indexing and extraction is what
sample would do if you gave it x instead of range(len(x))!

Terry J. Reedy

Also, the below statement should be removed from random's
documentation... it's where I got the idea to do:

random.sample(range(len(x)), 25)
instead of
random.sample(x, 25)

"To choose a sample from a range of integers, use xrange
as an argument. This is especially fast and space efficient
for sampling from a large population: sample(xrange(10000000), 60)."

http://www.python.org/doc/current/li...le-random.html

Jul 18 '05 #10
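
For context, the quoted documentation sentence covers the case where the population really is a range of integers, rather than a stand-in for indexes into a list of names. A sketch of that use in Python 3, where range plays the role that xrange had in Python 2:

```python
import random

# In Python 3, range is lazy (like Python 2's xrange), so no 10-million-item
# list is ever built; sample only needs len() and indexing.
picks = random.sample(range(10000000), 60)
print(len(picks), len(set(picks)))   # 60 60
```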
