By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
444,089 Members | 2,418 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 444,089 IT Pros & Developers. It's quick & easy.

implement random selection in Python

P: n/a
I need to implement a "random selection" algorithm which takes a list
of [(obj, prob),...] as input. Each of the (obj, prob) represents how
likely an object, "obj", should be selected based on its probability
of
"prob".To simplify the problem, assuming "prob" are integers, and the
sum of all "prob" equals 100. For example,

items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]

The algorithm will take a number "N", and a [(obj, prob),...] list as
inputs, and randomly pick "N" objects based on the probabilities of
the
objects in the list.
For N=1 this is pretty simply; the following code is sufficient to do
the job.

def foo(items):
index = random.randint(0, 99)
currentP = 0
for (obj, p) in items:
currentP += w
if currentP index:
return obj

But how about the general case, for N 1 and N < len(items)? Is there
some clever algorithm using Python standard "random" package to do
the trick?

Thanks,

Ben
Nov 16 '07 #1
Share this Question
Share on Google+
13 Replies


P: n/a
On Nov 15, 10:40�pm, Bruza <benr...@gmail.comwrote:
I need to implement a "random selection" algorithm which takes a list
of [(obj, prob),...] as input. Each of the (obj, prob) represents how
likely an object, "obj", should be selected based on its probability
of
"prob".To simplify the problem, assuming "prob" are integers, and the
sum of all "prob" equals 100. For example,

� �items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]

The algorithm will take a number "N", and a [(obj, prob),...] list as
inputs, and randomly pick "N" objects based on the probabilities of
the
objects in the list.

For N=1 this is pretty simply; the following code is sufficient to do
the job.

def foo(items):
� � index = random.randint(0, 99)
� � currentP = 0
� � for (obj, p) in items:
� � � �currentP += w
� � � �if currentP index:
� � � � � return obj

But how about the general case, for N 1 and N < len(items)? Is there
some clever algorithm using Python standard "random" package to do
the trick?

Thanks,

Ben
What do you think of this?

import random
N = 100
items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
the_items = []
for i,j in items:
the_items.extend([i]*j)
histogram = {}
for i in xrange(N):
chosen = random.choice(the_items)
print chosen,
if chosen in histogram:
histogram[chosen] += 1
else:
histogram[chosen] = 1
print
print
for i in histogram:
print '%4s: %d' % (i,histogram[i])

## John Mary Jane Tom Tom Mary Mary Tom John John Tom John Tom
## John Mary Mary Mary John Tom Tom John Mary Mary Tom Mary
## Mary John Tom Jane Jane Jane John Tom Jane Tom Tom John Tom
## Tom Mary Tom Tom Mary Tom Mary Tom Tom Tom Tom Mary Mary Tom
## Mary Tom Mary Tom Tom Jane Tom Mary Tom Jane Tom Tom Tom Tom
## Tom Mary Tom Jane Tom Mary Mary Jane Mary John Mary Mary Tom
## Mary Mary Tom Mary John Tom Tom Tom Tom Mary Jane Mary Tom
## Mary Tom Tom Jane Tom Mary Mary Tom
##
## Jane: 11
## John: 12
## Mary: 32
## Tom: 45
Nov 16 '07 #2

P: n/a
On Nov 15, 9:32 pm, "mensana...@aol.com" <mensana...@aol.comwrote:
On Nov 15, 10:40�pm, Bruza <benr...@gmail.comwrote:
I need to implement a "random selection" algorithm which takes a list
of [(obj, prob),...] as input. Each of the (obj, prob) represents how
likely an object, "obj", should be selected based on its probability
of
"prob".To simplify the problem, assuming "prob" are integers, and the
sum of all "prob" equals 100. For example,
� �items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
The algorithm will take a number "N", and a [(obj, prob),...] list as
inputs, and randomly pick "N" objects based on the probabilities of
the
objects in the list.
For N=1 this is pretty simply; the following code is sufficient to do
the job.
def foo(items):
� � index = random.randint(0, 99)
� � currentP = 0
� � for (obj, p) in items:
� � � �currentP += w
� � � �if currentP index:
� � � � � return obj
But how about the general case, for N 1 and N < len(items)? Is there
some clever algorithm using Python standard "random" package to do
the trick?
Thanks,
Ben

What do you think of this?

import random
N = 100
items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
the_items = []
for i,j in items:
the_items.extend([i]*j)
histogram = {}
for i in xrange(N):
chosen = random.choice(the_items)
print chosen,
if chosen in histogram:
histogram[chosen] += 1
else:
histogram[chosen] = 1
print
print
for i in histogram:
print '%4s: %d' % (i,histogram[i])

## John Mary Jane Tom Tom Mary Mary Tom John John Tom John Tom
## John Mary Mary Mary John Tom Tom John Mary Mary Tom Mary
## Mary John Tom Jane Jane Jane John Tom Jane Tom Tom John Tom
## Tom Mary Tom Tom Mary Tom Mary Tom Tom Tom Tom Mary Mary Tom
## Mary Tom Mary Tom Tom Jane Tom Mary Tom Jane Tom Tom Tom Tom
## Tom Mary Tom Jane Tom Mary Mary Jane Mary John Mary Mary Tom
## Mary Mary Tom Mary John Tom Tom Tom Tom Mary Jane Mary Tom
## Mary Tom Tom Jane Tom Mary Mary Tom
##
## Jane: 11
## John: 12
## Mary: 32
## Tom: 45
No. That does not solve the problem. What I want is a function

def randomPick(n, the_items):

which will return n DISTINCT items from "the_items" such that
the n items returned are according to their probabilities specified
in the (item, pro) elements inside "the_items".

If I understand correctly, both of the previous replies will output
one item at a time independently, instead of returning n DISTINCT
items at a time.

Nov 16 '07 #3

P: n/a
Bruza wrote:
No. That does not solve the problem. What I want is a function

def randomPick(n, the_items):

which will return n DISTINCT items from "the_items" such that
the n items returned are according to their probabilities specified
in the (item, pro) elements inside "the_items".

If I understand correctly, both of the previous replies will output
one item at a time independently, instead of returning n DISTINCT
items at a time.
from random import sample
randomPick = lambda n,its : sample(eval('+'.join('[%r]*%r'%p for p in its)),n)

hth :)

Nov 16 '07 #4

P: n/a
This recipe of mine may help:

http://aspn.activestate.com/ASPN/Coo.../Recipe/498229

Bye,
bearophile
Nov 16 '07 #5

P: n/a
Bruza wrote:
No. That does not solve the problem. What I want is a function

def randomPick(n, the_items):

which will return n DISTINCT items from "the_items" such that
the n items returned are according to their probabilities specified
in the (item, pro) elements inside "the_items".
and in the initial post you wrote :
But how about the general case, for N 1 and N < len(items)?
The problem is you need to make "the n items returned are according
to their probabilities" more precise. "According to their probabilities" implies
n INDEPENDENT picks, but this is contradictory with the n picks having to
provide DISTINCT results (what clearly constrains picks relative to each other).

Of course there are obvious ways to combine the results of random choices of
single items to obtain a set like you want, but it is not obvious that they are
equivalent.

This is the most simple-minded :

def randomPick(n, the_items) :
assert n<len(the_items)
result = set()
while len(result)<n :
result.add(singlePick(the_items))
return sorted(result)

This is another (but it won't work with your version of singlePick as it is,
although it would with that provided by the other posters) :

def randomPick(n, the_items) :
result = []
items = dict(the_items)
for k in range(n) :
pick = singlePick(items.items())
result.append(pick)
del items[pick]
return result

I would be surprised if they had exactly the same statistical properties, IOW,
if they did fit the same exact interpretation of "according to their probabilities".
Nov 16 '07 #6

P: n/a
Bruza <be*****@gmail.comwrites:
But how about the general case, for N 1 and N < len(items)? Is there
some clever algorithm using Python standard "random" package
Yeah, I'm not sure what the name for it is, but there'ss a well known
algorithm that's sort of an online verison of random.choice. The
famous N=1 example where all probabilities are equal goes:

# choose a random element of seq
for k,s in enumerate(seq):
if random() < 1.0/(k+1):
choice = s

you should be able to generalize this to N items with different
probabilities.
Nov 16 '07 #7

P: n/a
Boris Borcic wrote:
Bruza wrote:
>No. That does not solve the problem. What I want is a function

def randomPick(n, the_items):

which will return n DISTINCT items from "the_items" such that
the n items returned are according to their probabilities specified
in the (item, pro) elements inside "the_items".

and in the initial post you wrote :
But how about the general case, for N 1 and N < len(items)?

The problem is you need to make "the n items returned are according
to their probabilities" more precise. "According to their probabilities" implies
n INDEPENDENT picks, but this is contradictory with the n picks having to
provide DISTINCT results (what clearly constrains picks relative to each other).

Of course there are obvious ways to combine the results of random choices of
single items to obtain a set like you want, but it is not obvious that they are
equivalent.

This is the most simple-minded :

def randomPick(n, the_items) :
assert n<len(the_items)
result = set()
while len(result)<n :
result.add(singlePick(the_items))
return sorted(result)

This is another (but it won't work with your version of singlePick as it is,
although it would with that provided by the other posters) :

def randomPick(n, the_items) :
result = []
items = dict(the_items)
for k in range(n) :
pick = singlePick(items.items())
result.append(pick)
del items[pick]
return result

I would be surprised if they had exactly the same statistical properties, IOW,
if they did fit the same exact interpretation of "according to their probabilities".

yet another one, constructing a list of n-sets first, and then picking one;
like the other solutions, it doesn't optimize for repeated use.

def pickn(items,n) :
"yields all n-sublists of (destructed) items"
if n<=len(items) :
if n :
item = items.pop()
for res in pickn(items[:],n) :
yield res
for res in pickn(items,n-1) :
res.append(item)
yield res
else :
yield []
def randomPick(n,the_items) :
"randomly pick n distinct members of the_items"
the_sets = list(pickn(the_items[:],n))
divisor = len(the_sets)*float(n)/len(the_items)
for k,s in enumerate(the_sets) :
the_sets[k] = (sorted(who for who,_ in s),
int(1+sum(p for _,p in s)/divisor)) # mhh...
return singlePick(the_sets)

Nov 16 '07 #8

P: n/a
Bruza wrote:
I need to implement a "random selection" algorithm which takes a list
of [(obj, prob),...] as input. Each of the (obj, prob) represents how
likely an object, "obj", should be selected based on its probability
of
"prob".To simplify the problem, assuming "prob" are integers, and the
sum of all "prob" equals 100. For example,

items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]

The algorithm will take a number "N", and a [(obj, prob),...] list as
inputs, and randomly pick "N" objects based on the probabilities of
the
objects in the list.
For N=1 this is pretty simply; the following code is sufficient to do
the job.

def foo(items):
index = random.randint(0, 99)
currentP = 0
for (obj, p) in items:
currentP += w
if currentP index:
return obj

But how about the general case, for N 1 and N < len(items)? Is there
some clever algorithm using Python standard "random" package to do
the trick?
I think you need to clarify what you want to do. The "probs" are
clearly not probabilities. Are they counts of items? Are you then
sampling without replacement? When you say N < len(items) do you mean N
<= sum of the "probs"?

Duncabn
Nov 16 '07 #9

P: n/a
On Nov 16, 6:58 am, duncan smith <buzz...@urubu.freeserve.co.uk>
wrote:
Bruza wrote:
I need to implement a "random selection" algorithm which takes a list
of [(obj, prob),...] as input. Each of the (obj, prob) represents how
likely an object, "obj", should be selected based on its probability
of
"prob".To simplify the problem, assuming "prob" are integers, and the
sum of all "prob" equals 100. For example,
items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
The algorithm will take a number "N", and a [(obj, prob),...] list as
inputs, and randomly pick "N" objects based on the probabilities of
the
objects in the list.
For N=1 this is pretty simply; the following code is sufficient to do
the job.
def foo(items):
index = random.randint(0, 99)
currentP = 0
for (obj, p) in items:
currentP += w
if currentP index:
return obj
But how about the general case, for N 1 and N < len(items)? Is there
some clever algorithm using Python standard "random" package to do
the trick?

I think you need to clarify what you want to do. The "probs" are
clearly not probabilities. Are they counts of items? Are you then
sampling without replacement? When you say N < len(items) do you mean N
<= sum of the "probs"?

Duncabn
I think I need to explain on the probability part: the "prob" is a
relative likelihood that the object will be included in the output
list. So, in my example input of

items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]

So, for any size of N, 'Tom' (with prob of 45) will be more likely to
be included in the output list of N distinct member than 'Mary' (prob
of 30) and much more likely than that of 'John' (with prob of 10).

I know "prob" is not exactly the "probability" in the context of
returning a multiple member list. But what I want is a way to "favor"
some member in a selection process.

So far, only Boris's solution is closest (but not quite) to what I
need, which returns a list of N distinct object from the input
"items". However, I tried with input of

items = [('Mary',1), ('John', 1), ('Tom', 1), ('Jane', 97)]

and have a repeated calling of

Ben
Nov 17 '07 #10

P: n/a
On Nov 16, 4:47 pm, Bruza <benr...@gmail.comwrote:
On Nov 16, 6:58 am, duncan smith <buzz...@urubu.freeserve.co.uk>
wrote:
Bruza wrote:
I need to implement a "random selection" algorithm which takes a list
of [(obj, prob),...] as input. Each of the (obj, prob) represents how
likely an object, "obj", should be selected based on its probability
of
"prob".To simplify the problem, assuming "prob" are integers, and the
sum of all "prob" equals 100. For example,
items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
The algorithm will take a number "N", and a [(obj, prob),...] list as
inputs, and randomly pick "N" objects based on the probabilities of
the
objects in the list.
For N=1 this is pretty simply; the following code is sufficient to do
the job.
def foo(items):
index = random.randint(0, 99)
currentP = 0
for (obj, p) in items:
currentP += w
if currentP index:
return obj
But how about the general case, for N 1 and N < len(items)? Is there
some clever algorithm using Python standard "random" package to do
the trick?
I think you need to clarify what you want to do. The "probs" are
clearly not probabilities. Are they counts of items? Are you then
sampling without replacement? When you say N < len(items) do you mean N
<= sum of the "probs"?
Duncabn

I think I need to explain on the probability part: the "prob" is a
relative likelihood that the object will be included in the output
list. So, in my example input of

items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]

So, for any size of N, 'Tom' (with prob of 45) will be more likely to
be included in the output list of N distinct member than 'Mary' (prob
of 30) and much more likely than that of 'John' (with prob of 10).

I know "prob" is not exactly the "probability" in the context of
returning a multiple member list. But what I want is a way to "favor"
some member in a selection process.

So far, only Boris's solution is closest (but not quite) to what I
need, which returns a list of N distinct object from the input
"items". However, I tried with input of

items = [('Mary',1), ('John', 1), ('Tom', 1), ('Jane', 97)]

and have a repeated calling of

Ben
OOPS. I pressed the Send too fast.

The problem w/ Boris's solution is that after repeated calling of
randomPick(3,items), 'Jane' is not the most "frequent appearing"
member in all the out list of 3 member lists...

Nov 17 '07 #11

P: n/a
Knuth says to pick N distinct records from a collection where the
probability is equal you should:

first fill up N records by chosing the first seen.

if less than N were in the collection, quit.

otherwise, t = (the number of items looked at) or N to start.

while your not at the end of the collection:
increment t
generate a random number between 0 and t =K
if K < N:
add it to your collection at position K (replacing the previous
item).

Now, to incorporate probability is another question . . .

Find the largest probability. Assume it has 100% probability.
Calculate the probability of each item accordingly. Take the randomly
generated number and multiply it by the probability. Take the random
number minus the (random number times to probability). If it falls in
range, then you replace like usual. You should find that the max
probability will always appear in the zeroth position if it is not
replaced by another value. Now, this does not guarantee that the
highest probability value will show up first in the list, since that
is the same as sorting by the probability. It is just a way of
increasing the probability of making the value fall in the range as
the probability varies.

I am not guaranteeing this even works. I am seeing that there is some
collision among the numbers, but it will work for the most part.
Nov 17 '07 #12

P: n/a
On Sat, 17 Nov 2007 13:51:16 -0800, je**********@gmail.com wrote:
I am not guaranteeing this even works. I am seeing that there is some
collision among the numbers, but it will work for the most part.
"Work for the most part" -- is that another way of saying "Apart from the
bugs, this is bug-free"?
--
Steven.
Nov 17 '07 #13

P: n/a
On 2007-11-17, Bruza <be*****@gmail.comwrote:
OOPS. I pressed the Send too fast.

The problem w/ Boris's solution is that after repeated calling
of randomPick(3,items), 'Jane' is not the most "frequent
appearing" member in all the out list of 3 member lists...
How does this solution fair against your spec?

def sample_bp(seq, probabilities, k):
""" Return k distinct random items from seq, with probabilities specified
as weighted integers in a list.
>>random.seed(0)
>>sample_bp(['a', 'b'], [1, 5], 2)
['b', 'a']
>>sample_bp(['a', 'b', 'c'], [1, 5, 2], 3)
['b', 'a', 'c']

"""

if k len(seq):
raise ValueError('sample size greater than population')
probs = build_probs(probabilities)
rv = []
while k 0:
j = random_prob(probs)
rv.append(probs[j][2])
remove_prob(probs, j)
k -= 1
return [seq[i] for i in rv]

def build_probs(probabilities):
""" Receives a list of integers, and returns list of ranges and original
indices.
>>build_probs([8, 10, 7])
[(0, 8, 0), (8, 18, 1), (18, 25, 2)]
>>build_probs([1, 5, 8, 2, 3, 7])
[(0, 1, 0), (1, 6, 1), (6, 14, 2), (14, 16, 3), (16, 19, 4), (19, 26, 5)]
"""
k = 0
probs = []
for i, p in enumerate(probabilities):
if p < 0:
raise ValueError('negative probability')
probs.append((k, k+p, i))
k = k+p
return probs

def random_prob(probs):
""" Return the index of a weighted random element of prob.
>>random.seed(0)
>>for x in xrange(20):
... print random_prob(build_probs([1, 5, 8, 2, 3, 7])),
5 5 2 2 2 2 5 2 2 3 5 2 2 5 4 2 5 5 5 5
>>random_prob([(0, 15, 0)])
0

"""
i = random.randrange(probs[-1][1])
# Binary search for the element whose range contains i
hi = len(probs)
lo = 0
while lo < hi:
mid = (lo+hi)//2
begin, end, _ = probs[mid]
if i >= begin and i < end: return mid
elif i >= end: lo = mid+1
else: hi = mid

def remove_prob(probs, i):
""" Remove element j from the probability list, adjusting ranges as needed.
>>prob = [(0, 12, 0), (12, 15, 1), (15, 25, 2)]
remove_prob(prob, 1)
prob
[(0, 12, 0), (12, 22, 2)]

"""
begin, end, _ = probs[i]
diff = end - begin
j = i+1
while j < len(probs):
begin, end, index = probs[j]
probs[j] = (begin-diff, end-diff, index)
j += 1
del probs[i]

This was the most simple-minded approach I could think of, so it
might serve as a reference against a more efficient approach.
Although thorough testing of such a bizarre function boggles my
mind.

I toyed with sorting the probability list from greatest to
lowest, which would make a linear search fast, but unfortunately
it would slow down removing probabilities.

--
Neil Cerutti
I make love to pressure. --Stephen Jackson
Nov 19 '07 #14

This discussion thread is closed

Replies have been disabled for this discussion.