444,089 Members | 2,418 Online Need help? Post your question and get tips & solutions from a community of 444,089 IT Pros & Developers. It's quick & easy.

# implement random selection in Python

 P: n/a I need to implement a "random selection" algorithm which takes a list of [(obj, prob),...] as input. Each of the (obj, prob) represents how likely an object, "obj", should be selected based on its probability of "prob".To simplify the problem, assuming "prob" are integers, and the sum of all "prob" equals 100. For example, items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)] The algorithm will take a number "N", and a [(obj, prob),...] list as inputs, and randomly pick "N" objects based on the probabilities of the objects in the list. For N=1 this is pretty simply; the following code is sufficient to do the job. def foo(items): index = random.randint(0, 99) currentP = 0 for (obj, p) in items: currentP += w if currentP index: return obj But how about the general case, for N 1 and N < len(items)? Is there some clever algorithm using Python standard "random" package to do the trick? Thanks, Ben Nov 16 '07 #1
13 Replies

 P: n/a On Nov 15, 10:40ï¿½pm, Bruza

 P: n/a On Nov 15, 9:32 pm, "mensana...@aol.com"

 P: n/a Bruza wrote: No. That does not solve the problem. What I want is a function def randomPick(n, the_items): which will return n DISTINCT items from "the_items" such that the n items returned are according to their probabilities specified in the (item, pro) elements inside "the_items". If I understand correctly, both of the previous replies will output one item at a time independently, instead of returning n DISTINCT items at a time. from random import sample randomPick = lambda n,its : sample(eval('+'.join('[%r]*%r'%p for p in its)),n) hth :) Nov 16 '07 #4

 P: n/a This recipe of mine may help: http://aspn.activestate.com/ASPN/Coo.../Recipe/498229 Bye, bearophile Nov 16 '07 #5

 P: n/a Bruza wrote: No. That does not solve the problem. What I want is a function def randomPick(n, the_items): which will return n DISTINCT items from "the_items" such that the n items returned are according to their probabilities specified in the (item, pro) elements inside "the_items". and in the initial post you wrote : But how about the general case, for N 1 and N < len(items)? The problem is you need to make "the n items returned are according to their probabilities" more precise. "According to their probabilities" implies n INDEPENDENT picks, but this is contradictory with the n picks having to provide DISTINCT results (what clearly constrains picks relative to each other). Of course there are obvious ways to combine the results of random choices of single items to obtain a set like you want, but it is not obvious that they are equivalent. This is the most simple-minded : def randomPick(n, the_items) : assert n

 P: n/a Bruza

 P: n/a Boris Borcic wrote: Bruza wrote: >No. That does not solve the problem. What I want is a function def randomPick(n, the_items):which will return n DISTINCT items from "the_items" such thatthe n items returned are according to their probabilities specifiedin the (item, pro) elements inside "the_items". and in the initial post you wrote : But how about the general case, for N 1 and N < len(items)? The problem is you need to make "the n items returned are according to their probabilities" more precise. "According to their probabilities" implies n INDEPENDENT picks, but this is contradictory with the n picks having to provide DISTINCT results (what clearly constrains picks relative to each other). Of course there are obvious ways to combine the results of random choices of single items to obtain a set like you want, but it is not obvious that they are equivalent. This is the most simple-minded : def randomPick(n, the_items) : assert n

 P: n/a Bruza wrote: I need to implement a "random selection" algorithm which takes a list of [(obj, prob),...] as input. Each of the (obj, prob) represents how likely an object, "obj", should be selected based on its probability of "prob".To simplify the problem, assuming "prob" are integers, and the sum of all "prob" equals 100. For example, items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)] The algorithm will take a number "N", and a [(obj, prob),...] list as inputs, and randomly pick "N" objects based on the probabilities of the objects in the list. For N=1 this is pretty simply; the following code is sufficient to do the job. def foo(items): index = random.randint(0, 99) currentP = 0 for (obj, p) in items: currentP += w if currentP index: return obj But how about the general case, for N 1 and N < len(items)? Is there some clever algorithm using Python standard "random" package to do the trick? I think you need to clarify what you want to do. The "probs" are clearly not probabilities. Are they counts of items? Are you then sampling without replacement? When you say N < len(items) do you mean N <= sum of the "probs"? Duncabn Nov 16 '07 #9

 P: n/a On Nov 16, 6:58 am, duncan smith wrote: Bruza wrote: I need to implement a "random selection" algorithm which takes a list of [(obj, prob),...] as input. Each of the (obj, prob) represents how likely an object, "obj", should be selected based on its probability of "prob".To simplify the problem, assuming "prob" are integers, and the sum of all "prob" equals 100. For example, items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)] The algorithm will take a number "N", and a [(obj, prob),...] list as inputs, and randomly pick "N" objects based on the probabilities of the objects in the list. For N=1 this is pretty simply; the following code is sufficient to do the job. def foo(items): index = random.randint(0, 99) currentP = 0 for (obj, p) in items: currentP += w if currentP index: return obj But how about the general case, for N 1 and N < len(items)? Is there some clever algorithm using Python standard "random" package to do the trick? I think you need to clarify what you want to do. The "probs" are clearly not probabilities. Are they counts of items? Are you then sampling without replacement? When you say N < len(items) do you mean N <= sum of the "probs"? Duncabn I think I need to explain on the probability part: the "prob" is a relative likelihood that the object will be included in the output list. So, in my example input of items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)] So, for any size of N, 'Tom' (with prob of 45) will be more likely to be included in the output list of N distinct member than 'Mary' (prob of 30) and much more likely than that of 'John' (with prob of 10). I know "prob" is not exactly the "probability" in the context of returning a multiple member list. But what I want is a way to "favor" some member in a selection process. So far, only Boris's solution is closest (but not quite) to what I need, which returns a list of N distinct object from the input "items". However, I tried with input of items = [('Mary',1), ('John', 1), ('Tom', 1), ('Jane', 97)] and have a repeated calling of Ben Nov 17 '07 #10

 P: n/a On Nov 16, 4:47 pm, Bruza wrote: Bruza wrote: I need to implement a "random selection" algorithm which takes a list of [(obj, prob),...] as input. Each of the (obj, prob) represents how likely an object, "obj", should be selected based on its probability of "prob".To simplify the problem, assuming "prob" are integers, and the sum of all "prob" equals 100. For example, items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)] The algorithm will take a number "N", and a [(obj, prob),...] list as inputs, and randomly pick "N" objects based on the probabilities of the objects in the list. For N=1 this is pretty simply; the following code is sufficient to do the job. def foo(items): index = random.randint(0, 99) currentP = 0 for (obj, p) in items: currentP += w if currentP index: return obj But how about the general case, for N 1 and N < len(items)? Is there some clever algorithm using Python standard "random" package to do the trick? I think you need to clarify what you want to do. The "probs" are clearly not probabilities. Are they counts of items? Are you then sampling without replacement? When you say N < len(items) do you mean N <= sum of the "probs"? Duncabn I think I need to explain on the probability part: the "prob" is a relative likelihood that the object will be included in the output list. So, in my example input of items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)] So, for any size of N, 'Tom' (with prob of 45) will be more likely to be included in the output list of N distinct member than 'Mary' (prob of 30) and much more likely than that of 'John' (with prob of 10). I know "prob" is not exactly the "probability" in the context of returning a multiple member list. But what I want is a way to "favor" some member in a selection process. So far, only Boris's solution is closest (but not quite) to what I need, which returns a list of N distinct object from the input "items". However, I tried with input of items = [('Mary',1), ('John', 1), ('Tom', 1), ('Jane', 97)] and have a repeated calling of Ben OOPS. I pressed the Send too fast. The problem w/ Boris's solution is that after repeated calling of randomPick(3,items), 'Jane' is not the most "frequent appearing" member in all the out list of 3 member lists... Nov 17 '07 #11

 P: n/a Knuth says to pick N distinct records from a collection where the probability is equal you should: first fill up N records by chosing the first seen. if less than N were in the collection, quit. otherwise, t = (the number of items looked at) or N to start. while your not at the end of the collection: increment t generate a random number between 0 and t =K if K < N: add it to your collection at position K (replacing the previous item). Now, to incorporate probability is another question . . . Find the largest probability. Assume it has 100% probability. Calculate the probability of each item accordingly. Take the randomly generated number and multiply it by the probability. Take the random number minus the (random number times to probability). If it falls in range, then you replace like usual. You should find that the max probability will always appear in the zeroth position if it is not replaced by another value. Now, this does not guarantee that the highest probability value will show up first in the list, since that is the same as sorting by the probability. It is just a way of increasing the probability of making the value fall in the range as the probability varies. I am not guaranteeing this even works. I am seeing that there is some collision among the numbers, but it will work for the most part. Nov 17 '07 #12

 P: n/a On Sat, 17 Nov 2007 13:51:16 -0800, je**********@gmail.com wrote: I am not guaranteeing this even works. I am seeing that there is some collision among the numbers, but it will work for the most part. "Work for the most part" -- is that another way of saying "Apart from the bugs, this is bug-free"? -- Steven. Nov 17 '07 #13

 P: n/a On 2007-11-17, Bruza >random.seed(0) >>sample_bp(['a', 'b'], [1, 5], 2) ['b', 'a'] >>sample_bp(['a', 'b', 'c'], [1, 5, 2], 3) ['b', 'a', 'c'] """ if k len(seq): raise ValueError('sample size greater than population') probs = build_probs(probabilities) rv = [] while k 0: j = random_prob(probs) rv.append(probs[j]) remove_prob(probs, j) k -= 1 return [seq[i] for i in rv] def build_probs(probabilities): """ Receives a list of integers, and returns list of ranges and original indices. >>build_probs([8, 10, 7]) [(0, 8, 0), (8, 18, 1), (18, 25, 2)] >>build_probs([1, 5, 8, 2, 3, 7]) [(0, 1, 0), (1, 6, 1), (6, 14, 2), (14, 16, 3), (16, 19, 4), (19, 26, 5)] """ k = 0 probs = [] for i, p in enumerate(probabilities): if p < 0: raise ValueError('negative probability') probs.append((k, k+p, i)) k = k+p return probs def random_prob(probs): """ Return the index of a weighted random element of prob. >>random.seed(0) >>for x in xrange(20): ... print random_prob(build_probs([1, 5, 8, 2, 3, 7])), 5 5 2 2 2 2 5 2 2 3 5 2 2 5 4 2 5 5 5 5 >>random_prob([(0, 15, 0)]) 0 """ i = random.randrange(probs[-1]) # Binary search for the element whose range contains i hi = len(probs) lo = 0 while lo < hi: mid = (lo+hi)//2 begin, end, _ = probs[mid] if i >= begin and i < end: return mid elif i >= end: lo = mid+1 else: hi = mid def remove_prob(probs, i): """ Remove element j from the probability list, adjusting ranges as needed. >>prob = [(0, 12, 0), (12, 15, 1), (15, 25, 2)]remove_prob(prob, 1)prob [(0, 12, 0), (12, 22, 2)] """ begin, end, _ = probs[i] diff = end - begin j = i+1 while j < len(probs): begin, end, index = probs[j] probs[j] = (begin-diff, end-diff, index) j += 1 del probs[i] This was the most simple-minded approach I could think of, so it might serve as a reference against a more efficient approach. Although thorough testing of such a bizarre function boggles my mind. I toyed with sorting the probability list from greatest to lowest, which would make a linear search fast, but unfortunately it would slow down removing probabilities. -- Neil Cerutti I make love to pressure. --Stephen Jackson Nov 19 '07 #14

### This discussion thread is closed

Replies have been disabled for this discussion. 