
# Weighted "random" selection from list of lists

Hello -

I'm probably missing something here, but I have a problem where I am
populating a list of lists like this:

list1 = [ 'a', 'b', 'c' ]
list2 = [ 'dog', 'cat', 'panda' ]
list3 = [ 'blue', 'red', 'green' ]

main_list = [ list1, list2, list3 ]

Once main_list is populated, I want to build a sequence from items
within the lists, "randomly" with a defined percentage of the sequence
coming from the various lists. For example, if I want a 6-item
sequence, I might want:

60% from list 1 (main_list)
30% from list 2 (main_list)
10% from list 3 (main_list)

I know how to pull a random sequence (using random()) from the lists,
but I'm not sure how to pick it with the desired percentages.

Any help is appreciated, thanks

-jesse
Oct 8 '05 #1
Jesse Noller wrote:

60% from list 1 (main_list)
30% from list 2 (main_list)
10% from list 3 (main_list)

I know how to pull a random sequence (using random()) from the lists,
but I'm not sure how to pick it with the desired percentages.

Any help is appreciated, thanks

-jesse

Just add up the total of all lists.

total = len(list1)+len(list2)+len(list3)
n1 = .60 * total # number from list 1
n2 = .30 * total # number from list 2
n3 = .10 * total # number from list 3

You'll need to decide how to handle when a list has too few items in it.
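One possible reading of the fixed-count idea, as a minimal sketch rather than a definitive implementation. It makes two assumptions that are not in the post above: the counts are derived from the desired sequence length instead of the combined length of the lists, and a list that is too short is handled by drawing with replacement. The function name `exact_weighted_sample` and its parameters are made up for illustration.

```python
import random

def exact_weighted_sample(pools, weights, size):
    """Draw `size` items with a fixed number taken from each pool.

    Hypothetical helper: `weights` are integer percentages (or any
    positive integers), one per pool.
    """
    total = sum(weights)
    sample = []
    running = 0   # cumulative weight seen so far
    taken = 0     # number of items allocated so far
    for pool, weight in zip(pools, weights):
        running += weight
        # Round the cumulative target so the counts always add up to `size`.
        target = int(round(running * size / total))
        # random.choice draws with replacement, so a pool shorter than
        # its quota is reused rather than exhausted (an assumption).
        sample.extend(random.choice(pool) for _ in range(target - taken))
        taken = target
    random.shuffle(sample)  # interleave items from the different pools
    return sample

list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']

sample = exact_weighted_sample([list1, list2, list3], [60, 30, 10], 10)
print(sample)
```

With a sample size of 10 this takes 6, 3 and 1 items from the three lists. For a 6-item sequence the rounding gives 4, 1 and 1, so the percentages can only be approximated.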

Cheers,
Ron
Oct 8 '05 #2
Jesse Noller wrote:
I'm probably missing something here, but I have a problem where I am
populating a list of lists like this:

list1 = [ 'a', 'b', 'c' ]
list2 = [ 'dog', 'cat', 'panda' ]
list3 = [ 'blue', 'red', 'green' ]

main_list = [ list1, list2, list3 ]

Once main_list is populated, I want to build a sequence from items
within the lists, "randomly" with a defined percentage of the sequence
coming from the various lists. For example, if I want a 6-item
sequence, I might want:

60% from list 1 (main_list)
30% from list 2 (main_list)
10% from list 3 (main_list)

I know how to pull a random sequence (using random()) from the lists,
but I'm not sure how to pick it with the desired percentages.

If the percentages can be normalized to small integers, just make a
pool where each list is repeated according to its weight, e.g.
list1 occurs 6 times, list2 3 times, and list3 once:

import random

pools = [list1, list2, list3]
weights = [6, 3, 1]
sample_size = 10

# Repeat each pool according to its weight.
weighted_pools = []
for p, w in zip(pools, weights):
    weighted_pools.extend([p] * w)

# Pick a pool (weighted), then pick an item from that pool.
sample = [random.choice(random.choice(weighted_pools))
          for _ in range(sample_size)]

Another option is to use bisect() to choose a pool:

import bisect
import random

pools = [list1, list2, list3]
sample_size = 10

def isum(items, sigma=0.0):
    # Yield the running total of the items.
    for item in items:
        sigma += item
        yield sigma

cumulated_weights = list(isum([60, 30, 10], 0))
sigma = cumulated_weights[-1]

sample = []
for _ in range(sample_size):
    # Map a random point in [0, sigma) to the pool whose weight range contains it.
    pool = pools[bisect.bisect(cumulated_weights, random.random()*sigma)]
    sample.append(random.choice(pool))

(all code untested)
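On Python 3.6 or later, the standard library's `random.choices()` accepts a `weights` argument, which makes the pool-selection step a single call. This is a different technique from the two above, shown only as a sketch; the list contents are copied from the original question.

```python
import random

list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']

pools = [list1, list2, list3]
weights = [60, 30, 10]
sample_size = 10

# Choose a pool for every draw in one call (weighted, with replacement),
# then choose a single item from each selected pool.
chosen_pools = random.choices(pools, weights=weights, k=sample_size)
sample = [random.choice(pool) for pool in chosen_pools]
print(sample)
```

As with the approaches above, the weights hold on average only; any single run of 10 draws may deviate from the 6/3/1 split.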

Peter
Oct 8 '05 #3
Jesse Noller wrote:
<paraphrased>
Once main_list is populated, I want to build a sequence from items
within the lists, "randomly" with a defined percentage of the sequence
coming from the various lists. For example:
60% from list 1 (main_list), 30% from list 2 (main_list), 10% from list 3 (main_list)

import bisect, random

main_list = [['a', 'b', 'c'],
             ['dog', 'cat', 'panda'],
             ['blue', 'red', 'green']]
weights = [60, 30, 10]

# Build the running totals of the weights: [60, 90, 100].
cumulative = []
total = 0
for value in weights:
    total += value
    cumulative.append(total)

for i in range(20):
    # A random point below the total, mapped to the list whose range contains it.
    score = random.random() * total
    index = bisect.bisect(cumulative, score)
    print(random.choice(main_list[index]), end=' ')
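To see how the bisect lookup maps a score to a list, here is a small deterministic check. The helper name `pool_index` and the probe values are chosen only for illustration.

```python
import bisect

cumulative = [60, 90, 100]  # running totals of the weights 60, 30, 10

def pool_index(score):
    # bisect.bisect (an alias of bisect_right) returns the position of the
    # first cumulative total that is strictly greater than score.
    return bisect.bisect(cumulative, score)

probes = (0.0, 59.9, 60.0, 89.9, 90.0, 99.9)
indices = [pool_index(s) for s in probes]
print(indices)  # [0, 0, 1, 1, 2, 2]
```

Scores in [0, 60) select the first list, [60, 90) the second, and [90, 100) the third. A score equal to the total would index past the end of the list, which is why the lookup depends on `random.random()` returning values below 1.0.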
--
-Scott David Daniels
sc***********@acm.org
Oct 8 '05 #4
On Sat, 08 Oct 2005 12:48:26 -0400, Jesse Noller wrote:
Once main_list is populated, I want to build a sequence from items
within the lists, "randomly" with a defined percentage of the sequence
coming from the various lists. For example, if I want a 6-item
sequence, I might want:

60% from list 1 (main_list)
30% from list 2 (main_list)
10% from list 3 (main_list)

If you are happy enough to match the percentages statistically rather than
exactly, simply do something like this:

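import random

# The wrapper name pick_item is illustrative; the original snippet was a bare function body.
def pick_item(main_list):
    # Choose a list with probability 0.6 / 0.3 / 0.1, then a random item from it.
    pr = random.random()
    if pr < 0.6:
        list_num = 0
    elif pr < 0.9:
        list_num = 1
    else:
        list_num = 2
    return random.choice(main_list[list_num])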

or however you want to extract an item.

On average, this will mean 60% of the items will come from list1 etc, but
for small numbers of trials, you may have significant differences.
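A rough way to see this effect is to compare the observed shares for a handful of draws with those for many draws. The function names below are hypothetical, and the thresholds are the 0.6 / 0.9 cut-offs from the snippet above.

```python
import random
from collections import Counter

def pick_list_num():
    # Same thresholds as above: 60% / 30% / 10%.
    pr = random.random()
    if pr < 0.6:
        return 0
    elif pr < 0.9:
        return 1
    return 2

def observed_shares(trials):
    # Fraction of draws that landed on each of the three lists.
    counts = Counter(pick_list_num() for _ in range(trials))
    return [counts[i] / trials for i in range(3)]

print(observed_shares(6))       # often far from 0.6 / 0.3 / 0.1
print(observed_shares(100000))  # typically close to 0.6 / 0.3 / 0.1
```

The output varies from run to run, so no fixed result is shown. With only 6 draws it is common for the third list not to appear at all.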

--
Steven.

Oct 9 '05 #5
