443,302 Members | 1,812 Online Need help? Post your question and get tips & solutions from a community of 443,302 IT Pros & Developers. It's quick & easy.

# Biased random?

 P: n/a Hi, I have a list of items, and need to choose several elements from it, "almost random". The catch is that the elements from the beginning should have more chance of being selected than those at the end (how much more? I don't care how the "envelope" of probability looks like at this point - can be linear). I see that there are several functions in Python standard libraries for various distribution, but is there an easy pythonic way to make them do what I need? -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG0zdFldnAQVacBcgRAg/yAJ9ZVWfDptl4Qxl+kSIpZqoi00qNYgCg+YNf jVEXRG+grsbVDrRH1wXSw14= =mGlJ -----END PGP SIGNATURE----- Aug 27 '07 #1
22 Replies

 P: n/a How about decorating your list of elements with an additional value, which indicates the weight of that element. A value of 1 will indicate 'as likely as any other', < 1 will be 'less likely than' any other and 1 will be 'more likely than any other'. Then create a sorted list based on the combined value of weight and the output of random(); from this take the first N many elements to meet your requirements. hth Jon On 27 Aug, 21:42, Ivan Voras

 P: n/a Ivan Voras wrote: Hi, I have a list of items, and need to choose several elements from it, "almost random". The catch is that the elements from the beginning should have more chance of being selected than those at the end (how much more? I don't care how the "envelope" of probability looks like at this point - can be linear). I see that there are several functions in Python standard libraries for various distribution, but is there an easy pythonic way to make them do what I need? That's weird. random.randint(a,b) will be enough for most cases. Test your system to see the distribution is uniform with something like: ---- import random dist = {} for z in xrange(100000): u = random.randint(1,10) try: dist[u] = dist[u] + 1 except KeyError: dist[u] = 1 print dist ---- Aug 27 '07 #3

 P: n/a On 2007-08-27, Jun-geun Park I have a list of items, and need to choose several elementsfrom it, "almost random". The catch is that the elements fromthe beginning should have more chance of being selected thanthose at the end (how much more? I don't care how the"envelope" of probability looks like at this point - can belinear). I see that there are several functions in Pythonstandard libraries for various distribution, but is there aneasy pythonic way to make them do what I need? That's weird. random.randint(a,b) will be enough for most cases. Test your system to see the distribution is uniform with something like: Except he wants a non-uniform distribution. -- Grant Edwards grante Yow! I'm changing the at CHANNEL ... But all I get visi.com is commercials for "RONCO MIRACLE BAMBOO STEAMERS"! Aug 27 '07 #4

 P: n/a On Aug 27, 3:42 pm, Ivan Voras

 P: n/a On 8/27/07, J. Cliff Dyer

 P: n/a Jun-geun Park wrote: Ivan Voras wrote: >Hi,I have a list of items, and need to choose several elements from it,"almost random". The catch is that the elements from the beginningshould have more chance of being selected than those at the end (howmuch more? I don't care how the "envelope" of probability looks like atthis point - can be linear). I see that there are several functions inPython standard libraries for various distribution, but is there an easypythonic way to make them do what I need? That's weird. random.randint(a,b) will be enough for most cases. Test your system to see the distribution is uniform with something like: The distribution is uniform. However, he wants a way to get non-uniform sampling of that list. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco Aug 27 '07 #7

 P: n/a Ivan Voras wrote: I have a list of items, and need to choose several elements from it, "almost random". The catch is that the elements from the beginning should have more chance of being selected than those at the end (how much more? I don't care how the "envelope" of probability looks like at this point - can be linear). I see that there are several functions in Python standard libraries for various distribution, but is there an easy pythonic way to make them do what I need? import random def choose(samples, **kwargs): total = sum(kwargs.values()) result = [] for n in range(samples): choice = random.random() * total for key, weight in kwargs.iteritems(): choice -= weight if choice <= 0: result.append(key) total -= kwargs.pop(key) break else: raise ValueError('Too many samples taken from the universe') return result print choose(3, a=1, b=2, c=2, d=1, e=3) -Scott David Daniels Sc***********@Acm.Org Aug 28 '07 #8

 P: n/a Jun-geun Park wrote: Ivan Voras wrote: >Hi,I have a list of items, and need to choose several elements from it,"almost random". The catch is that the elements from the beginningshould have more chance of being selected than those at the end (howmuch more? I don't care how the "envelope" of probability looks like atthis point - can be linear). I see that there are several functions inPython standard libraries for various distribution, but is there an easypythonic way to make them do what I need? That's weird. random.randint(a,b) will be enough for most cases. Test your system to see the distribution is uniform with something like: ---- import random dist = {} for z in xrange(100000): u = random.randint(1,10) try: dist[u] = dist[u] + 1 except KeyError: dist[u] = 1 print dist ---- I understood the question wrong.(Thanks, Grant and Robert.) To get the linear distribution, you can start from the following: a = (random.random() + random.random())/2.0 # Triangle distribution in [0,1) b = random.random() # Uniform in [0,1) c = b + (1-b)*a # Convolution u = int(100*(1-c)+1) # ceil(100*(1-c)) , then, because convolution of a rectangle function and a triangle function is a linear function under the valid domain, u is a random integer in [1,100] that follows (decreasing) linear distribution. Alternatively if I were in your case, I would go with an exponential random with a suitable lambda(by cutting its tail.) Aug 28 '07 #9

 P: n/a En Mon, 27 Aug 2007 17:42:45 -0300, Ivan Voras escribiï¿½: I have a list of items, and need to choose several elements from it, "almost random". The catch is that the elements from the beginning should have more chance of being selected than those at the end (how much more? I don't care how the "envelope" of probability looks like at this point - can be linear). I see that there are several functions in Python standard libraries for various distribution, but is there an easy pythonic way to make them do what I need? Using a linear (or triangular) distribution: --- begin --- from random import randint def biased_choice(values): """Choose a random element from values; first elements are chosen more frequently than later ones. Weights are linearly assigned; given n elements, the first has weight n, second has n-1, ... last has weight 1. """ n = len(values) sumw = ((n + 1) * n) // 2 x = randint(1, sumw) F = 0 for i in xrange(1, n+1): F += i if x<=F: return values[-i] # test from collections import defaultdict values = ["a", "b", "c", "d", "e"] stats = defaultdict(int) for i in xrange(150000): stats[biased_choice(values)] += 1 for key in sorted(stats): print key, stats[key] --- end --- Output: a 50023 b 39869 c 30256 d 19784 e 10068 -- Gabriel Genellina Aug 28 '07 #10

 P: n/a I saw your article is very good, I like it very much. I will continue to pay attention to your article, the following are the points I hope that I have similar concerns. http://www.game-win.com http://www.game-win.com/wow-powerleveling.html http://www.game-win.com/faq.html http://www.game-win.com/power-leveling.html http://www.game-win.com/World-of-War...-leveling.html http://www.paper-cup-machine.com http://www.sinceremachine.com http://www.clickra.com http://www.clickra.com/NewsList_2.htm http://www.east163.com http://www.ruian2machine.cn http://www.rajayj.cn http://www.mycvcv.com http://www.dgajm.com http://www.icansee.cn http://www.icansee.cn/chanpin.htm http://www.pjwsdyb.com http://www.shdyf.com http://www.nonwoven-bags.cn http://www.plastic-thermoforming-machine.com http://www.gopowerlevel.com Aug 28 '07 #11

 P: n/a On Mon, 27 Aug 2007 22:42:45 +0200, Ivan Voras wrote: I have a list of items, and need to choose several elements from it, "almost random". The catch is that the elements from the beginning should have more chance of being selected than those at the end (how much more? I don't care how the "envelope" of probability looks like at this point - can be linear). I see that there are several functions in Python standard libraries for various distribution, but is there an easy pythonic way to make them do what I need? If you really just want to tend towards the beginning of the list, and don't care what the envelope looks like, how about this: def biasedselection(thelist): index = random.random() * random.random() * len(thelist) # or index random.random() ** 2 * len(thelist) return thelist[index] Dan -- Dan Sommers A death spiral goes clock-

 P: n/a On Aug 27, 4:42 pm, Ivan Voras

 P: n/a Ivan Voras wrote: Hi, I have a list of items, and need to choose several elements from it, "almost random". The catch is that the elements from the beginning should have more chance of being selected than those at the end (how much more? I don't care how the "envelope" of probability looks like at this point - can be linear). I see that there are several functions in Python standard libraries for various distribution, but is there an easy pythonic way to make them do what I need? If you take the difference between two uniformly distributed random variables, the probability density function forms an isosceles triangle centered at 0. Take the absolute value of that variable and the pdf is a straight line with maximum value at 0 tapering to 0 at max. Thus, z = abs(randint(0, max) - randint(0, max)) ought to do the trick. -- Jeffrey Barish Aug 28 '07 #14

 P: n/a Jeffrey Barish wrote: If you take the difference between two uniformly distributed random variables, the probability density function forms an isosceles triangle centered at 0. Take the absolute value of that variable and the pdf isa straight line with maximum value at 0 tapering to 0 at max. Thus, z = abs(randint(0, max) - randint(0, max)) ought to do the trick. It's elegant :) I've noticed something interesting in my test: the value 0 appears less often than other values (which behave as they should). I wrote this test script: from random import randint map = {} for x in xrange(11): map[x] = 0 for a in xrange(500): x = abs(randint(0, 10) - randint(0, 10)) map[x] += 1 print map and here are some results: python a.py {0: 49, 1: 66, 2: 72, 3: 73, 4: 64, 5: 55, 6: 40, 7: 36, 8: 18, 9: 18, 10: 9} python a.py {0: 34, 1: 90, 2: 77, 3: 61, 4: 56, 5: 52, 6: 42, 7: 34, 8: 27, 9: 15, 10: 12} python a.py {0: 33, 1: 80, 2: 84, 3: 62, 4: 52, 5: 46, 6: 51, 7: 31, 8: 35, 9: 16, 10: 10} python a.py {0: 50, 1: 100, 2: 69, 3: 53, 4: 63, 5: 47, 6: 41, 7: 26, 8: 30, 9: 11, 10: 10} python a.py {0: 31, 1: 84, 2: 70, 3: 70, 4: 74, 5: 50, 6: 38, 7: 28, 8: 31, 9: 15, 10: 9} python a.py {0: 43, 1: 101, 2: 79, 3: 66, 4: 41, 5: 59, 6: 37, 7: 30, 8: 26, 9: 15, 10: 3} python a.py {0: 44, 1: 108, 2: 74, 3: 50, 4: 53, 5: 54, 6: 39, 7: 34, 8: 28, 9: 13, 10: 3} python a.py {0: 42, 1: 72, 2: 70, 3: 72, 4: 61, 5: 54, 6: 41, 7: 36, 8: 30, 9: 15, 10: 7} If I ignore the case when the delta is 0, it works fine, but I don't understand why should the 0-case happen less often than others. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG1yC5ldnAQVacBcgRAg6CAKCm4HB5AFHtwRgLHTgB4f cpgA3AsQCferjS PHe06TZqYJoH/dybU7mIN9o= =VIwB -----END PGP SIGNATURE----- Aug 30 '07 #15

 P: n/a On Aug 30, 3:55 pm, Ivan Voras

 P: n/a Ivan Voras wrote: Jeffrey Barish wrote: >If you take the difference between two uniformly distributed randomvariables, the probability density function forms an isosceles trianglecentered at 0. Take the absolute value of that variable and the pdf is astraight line with maximum value at 0 tapering to 0 at max. Thus,z = abs(randint(0, max) - randint(0, max))ought to do the trick. It's elegant :) I've noticed something interesting in my test: the value 0 appears less often than other values (which behave as they should). The distribution of the difference (before the abs()) looks like this (max=4): # ### ##### ####### ---0+++ 321 123 Taking the absolute value doubles up the non-zero masses, but there's no "negative 0" to add to the 0s stack. # # ### ### #### #### 0123 The method does not work because of that. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco Aug 30 '07 #17

 P: n/a One possible fix: do > x = randint(0, 10) - randint(0, 10) x = abs(x) - (x<0) This collapses -1 and 0 to 0, -2 and 1 to 1, etc. Or a slightly simpler formula that ends up producing the same distribution: x = min(randint(0, 10), randint(0, 10)) Mark Aug 30 '07 #18

 P: n/a Robert Kern wrote: Ivan Voras wrote: >Jeffrey Barish wrote: >>If you take the difference between two uniformly distributed randomvariables, the probability density function forms an isosceles trianglecentered at 0. Take the absolute value of that variable and the pdf isastraight line with maximum value at 0 tapering to 0 at max. Thus,z = abs(randint(0, max) - randint(0, max))ought to do the trick. It's elegant :)I've noticed something interesting in my test: the value 0 appears lessoften than other values (which behave as they should). The distribution of the difference (before the abs()) looks like this (max=4): # ### ##### ####### ---0+++ 321 123 Taking the absolute value doubles up the non-zero masses, but there's no "negative 0" to add to the 0s stack. # # ### ### #### #### 0123 The method does not work because of that. The math says that it works, so we must not be implementing it correctly. I suspect that our mistake is quantizing the random variable first and then taking the difference and absolute value. What result do you get when you quantize once? That is, obtain two random values (floats) with uniform pdf. Take the difference. Abs. Round to int. This way, everything in the band from (-0.5, +0.5) goes to 0, and that's the highest part of the triangle. -- Jeffrey Barish Aug 31 '07 #19

 P: n/a Mark Dickinson wrote: That's because the call to abs() usually collapses two values to one (e.g. -2 and 2 both end up being 2), but there's only one integer n for which abs(n) == 0. Ah. Need to sleep more. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG2EtmldnAQVacBcgRAj4kAJ9MHofb5nXj9E2nZ4AFi/zoj+MPYQCgn3ii xpgzHsrFPztv/Ni7Jp3KuXA= =MhAM -----END PGP SIGNATURE----- Aug 31 '07 #20

 P: n/a Jeffrey Barish wrote: Robert Kern wrote: >Ivan Voras wrote: >>Jeffrey Barish wrote:If you take the difference between two uniformly distributed randomvariables, the probability density function forms an isosceles trianglecentered at 0. Take the absolute value of that variable and the pdf isastraight line with maximum value at 0 tapering to 0 at max. Thus,z = abs(randint(0, max) - randint(0, max))ought to do the trick.It's elegant :)I've noticed something interesting in my test: the value 0 appears lessoften than other values (which behave as they should). The distribution of the difference (before the abs()) looks like this(max=4): # ### ##### ####### ---0+++ 321 123Taking the absolute value doubles up the non-zero masses, but there's no"negative 0" to add to the 0s stack. # # ### ### #### #### 0123The method does not work because of that. The math says that it works, so we must not be implementing it correctly. "The math" says nothing of the kind about the method that was stated. I suspect that our mistake is quantizing the random variable first and then taking the difference and absolute value. What result do you get when you quantize once? That is, obtain two random values (floats) with uniform pdf. Take the difference. Abs. Round to int. This way, everything in the band from (-0.5, +0.5) goes to 0, and that's the highest part of the triangle. That's a very different "it". The difference is not just implementation. If you change "round" to "truncate", that method should work, though. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco Aug 31 '07 #21

 P: n/a I'm sorry that I took the time to respond. -- Jeffrey Barish Aug 31 '07 #22

 P: n/a Jeffrey Barish wrote: I'm sorry that I took the time to respond. I'm sorry. I didn't intend my post to be as harsh as it was. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco Aug 31 '07 #23

### This discussion thread is closed

Replies have been disabled for this discussion. 