468,736 Members | 1,780 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,736 developers. It's quick & easy.

Problem with rand() % range+1

Hi,

I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
Aug 25 '08 #1
10 2661
On 25 Aug., 22:35, Rafael Cunha de Almeida <n...@email.provided>
wrote:
Hi,

I've found several sites on google telling me that I shouldn't use

* * * * rand() % range+1

and I should, instead, use something like:

* * * * lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
Take an example of a random generator delivering numbers in the range
0..99. If you want numbers in the range 0..79, your solution would
give an expectancy of numbers 0..19 twice as high as the range 20..79.
So your method is bad if you require an even distribution (but might
be ok if you just want somewhat random distribution as eg for a game).
The other method is better, but not perfect.

/Peter
Aug 25 '08 #2
On 2008-08-25 16:35:21 -0400, Rafael Cunha de Almeida <no@email.providedsaid:
>
I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why.
That's because it's nonsense.
There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean.
Both have the same problem. Pretend for the moment that RAND_MAX is 4
(that is, rand() returns 0, 1, 2, 3, or 4) and range is also 4 (that
is, you want values of 0, 1, 2, or 3). Since rand() generates five
different values, you can't get four distinct values with equal
likelihood. Try it with pencil and paper. Both ways.

The number of values that you start with has to be an exact multiple of
the number of values that you want to get. The way to do that is to
throw away some values from rand(), namely, those that exceed the
maximum multiple of range that's less than RAND_MAX+1. Like this:

int mult = (RAND_MAX + 1) / range;
int max = range * mult;
int temp = rand();
while (max < temp)
temp = rand();
return temp % range;

Now, that doesn't address the possibility of less randomness in the low
bits, but that's a separate issue, and depends on how rand() is
implemented.
Would it not be normally distributed or something like that?
rand() produces a uniform distribution, not a normal distribution, and
usually when someone is looking for a restricted range they're still
looking for a uniform distribution.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #3
On 2008-08-25 17:00:48 -0400, peter koch <pe***************@gmail.comsaid:
On 25 Aug., 22:35, Rafael Cunha de Almeida <n...@email.provided>
wrote:
>Hi,

I've found several sites on google telling me that I shouldn't use

* * * * rand() % range+1

and I should, instead, use something like:

* * * * lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?

Take an example of a random generator delivering numbers in the range
0..99. If you want numbers in the range 0..79, your solution would
give an expectancy of numbers 0..19 twice as high as the range 20..79.
So your method is bad if you require an even distribution (but might
be ok if you just want somewhat random distribution as eg for a game).
The other method is better, but not perfect.
Well, the other method is better in that it makes the non-randomness
less obvious. <gBut with the same sample ranges it still produces
twenty values twice as often as the rest; it's just that they're not
the twenty lowest ones any more.

Probability is fundamentally just counting. If you can reduce the
problem to a small enough set of values, all you need to do is count
the results. For each possible value that rand() can produce, what
value does each of the formulae above produce? It's a bit tedious to do
for rand() generating numbers in the range 0..99, but for 0..9, with a
target range of 0..7, it won't take more than a few minutes. Don't
trust your intuition until you've gone through several examples and
seen how they really work.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #4
On 25 Aug., 23:15, Pete Becker <p...@versatilecoding.comwrote:
On 2008-08-25 17:00:48 -0400, peter koch <peter.koch.lar...@gmail.comsaid:


On 25 Aug., 22:35, Rafael Cunha de Almeida <n...@email.provided>
wrote:
Hi,
I've found several sites on google telling me that I shouldn't use
* * * * rand() % range+1
and I should, instead, use something like:
* * * * lowest+int(range*rand()/(RAND_MAX + 1.0))
They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
Take an example of a random generator delivering numbers in the range
0..99. If you want numbers in the range 0..79, your solution would
give an expectancy of numbers 0..19 twice as high as the range 20..79.
So your method is bad if you require an even distribution (but might
be ok if you just want somewhat random distribution as eg for a game).
The other method is better, but not perfect.

Well, the other method is better in that it makes the non-randomness
less obvious. <gBut with the same sample ranges it still produces
twenty values twice as often as the rest; it's just that they're not
the twenty lowest ones any more.
You are right - the other method also produces twenty numbers twice as
frequent as the first one - just spread evenly. To my embarassment, I
must admit that I did not realise this when I replied. So much for
perfection!

/Peter
Aug 25 '08 #5
On Aug 25, 2:05 pm, Pete Becker <p...@versatilecoding.comwrote:
The number of values that you start with has to be an exact multiple of
the number of values that you want to get. The way to do that is to
throw away some values from rand(), namely, those that exceed the
maximum multiple of range that's less than RAND_MAX+1. Like this:

int mult = (RAND_MAX + 1) / range;
There is a problem if RAND_MAX equals INT_MAX. Adding one would wrap
the value and mult would become zero. So RAND_MAX could be casted to a
suitable type and I think 'unsigned int' works. Or an unsigned 1 would
work:

int mult = (RAND_MAX + 1u) / range;

I remember reading about this algorithm and that correction to it by
Andrew Koenig on clc++m posts.
int max = range * mult;
int temp = rand();
while (max < temp)
temp = rand();
return temp % range;
Ali
Aug 25 '08 #6
On 2008-08-25 17:21:18 -0400, ac******@gmail.com said:
On Aug 25, 2:05 pm, Pete Becker <p...@versatilecoding.comwrote:
>The number of values that you start with has to be an exact multiple of
the number of values that you want to get. The way to do that is to
throw away some values from rand(), namely, those that exceed the
maximum multiple of range that's less than RAND_MAX+1. Like this:

int mult = (RAND_MAX + 1) / range;

There is a problem if RAND_MAX equals INT_MAX. Adding one would wrap
the value and mult would become zero.
Yes, usually when I post this information I put in a note that it's
only a sketch, and probably has an off-by-one error. Or maybe two of
them.
So RAND_MAX could be casted to a
suitable type and I think 'unsigned int' works. Or an unsigned 1 would
work:
Maybe. RAND_MAX is required to be an integer constant expression, which
doesn't preclude it being unsigned to begin with.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #7
On 2008-08-25 17:18:54 -0400, peter koch <pe***************@gmail.comsaid:
You are right - the other method also produces twenty numbers twice as
frequent as the first one - just spread evenly. To my embarassment, I
must admit that I did not realise this when I replied. So much for
perfection!
Don't be embarassed. It's a very common mistake. Most people's
intuition here leads them astray, which is why it's useful to work
through some simple examples to retrain it.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #8
On Aug 25, 3:35 pm, Rafael Cunha de Almeida <n...@email.provided>
wrote:
Hi,

I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))
Or function doing roughly the equivalent using integer math, which
might be useful if range is large (to avoid signed integer overflow).
They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
If rand() is implemented with a linear congruential generator f(x)=ax
+b mod m, then it is a relatively easy automated check that the lower
bits are in general highly correlated (given the first, the second is
somewhat predictable). Directly verifying algebraically is harder,
but a standard homework exercise in the field. The lowest separation
required to demonstrate correlation is usually two through six
invocations apart.

More sophisticated algorithms are harder to verify, but reputedly have
similar problems unless explicitly designed otherwise. [As an extreme
case, the Mersenne Twister has been formally verified to not to be
correlated across 637 consecutive invocations.]

It's moderately unusual to use a high-quality RNG to implement rand().

The overall results would still be as approximately uniformly
distributed as rand(), except for unavoidable discretization errors as
mentioned in earlier replies (which wouldn't be that large for small
moduli since RAND_MAX should be at least 32767).
Aug 25 '08 #9
On 2008-08-25 17:47:03 -0400, za*****@zaimoni.com said:
On Aug 25, 3:35 pm, Rafael Cunha de Almeida <n...@email.provided>
wrote:
>Hi,

I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))

Or function doing roughly the equivalent using integer math, which
might be useful if range is large (to avoid signed integer overflow).
>They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?

If rand() is implemented with a linear congruential generator f(x)=ax
+b mod m, then it is a relatively easy automated check that the lower
bits are in general highly correlated (given the first, the second is
somewhat predictable). Directly verifying algebraically is harder,
but a standard homework exercise in the field. The lowest separation
required to demonstrate correlation is usually two through six
invocations apart.

More sophisticated algorithms are harder to verify, but reputedly have
similar problems unless explicitly designed otherwise. [As an extreme
case, the Mersenne Twister has been formally verified to not to be
correlated across 637 consecutive invocations.]

It's moderately unusual to use a high-quality RNG to implement rand().

The overall results would still be as approximately uniformly
distributed as rand(), except for unavoidable discretization errors as
mentioned in earlier replies (which wouldn't be that large for small
moduli since RAND_MAX should be at least 32767).
Of course, if the goal is to get working code quickly, the best answer
to all of this is to unask the question. Don't try to implement this
stuff; use the random number generators in TR1. There's an adaptor to
generate uniformly distributed values within a specified range from a
generator that produces uniformly distributed values in a different
range.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #10
On Aug 25, 4:47 pm, zaim...@zaimoni.com wrote:
More sophisticated algorithms are harder to verify, but reputedly have
similar problems unless explicitly designed otherwise. [As an extreme
case, the Mersenne Twister has been formally verified to not to be
correlated across 637 consecutive invocations.]
Wrong: 623 consecutive invocations. Please accept my apologies.
Aug 26 '08 #11

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

41 posts views Thread by AngleWyrm | last post: by
6 posts views Thread by ChasW | last post: by
6 posts views Thread by Sen-Lung Chen | last post: by
4 posts views Thread by Bill Burris | last post: by
10 posts views Thread by Frank Silvermann | last post: by
26 posts views Thread by Gary Wessle | last post: by
13 posts views Thread by Spiros Bousbouras | last post: by
18 posts views Thread by Jordan Glassman | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by zhoujie | last post: by
xarzu
2 posts views Thread by xarzu | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.