472,958 Members | 2,316 Online

# Problem with rand() % range+1

Hi,

I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
Aug 25 '08 #1
10 2955
On 25 Aug., 22:35, Rafael Cunha de Almeida <n...@email.provided>
wrote:
Hi,

I've found several sites on google telling me that I shouldn't use

* * * * rand() % range+1

and I should, instead, use something like:

* * * * lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
Take an example of a random generator delivering numbers in the range
0..99. If you want numbers in the range 0..79, your solution would
give an expectancy of numbers 0..19 twice as high as the range 20..79.
So your method is bad if you require an even distribution (but might
be ok if you just want somewhat random distribution as eg for a game).
The other method is better, but not perfect.

/Peter
Aug 25 '08 #2
On 2008-08-25 16:35:21 -0400, Rafael Cunha de Almeida <no@email.providedsaid:
>
I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why.
That's because it's nonsense.
There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean.
Both have the same problem. Pretend for the moment that RAND_MAX is 4
(that is, rand() returns 0, 1, 2, 3, or 4) and range is also 4 (that
is, you want values of 0, 1, 2, or 3). Since rand() generates five
different values, you can't get four distinct values with equal
likelihood. Try it with pencil and paper. Both ways.

The number of values that you start with has to be an exact multiple of
the number of values that you want to get. The way to do that is to
throw away some values from rand(), namely, those that exceed the
maximum multiple of range that's less than RAND_MAX+1. Like this:

int mult = (RAND_MAX + 1) / range;
int max = range * mult;
int temp = rand();
while (max < temp)
temp = rand();
return temp % range;

Now, that doesn't address the possibility of less randomness in the low
bits, but that's a separate issue, and depends on how rand() is
implemented.
Would it not be normally distributed or something like that?
rand() produces a uniform distribution, not a normal distribution, and
usually when someone is looking for a restricted range they're still
looking for a uniform distribution.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #3
On 2008-08-25 17:00:48 -0400, peter koch <pe***************@gmail.comsaid:
On 25 Aug., 22:35, Rafael Cunha de Almeida <n...@email.provided>
wrote:
>Hi,

I've found several sites on google telling me that I shouldn't use

Â* Â* Â* Â* rand() % range+1

and I should, instead, use something like:

Â* Â* Â* Â* lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?

Take an example of a random generator delivering numbers in the range
0..99. If you want numbers in the range 0..79, your solution would
give an expectancy of numbers 0..19 twice as high as the range 20..79.
So your method is bad if you require an even distribution (but might
be ok if you just want somewhat random distribution as eg for a game).
The other method is better, but not perfect.
Well, the other method is better in that it makes the non-randomness
less obvious. <gBut with the same sample ranges it still produces
twenty values twice as often as the rest; it's just that they're not
the twenty lowest ones any more.

Probability is fundamentally just counting. If you can reduce the
problem to a small enough set of values, all you need to do is count
the results. For each possible value that rand() can produce, what
value does each of the formulae above produce? It's a bit tedious to do
for rand() generating numbers in the range 0..99, but for 0..9, with a
target range of 0..7, it won't take more than a few minutes. Don't
trust your intuition until you've gone through several examples and
seen how they really work.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #4
On 25 Aug., 23:15, Pete Becker <p...@versatilecoding.comwrote:
On 2008-08-25 17:00:48 -0400, peter koch <peter.koch.lar...@gmail.comsaid:

On 25 Aug., 22:35, Rafael Cunha de Almeida <n...@email.provided>
wrote:
Hi,
I've found several sites on google telling me that I shouldn't use
* * * * rand() % range+1
and I should, instead, use something like:
* * * * lowest+int(range*rand()/(RAND_MAX + 1.0))
They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
Take an example of a random generator delivering numbers in the range
0..99. If you want numbers in the range 0..79, your solution would
give an expectancy of numbers 0..19 twice as high as the range 20..79.
So your method is bad if you require an even distribution (but might
be ok if you just want somewhat random distribution as eg for a game).
The other method is better, but not perfect.

Well, the other method is better in that it makes the non-randomness
less obvious. <gBut with the same sample ranges it still produces
twenty values twice as often as the rest; it's just that they're not
the twenty lowest ones any more.
You are right - the other method also produces twenty numbers twice as
frequent as the first one - just spread evenly. To my embarassment, I
must admit that I did not realise this when I replied. So much for
perfection!

/Peter
Aug 25 '08 #5
On Aug 25, 2:05 pm, Pete Becker <p...@versatilecoding.comwrote:
The number of values that you start with has to be an exact multiple of
the number of values that you want to get. The way to do that is to
throw away some values from rand(), namely, those that exceed the
maximum multiple of range that's less than RAND_MAX+1. Like this:

int mult = (RAND_MAX + 1) / range;
There is a problem if RAND_MAX equals INT_MAX. Adding one would wrap
the value and mult would become zero. So RAND_MAX could be casted to a
suitable type and I think 'unsigned int' works. Or an unsigned 1 would
work:

int mult = (RAND_MAX + 1u) / range;

Andrew Koenig on clc++m posts.
int max = range * mult;
int temp = rand();
while (max < temp)
temp = rand();
return temp % range;
Ali
Aug 25 '08 #6
On 2008-08-25 17:21:18 -0400, ac******@gmail.com said:
On Aug 25, 2:05 pm, Pete Becker <p...@versatilecoding.comwrote:
>The number of values that you start with has to be an exact multiple of
the number of values that you want to get. The way to do that is to
throw away some values from rand(), namely, those that exceed the
maximum multiple of range that's less than RAND_MAX+1. Like this:

int mult = (RAND_MAX + 1) / range;

There is a problem if RAND_MAX equals INT_MAX. Adding one would wrap
the value and mult would become zero.
Yes, usually when I post this information I put in a note that it's
only a sketch, and probably has an off-by-one error. Or maybe two of
them.
So RAND_MAX could be casted to a
suitable type and I think 'unsigned int' works. Or an unsigned 1 would
work:
Maybe. RAND_MAX is required to be an integer constant expression, which
doesn't preclude it being unsigned to begin with.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #7
On 2008-08-25 17:18:54 -0400, peter koch <pe***************@gmail.comsaid:
You are right - the other method also produces twenty numbers twice as
frequent as the first one - just spread evenly. To my embarassment, I
must admit that I did not realise this when I replied. So much for
perfection!
Don't be embarassed. It's a very common mistake. Most people's
intuition here leads them astray, which is why it's useful to work
through some simple examples to retrain it.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #8
On Aug 25, 3:35 pm, Rafael Cunha de Almeida <n...@email.provided>
wrote:
Hi,

I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))
Or function doing roughly the equivalent using integer math, which
might be useful if range is large (to avoid signed integer overflow).
They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
If rand() is implemented with a linear congruential generator f(x)=ax
+b mod m, then it is a relatively easy automated check that the lower
bits are in general highly correlated (given the first, the second is
somewhat predictable). Directly verifying algebraically is harder,
but a standard homework exercise in the field. The lowest separation
required to demonstrate correlation is usually two through six
invocations apart.

More sophisticated algorithms are harder to verify, but reputedly have
similar problems unless explicitly designed otherwise. [As an extreme
case, the Mersenne Twister has been formally verified to not to be
correlated across 637 consecutive invocations.]

It's moderately unusual to use a high-quality RNG to implement rand().

The overall results would still be as approximately uniformly
distributed as rand(), except for unavoidable discretization errors as
mentioned in earlier replies (which wouldn't be that large for small
moduli since RAND_MAX should be at least 32767).
Aug 25 '08 #9
On 2008-08-25 17:47:03 -0400, za*****@zaimoni.com said:
On Aug 25, 3:35 pm, Rafael Cunha de Almeida <n...@email.provided>
wrote:
>Hi,

I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))

Or function doing roughly the equivalent using integer math, which
might be useful if range is large (to avoid signed integer overflow).
>They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?

If rand() is implemented with a linear congruential generator f(x)=ax
+b mod m, then it is a relatively easy automated check that the lower
bits are in general highly correlated (given the first, the second is
somewhat predictable). Directly verifying algebraically is harder,
but a standard homework exercise in the field. The lowest separation
required to demonstrate correlation is usually two through six
invocations apart.

More sophisticated algorithms are harder to verify, but reputedly have
similar problems unless explicitly designed otherwise. [As an extreme
case, the Mersenne Twister has been formally verified to not to be
correlated across 637 consecutive invocations.]

It's moderately unusual to use a high-quality RNG to implement rand().

The overall results would still be as approximately uniformly
distributed as rand(), except for unavoidable discretization errors as
mentioned in earlier replies (which wouldn't be that large for small
moduli since RAND_MAX should be at least 32767).
Of course, if the goal is to get working code quickly, the best answer
to all of this is to unask the question. Don't try to implement this
stuff; use the random number generators in TR1. There's an adaptor to
generate uniformly distributed values within a specified range from a
generator that produces uniformly distributed values in a different
range.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #10
On Aug 25, 4:47 pm, zaim...@zaimoni.com wrote:
More sophisticated algorithms are harder to verify, but reputedly have
similar problems unless explicitly designed otherwise. [As an extreme
case, the Mersenne Twister has been formally verified to not to be
correlated across 637 consecutive invocations.]
Wrong: 623 consecutive invocations. Please accept my apologies.
Aug 26 '08 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

### Similar topics

 4 by: August1 | last post by: A handful of articles have been posted requesting information on how to use these functions in addition to the time() function as the seed to generate unique groups (sets) of numbers - each group... 41 by: AngleWyrm | last post by: I have created a new container class, called the hat. It provides random selection of user objects, both with and without replacement, with non-uniform probabilities. Uniform probabilities are a... 6 by: ChasW | last post by: given the following example: Using gcc, this compiles, runs, and outputs as expected, but on vc++.net 2003, the first number is always the same despite the time seed. #include ... 6 by: Sen-Lung Chen | last post by: Dear All: I have a question about this below function.The purpose of this function is to generate one number between a and b. -------------------------- int gennum(int a, int b) {... 4 by: Bill Burris | last post by: Hi, With VS .NET 2003 the rand() function sometimes returns a number equal to RAND_MAX. The docs say: The rand function returns a pseudorandom integer in the range 0 to RAND_MAX. Does this... 10 by: Frank Silvermann | last post by: #include #include #include #include #define MIN_WORD_LENGTH 9 26 by: Gary Wessle | last post by: Hi I need help to generate some random numbers between 2 and 8. #include using std::rand; the following was out of my range, int main() { 13 by: Spiros Bousbouras | last post by: The standard says that rand() should return a pseudo-random number but what does pseudorandom mean ? If an implementation of rand() always returned the same number would it be conforming ? What if... 18 by: Jordan Glassman | last post by: Trying to do something fairly routine... drop output into a file to graph, but the following command at the bash command line: ising output produces a blinking cursor, an empty file named... 2 by: isladogs | last post by: The next Access Europe meeting will be on Wednesday 4 Oct 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central... 0 by: tracyyun | last post by: Hello everyone, I have a question and would like some advice on network connectivity. I have one computer connected to my router via WiFi, but I have two other computers that I want to be able to... 2 by: giovanniandrean | last post by: The energy model is structured as follows and uses excel sheets to give input data: 1-Utility.py contains all the functions needed to calculate the variables and other minor things (mentions... 4 by: NeoPa | last post by: Hello everyone. I find myself stuck trying to find the VBA way to get Access to create a PDF of the currently-selected (and open) object (Form or Report). I know it can be done by selecting :... 3 by: NeoPa | last post by: Introduction For this article I'll be using a very simple database which has Form (clsForm) & Report (clsReport) classes that simply handle making the calling Form invisible until the Form, or all... 0 by: isladogs | last post by: The next Access Europe meeting will be on Wednesday 1 Nov 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM) Please note that the UK and Europe revert to winter time on... 3 by: nia12 | last post by: Hi there, I am very new to Access so apologies if any of this is obvious/not clear. I am creating a data collection tool for health care employees to complete. It consists of a number of... 0 by: NeoPa | last post by: Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it... 0 by: isladogs | last post by: The next online meeting of the Access Europe User Group will be on Wednesday 6 Dec 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, Mike...