473,416 Members | 1,486 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,416 software developers and data experts.

Problem with rand() % range+1

Hi,

I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
Aug 25 '08 #1
10 2985
On 25 Aug., 22:35, Rafael Cunha de Almeida <n...@email.provided>
wrote:
Hi,

I've found several sites on google telling me that I shouldn't use

* * * * rand() % range+1

and I should, instead, use something like:

* * * * lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
Take an example of a random generator delivering numbers in the range
0..99. If you want numbers in the range 0..79, your solution would
give an expectancy of numbers 0..19 twice as high as the range 20..79.
So your method is bad if you require an even distribution (but might
be ok if you just want somewhat random distribution as eg for a game).
The other method is better, but not perfect.

/Peter
Aug 25 '08 #2
On 2008-08-25 16:35:21 -0400, Rafael Cunha de Almeida <no@email.providedsaid:
>
I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why.
That's because it's nonsense.
There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean.
Both have the same problem. Pretend for the moment that RAND_MAX is 4
(that is, rand() returns 0, 1, 2, 3, or 4) and range is also 4 (that
is, you want values of 0, 1, 2, or 3). Since rand() generates five
different values, you can't get four distinct values with equal
likelihood. Try it with pencil and paper. Both ways.

The number of values that you start with has to be an exact multiple of
the number of values that you want to get. The way to do that is to
throw away some values from rand(), namely, those that exceed the
maximum multiple of range that's less than RAND_MAX+1. Like this:

int mult = (RAND_MAX + 1) / range;
int max = range * mult;
int temp = rand();
while (max < temp)
temp = rand();
return temp % range;

Now, that doesn't address the possibility of less randomness in the low
bits, but that's a separate issue, and depends on how rand() is
implemented.
Would it not be normally distributed or something like that?
rand() produces a uniform distribution, not a normal distribution, and
usually when someone is looking for a restricted range they're still
looking for a uniform distribution.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #3
On 2008-08-25 17:00:48 -0400, peter koch <pe***************@gmail.comsaid:
On 25 Aug., 22:35, Rafael Cunha de Almeida <n...@email.provided>
wrote:
>Hi,

I've found several sites on google telling me that I shouldn't use

Â* Â* Â* Â* rand() % range+1

and I should, instead, use something like:

Â* Â* Â* Â* lowest+int(range*rand()/(RAND_MAX + 1.0))

They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?

Take an example of a random generator delivering numbers in the range
0..99. If you want numbers in the range 0..79, your solution would
give an expectancy of numbers 0..19 twice as high as the range 20..79.
So your method is bad if you require an even distribution (but might
be ok if you just want somewhat random distribution as eg for a game).
The other method is better, but not perfect.
Well, the other method is better in that it makes the non-randomness
less obvious. <gBut with the same sample ranges it still produces
twenty values twice as often as the rest; it's just that they're not
the twenty lowest ones any more.

Probability is fundamentally just counting. If you can reduce the
problem to a small enough set of values, all you need to do is count
the results. For each possible value that rand() can produce, what
value does each of the formulae above produce? It's a bit tedious to do
for rand() generating numbers in the range 0..99, but for 0..9, with a
target range of 0..7, it won't take more than a few minutes. Don't
trust your intuition until you've gone through several examples and
seen how they really work.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #4
On 25 Aug., 23:15, Pete Becker <p...@versatilecoding.comwrote:
On 2008-08-25 17:00:48 -0400, peter koch <peter.koch.lar...@gmail.comsaid:


On 25 Aug., 22:35, Rafael Cunha de Almeida <n...@email.provided>
wrote:
Hi,
I've found several sites on google telling me that I shouldn't use
* * * * rand() % range+1
and I should, instead, use something like:
* * * * lowest+int(range*rand()/(RAND_MAX + 1.0))
They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
Take an example of a random generator delivering numbers in the range
0..99. If you want numbers in the range 0..79, your solution would
give an expectancy of numbers 0..19 twice as high as the range 20..79.
So your method is bad if you require an even distribution (but might
be ok if you just want somewhat random distribution as eg for a game).
The other method is better, but not perfect.

Well, the other method is better in that it makes the non-randomness
less obvious. <gBut with the same sample ranges it still produces
twenty values twice as often as the rest; it's just that they're not
the twenty lowest ones any more.
You are right - the other method also produces twenty numbers twice as
frequent as the first one - just spread evenly. To my embarassment, I
must admit that I did not realise this when I replied. So much for
perfection!

/Peter
Aug 25 '08 #5
On Aug 25, 2:05 pm, Pete Becker <p...@versatilecoding.comwrote:
The number of values that you start with has to be an exact multiple of
the number of values that you want to get. The way to do that is to
throw away some values from rand(), namely, those that exceed the
maximum multiple of range that's less than RAND_MAX+1. Like this:

int mult = (RAND_MAX + 1) / range;
There is a problem if RAND_MAX equals INT_MAX. Adding one would wrap
the value and mult would become zero. So RAND_MAX could be casted to a
suitable type and I think 'unsigned int' works. Or an unsigned 1 would
work:

int mult = (RAND_MAX + 1u) / range;

I remember reading about this algorithm and that correction to it by
Andrew Koenig on clc++m posts.
int max = range * mult;
int temp = rand();
while (max < temp)
temp = rand();
return temp % range;
Ali
Aug 25 '08 #6
On 2008-08-25 17:21:18 -0400, ac******@gmail.com said:
On Aug 25, 2:05 pm, Pete Becker <p...@versatilecoding.comwrote:
>The number of values that you start with has to be an exact multiple of
the number of values that you want to get. The way to do that is to
throw away some values from rand(), namely, those that exceed the
maximum multiple of range that's less than RAND_MAX+1. Like this:

int mult = (RAND_MAX + 1) / range;

There is a problem if RAND_MAX equals INT_MAX. Adding one would wrap
the value and mult would become zero.
Yes, usually when I post this information I put in a note that it's
only a sketch, and probably has an off-by-one error. Or maybe two of
them.
So RAND_MAX could be casted to a
suitable type and I think 'unsigned int' works. Or an unsigned 1 would
work:
Maybe. RAND_MAX is required to be an integer constant expression, which
doesn't preclude it being unsigned to begin with.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #7
On 2008-08-25 17:18:54 -0400, peter koch <pe***************@gmail.comsaid:
You are right - the other method also produces twenty numbers twice as
frequent as the first one - just spread evenly. To my embarassment, I
must admit that I did not realise this when I replied. So much for
perfection!
Don't be embarassed. It's a very common mistake. Most people's
intuition here leads them astray, which is why it's useful to work
through some simple examples to retrain it.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #8
On Aug 25, 3:35 pm, Rafael Cunha de Almeida <n...@email.provided>
wrote:
Hi,

I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))
Or function doing roughly the equivalent using integer math, which
might be useful if range is large (to avoid signed integer overflow).
They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?
If rand() is implemented with a linear congruential generator f(x)=ax
+b mod m, then it is a relatively easy automated check that the lower
bits are in general highly correlated (given the first, the second is
somewhat predictable). Directly verifying algebraically is harder,
but a standard homework exercise in the field. The lowest separation
required to demonstrate correlation is usually two through six
invocations apart.

More sophisticated algorithms are harder to verify, but reputedly have
similar problems unless explicitly designed otherwise. [As an extreme
case, the Mersenne Twister has been formally verified to not to be
correlated across 637 consecutive invocations.]

It's moderately unusual to use a high-quality RNG to implement rand().

The overall results would still be as approximately uniformly
distributed as rand(), except for unavoidable discretization errors as
mentioned in earlier replies (which wouldn't be that large for small
moduli since RAND_MAX should be at least 32767).
Aug 25 '08 #9
On 2008-08-25 17:47:03 -0400, za*****@zaimoni.com said:
On Aug 25, 3:35 pm, Rafael Cunha de Almeida <n...@email.provided>
wrote:
>Hi,

I've found several sites on google telling me that I shouldn't use

rand() % range+1

and I should, instead, use something like:

lowest+int(range*rand()/(RAND_MAX + 1.0))

Or function doing roughly the equivalent using integer math, which
might be useful if range is large (to avoid signed integer overflow).
>They fail to make it clear why. There seems to be less randomness in the
lower bits or something like that. I don't know what less randomness
would mean. Would it not be normally distributed or something like that?

If rand() is implemented with a linear congruential generator f(x)=ax
+b mod m, then it is a relatively easy automated check that the lower
bits are in general highly correlated (given the first, the second is
somewhat predictable). Directly verifying algebraically is harder,
but a standard homework exercise in the field. The lowest separation
required to demonstrate correlation is usually two through six
invocations apart.

More sophisticated algorithms are harder to verify, but reputedly have
similar problems unless explicitly designed otherwise. [As an extreme
case, the Mersenne Twister has been formally verified to not to be
correlated across 637 consecutive invocations.]

It's moderately unusual to use a high-quality RNG to implement rand().

The overall results would still be as approximately uniformly
distributed as rand(), except for unavoidable discretization errors as
mentioned in earlier replies (which wouldn't be that large for small
moduli since RAND_MAX should be at least 32767).
Of course, if the goal is to get working code quickly, the best answer
to all of this is to unask the question. Don't try to implement this
stuff; use the random number generators in TR1. There's an adaptor to
generate uniformly distributed values within a specified range from a
generator that produces uniformly distributed values in a different
range.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)

Aug 25 '08 #10
On Aug 25, 4:47 pm, zaim...@zaimoni.com wrote:
More sophisticated algorithms are harder to verify, but reputedly have
similar problems unless explicitly designed otherwise. [As an extreme
case, the Mersenne Twister has been formally verified to not to be
correlated across 637 consecutive invocations.]
Wrong: 623 consecutive invocations. Please accept my apologies.
Aug 26 '08 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: August1 | last post by:
A handful of articles have been posted requesting information on how to use these functions in addition to the time() function as the seed to generate unique groups (sets) of numbers - each group...
41
by: AngleWyrm | last post by:
I have created a new container class, called the hat. It provides random selection of user objects, both with and without replacement, with non-uniform probabilities. Uniform probabilities are a...
6
by: ChasW | last post by:
given the following example: Using gcc, this compiles, runs, and outputs as expected, but on vc++.net 2003, the first number is always the same despite the time seed. #include <ctime>...
6
by: Sen-Lung Chen | last post by:
Dear All: I have a question about this below function.The purpose of this function is to generate one number between a and b. -------------------------- int gennum(int a, int b) {...
4
by: Bill Burris | last post by:
Hi, With VS .NET 2003 the rand() function sometimes returns a number equal to RAND_MAX. The docs say: The rand function returns a pseudorandom integer in the range 0 to RAND_MAX. Does this...
10
by: Frank Silvermann | last post by:
#include <stdio.h> #include <stdlib.h> #include <time.h> #include <math.h> #define MIN_WORD_LENGTH 9
26
by: Gary Wessle | last post by:
Hi I need help to generate some random numbers between 2 and 8. #include <cstdlib> using std::rand; the following was out of my range, int main() {
13
by: Spiros Bousbouras | last post by:
The standard says that rand() should return a pseudo-random number but what does pseudorandom mean ? If an implementation of rand() always returned the same number would it be conforming ? What if...
18
by: Jordan Glassman | last post by:
Trying to do something fairly routine... drop output into a file to graph, but the following command at the bash command line: ising output produces a blinking cursor, an empty file named...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.