468,771 Members | 1,600 Online

rand - is RAND_MAX how long before the same # will occur again?

When I use rand(), is the RAND_MAX value how long I am guaranteed that
the same value will not appear twice?

And is this a floating window?

For example, if RAND_MAX is 32767, and I make 500,000 consecutive rand
calls then is the rand() algorithm going to guarantee me that no
floating window of calls over that 500,000 rand calls will have the
same value twice? The only way that would work is if the series was
identical everytime.

I just found some code that someone wrote that uses rand() to create a
unique id, and I'm thinking it would make more sense just to take a 32
bit int, and increment it everytime with a conditional check that
restarts it at 1 when the max value for the int is reached. I cannot
see any obvious reason why rand() was used instead of this approach.

Nov 14 '05 #1
8 4142
"Jack" <js*********@yahoo.com> writes:
When I use rand(), is the RAND_MAX value how long I am guaranteed that
the same value will not appear twice?

No. It is very unlikely that no value will appear twice. If the
random number generator is uniformly random, the chance that no
value will appear twice in RAND_MAX calls, assuming RAND_MAX ==
32767, is

(32766 / 32767) * (32765 / 32767) * ... * (1 / 32767)
== 32766! / (32767)**32766
~= 1.347 * 10**-14228 (according to Emacs calc)

This result is much, much smaller than the maximum value for
LDBL_MIN.

(Someone should check my math, I'm not so good at this. Also,
smartasses, the above are math formulas, not C expressions.)
--
"What is appropriate for the master is not appropriate for the novice.
You must understand the Tao before transcending structure."
--The Tao of Programming
Nov 14 '05 #2
On 29 Jan 2005 08:51:11 -0800, "Jack" <js*********@yahoo.com> wrote:
When I use rand(), is the RAND_MAX value how long I am guaranteed that
the same value will not appear twice?
No, if the numbers are random then it is entirely possible for the
first two to be the same.

RAND_MAX is simply the largest value that rand can return.

And is this a floating window?
What is a floating window?

For example, if RAND_MAX is 32767, and I make 500,000 consecutive rand
calls then is the rand() algorithm going to guarantee me that no
floating window of calls over that 500,000 rand calls will have the
same value twice? The only way that would work is if the series was
identical everytime.
I don't know what you mean here but if you make 500,000 calls then on
average each value will occur 16-17 times.

I just found some code that someone wrote that uses rand() to create a
unique id, and I'm thinking it would make more sense just to take a 32
bit int, and increment it everytime with a conditional check that
restarts it at 1 when the max value for the int is reached. I cannot
see any obvious reason why rand() was used instead of this approach.

<<Remove the del for email>>
Nov 14 '05 #3
"Jack" writes:
I just found some code that someone wrote that uses rand() to create a
unique id, and I'm thinking it would make more sense just to take a 32
bit int, and increment it everytime with a conditional check that
restarts it at 1 when the max value for the int is reached. I cannot
see any obvious reason why rand() was used instead of this approach.

My guess: the author goofed, it's a relatively easy mistake to make. A
reason that has at least *some* plausibility: Information hiding, he didn't
want the users to glean any information from their relative numbers.
Nov 14 '05 #4
Ben Pfaff wrote:
"Jack" <js*********@yahoo.com> writes:

When I use rand(), is the RAND_MAX value how long I am guaranteed that
the same value will not appear twice?

No. It is very unlikely that no value will appear twice. If the
random number generator is uniformly random, the chance that no
value will appear twice in RAND_MAX calls, assuming RAND_MAX ==
32767, is

(32766 / 32767) * (32765 / 32767) * ... * (1 / 32767)
== 32766! / (32767)**32766
~= 1.347 * 10**-14228 (according to Emacs calc)

This result is much, much smaller than the maximum value for
LDBL_MIN.

(Someone should check my math, I'm not so good at this. Also,
smartasses, the above are math formulas, not C expressions.)

FWIW, a different tool gives me the same result. If we're
wrong, I at least am in good company.

However, this calculation assumes successive rand() values
are independent, which is certainly not the case. rand() is
required to be deterministic in the sense that it produces the
exact same sequence of values for a given explicit or implied
srand() argument; the srand() argument completely determines
the sequence of subsequent rand() values.

Drifting into implementation specifics, it is also worth
noting that if rand() is a full-period linear congruential
generator it produces a permutation of { 0 .. RAND_MAX }, so
the probability of a repeated value in RAND_MAX calls or even
in RAND_MAX+1 calls is zero! Of course, the probability of a
repetition in RAND_MAX+2 calls rises abruptly to unity. For
a pure multiplicative generator with prime modulus, the generated
values are a permutation of { 1 .. RAND_MAX } and the probability
transition lies between RAND_MAX and RAND_MAX+1 calls.

For the benefit of the O.P. (Ben already knows this), the
Standard does not specify what algorithm underlies rand(), and
different implementations use different generators. It is
certainly an error to assume that rand() will not repeat itself
within RAND_MAX calls; it is even possible that two rand() calls
in a row can return the same value.

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 14 '05 #5
"Jack" <js*********@yahoo.com> writes:
When I use rand(), is the RAND_MAX value how long I am guaranteed that
the same value will not appear twice?
You're not. If the values returned by rand() were truly random, there
would be a 1.0/(RAND_MAX+1) chance that two successive calls to rand()
would yield the same value.

Of course rand() returns pseudo-random values, and the sequence is
required to be reproducible for a given seed (the argument passed to
srand().

If the internal state is no bigger than the value returned (e.g., if
RAND_MAX is 32767 and the system stores only 15 bits of internal
state), then rand() can never return the same result twice -- if it
did, it would continue to return that same result indefinitely. In
that case, the results will repeat with a cycle of *at most*
RAND_MAX+1.

If it keeps a larger internal state, the results are going to look
more like real random numbers, though of course they'll still be
deterministic. For example, if there are 1024 bits of internal state,
that makes 2**1024 possible states; srand() lets you select one of
UINT_MAX+1 of those states.

So, depending on the implementation, two successive calls to rand()
might never return the same value, or they might do so once in
RAND_MAX+1 calls (periodically or probabilistically), or they might do
something else if the implementation is more "pseudo" than "random".

[snip]
I just found some code that someone wrote that uses rand() to create a
unique id, and I'm thinking it would make more sense just to take a 32
bit int, and increment it everytime with a conditional check that
restarts it at 1 when the max value for the int is reached. I cannot
see any obvious reason why rand() was used instead of this approach.

If all you care about is uniqueness, incrementing a variable is easier
and more effective than using rand(). If you also want
unpredictability, you can use a random number, but you have to allow
for the possibility of repeated values, probably by keeping track of
all the ids you've already used. RAND_MAX can be as small as 32767,
so it's not useful if you want a large number of unique ids. Finally,
rand() is often not very good, and can be easy to predict if you know
the algorithm; it's almost certainly not suitable for cryptographic
applications.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #6
Eric Sosman wrote:
It is
certainly an error to assume that rand() will not repeat itself
within RAND_MAX calls; it is even possible that two rand() calls
in a row can return the same value.

It is even possible that *all* rand() calls can return the same value.

--
pete
Nov 14 '05 #7
pete <pf*****@mindspring.com> writes:
Eric Sosman wrote:
It is
certainly an error to assume that rand() will not repeat itself
within RAND_MAX calls; it is even possible that two rand() calls
in a row can return the same value.

It is even possible that *all* rand() calls can return the same value.

C99 7.20.2p2:

The rand function computes a sequence of pseudo-random integers in
the range 0 to RAND_MAX.

I don't think you can stretch the meaning of "pseudo-random" to
include a repeated sequence of the same value.

(On the other hand, if it generated truly random values, there would
be a finite probability that it could produce an arbitrarily long
sequence of a single value.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #8
Keith Thompson wrote:

pete <pf*****@mindspring.com> writes:
Eric Sosman wrote:
It is
certainly an error to assume that rand() will not repeat itself
within RAND_MAX calls; it is even possible that two rand() calls
in a row can return the same value.

It is even possible that *all* rand() calls can return the same value.

C99 7.20.2p2:

The rand function computes a sequence of pseudo-random integers in
the range 0 to RAND_MAX.

I don't think you can stretch the meaning of "pseudo-random" to
include a repeated sequence of the same value.

(On the other hand, if it generated truly random values, there would
be a finite probability that it could produce an arbitrarily long
sequence of a single value.)

It's a quality of implementation issue,
like a malloc that always returns NULL.
There's nothing in the standard which prohibits
a very poor quality rand().

--
pete
Nov 14 '05 #9