rand() % n Revisited

Rich Fife

Quick rand() question:

I know you're not supposed to use "rand() % 1024" for instance,
because it focuses on the lower bits. However, it seems to me that
given that the argument is not a power of two (or near a power of
two), that this is not an issue. The upper bits will participate
equally in the result with the lower. Am I missing something?

Thanks!

-- Rich --

Oct 23 '08 #1


Paul Hsieh

On Oct 22, 6:04 pm, Rich Fife <rf...@amug.org> wrote:

Quick rand() question:

I know you're not supposed to use "rand() % 1024" for instance,
because it focuses on the lower bits. However, it seems to me
that given that the argument is not a power of two (or near a
power of two), that this is not an issue. The upper bits will
participate equally in the result with the lower. Am I missing
something?

Yes, you are missing some mathematical analysis to back up what you
just said. If you do (rand() % 1023) on Microsoft Visual C++ or
WATCOM C/C++, 32 of the possible outputs will have an extra 3% bias no
matter how good your random number generator is. No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.
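
A minimal sketch in C of the counting argument above, not taken from the post. The function name overrepresented_residues and its parameters are hypothetical. rand_max is passed in rather than read from RAND_MAX, on the assumption that RAND_MAX is 32767 on the compilers named, which is what the figure of 32 outputs implies.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Tally every value a perfectly uniform generator could return in
 * 0..rand_max, reduce each one with % n, and report how many residues
 * occur more often than the least frequent residue.
 * The smallest and largest tallies are stored through min_count and
 * max_count. Returns -1 if memory cannot be allocated.
 */
static long overrepresented_residues(long rand_max, long n,
                                     long *min_count, long *max_count)
{
    long *tally;
    long v, r, lo, hi, extra = 0;

    assert(rand_max >= 0 && rand_max < LONG_MAX && n > 0);
    tally = calloc((size_t)n, sizeof *tally);
    if (tally == NULL)
        return -1;

    for (v = 0; v <= rand_max; v++)
        tally[v % n]++;

    lo = hi = tally[0];
    for (r = 1; r < n; r++) {
        if (tally[r] < lo) lo = tally[r];
        if (tally[r] > hi) hi = tally[r];
    }
    for (r = 0; r < n; r++)
        if (tally[r] > lo)
            extra++;

    free(tally);
    *min_count = lo;
    *max_count = hi;
    return extra;
}
```

With rand_max = 32767 and n = 1023 the function reports 32 residues that occur 33 times against 32 times for the rest, an excess of 1/32, or about 3.1%. With n = 1024 it reports none.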

The C.L.C. FAQ about this gives extremely misleading advice on this
point and it should seriously be ignored. If you want to seriously
deal with random numbers just read my page about it:

http://www.pobox.com/~qed/random.html

I build up a *REAL* ranged random number generator with reasonable
performance characteristics, culminating in the randrange() function
that removes all the primary problems with ranged random numbers. If
you want something with pure random bit quality you can always use the
Mersenne Twister or Fortuna as a base for my generator function.
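
For readers who want to see the general idea, here is a hedged sketch of rejection sampling, one common way to remove modulo bias. It is not the randrange() function from the page linked above, whose code is not shown in this thread; uniform_below is a hypothetical name, and the sketch assumes rand() itself is uniform over 0..RAND_MAX.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Return a value in 0..n-1 with no modulo bias, assuming rand() is
 * uniform over 0..RAND_MAX. Requires 0 < n <= RAND_MAX.
 */
static int uniform_below(int n)
{
    int bucket, limit, v;

    assert(n > 0 && n <= RAND_MAX);
    bucket = RAND_MAX / n;   /* how many raw values map to each output */
    limit = bucket * n;      /* raw values at or above this are discarded */
    do {
        v = rand();
    } while (v >= limit);
    return v / bucket;       /* division uses the high-order bits */
}
```

Every output receives exactly bucket raw values, so the bias disappears at the cost of an occasional extra call to rand(). Because the result comes from a division rather than a remainder, it also does not depend solely on the low-order bits.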

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Oct 23 '08 #2

lawrence.jones

Paul Hsieh <we******@gmail.com> wrote:

No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you've never seen the truly bad BSD rand() that fostered most of
the paranoia about rand. With it, rand() % 1 generated the sequence 0,
1, 0, 1, 0, 1, ....
--
Larry Jones

This game lends itself to certain abuses. -- Calvin

Oct 23 '08 #3

Kelsey Bjarnason

On Thu, 23 Oct 2008 10:46:39 -0400, lawrence.jones wrote:

Paul Hsieh <we******@gmail.com> wrote:
>No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you've never seen the truly bad BSD rand() that fostered most of
the paranoia about rand. With it, rand() % 1 generated the sequence 0,
1, 0, 1, 0, 1, ....

That would be bad, as rand() % 1 should only ever produce 0.

Oct 23 '08 #4

Keith Thompson

Kelsey Bjarnason <ke*****@lgisp.net> writes:

On Thu, 23 Oct 2008 10:46:39 -0400, lawrence.jones wrote:

>Paul Hsieh <we******@gmail.com> wrote:
>>No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you've never seen the truly bad BSD rand() that fostered most of
the paranoia about rand. With it, rand() % 1 generated the sequence 0,
1, 0, 1, 0, 1, ....

That would be bad, as rand() % 1 should only ever produce 0.

Obviously Larry is using a font that doesn't distinguish clearly
enough between '%' and '&'. Yeah, that's it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 23 '08 #5

CBFalconer

la************@siemens.com wrote:

Paul Hsieh <we******@gmail.com> wrote:

>No C compiler's rand() that I have ever seen has, by itself, a
worse effect on random output than that.

Then you've never seen the truly bad BSD rand() that fostered
most of the paranoia about rand. With it, rand() % 1 generated
the sequence 0, 1, 0, 1, 0, 1, ....

FYI the value of "rand() % 1" is identically 0. Except it may be
considerably slower than just writing "0".

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Oct 23 '08 #6

Phil Carmody

Keith Thompson <ks***@mib.org> writes:

Kelsey Bjarnason <ke*****@lgisp.net> writes:
>On Thu, 23 Oct 2008 10:46:39 -0400, lawrence.jones wrote:

>>Paul Hsieh <we******@gmail.com> wrote:
No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you've never seen the truly bad BSD rand() that fostered most of
the paranoia about rand. With it, rand() % 1 generated the sequence 0,
1, 0, 1, 0, 1, ....

That would be bad, as rand() % 1 should only ever produce 0.

Obviously Larry is using a font that doesn't distinguish clearly
enough between '%' and '&'. Yeah, that's it.

I had presumed that ``const int l=2;'' was the line before.

Obfuscatorially yours,
Phil
--
The fact that a believer is happier than a sceptic is no more to the
point than the fact that a drunken man is happier than a sober one.
The happiness of credulity is a cheap and dangerous quality.
-- George Bernard Shaw (1856-1950), Preface to Androcles and the Lion

Oct 23 '08 #7

Charles Richmond

CBFalconer wrote:

la************@siemens.com wrote:
>Paul Hsieh <we******@gmail.com> wrote:

>>No C compiler's rand() that I have ever seen has, by itself, a
worse effect on random output than that.
Then you've never seen the truly bad BSD rand() that fostered
most of the paranoia about rand. With it, rand() % 1 generated
the sequence 0, 1, 0, 1, 0, 1, ....

FYI the value of "rand() % 1" is identically 0. Except it may be
considerably slower than just writing "0".

Not if your compiler has a good optimizer... ;-)

--
+----------------------------------------------------------------+
| Charles and Francis Richmond richmond at plano dot net |
+----------------------------------------------------------------+

Oct 24 '08 #8

Keith Thompson

Charles Richmond <fr*****@tx.rr.com> writes:

CBFalconer wrote:
>la************@siemens.com wrote:
>>Paul Hsieh <we******@gmail.com> wrote:

No C compiler's rand() that I have ever seen has, by itself, a
worse effect on random output than that.
Then you've never seen the truly bad BSD rand() that fostered
most of the paranoia about rand. With it, rand() % 1 generated
the sequence 0, 1, 0, 1, 0, 1, ....
FYI the value of "rand() % 1" is identically 0. Except it may be
considerably slower than just writing "0".

Not if your compiler has a good optimizer... ;-)

Maybe. rand() has side effects, so a call to it can't be optimized
away, *unless* the compiler can prove that the side effects don't
affect the program's output.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 24 '08 #9

Eric Sosman

Keith Thompson wrote:

Charles Richmond <fr*****@tx.rr.com> writes:
>CBFalconer wrote:
>>la************@siemens.com wrote:
Paul Hsieh <we******@gmail.com> wrote:

No C compiler's rand() that I have ever seen has, by itself, a
worse effect on random output than that.
Then you've never seen the truly bad BSD rand() that fostered
most of the paranoia about rand. With it, rand() % 1 generated
the sequence 0, 1, 0, 1, 0, 1, ....
FYI the value of "rand() % 1" is identically 0. Except it may be
considerably slower than just writing "0".

Not if your compiler has a good optimizer... ;-)

Maybe. rand() has side effects, so a call to it can't be optimized
away, *unless* the compiler can prove that the side effects don't
affect the program's output.

srand( rand() % 1 );

--
Eric Sosman
es*****@ieee-dot-org.invalid

Oct 24 '08 #10

William Hughes

On Oct 22, 11:48 pm, Paul Hsieh <websn...@gmail.com> wrote:

On Oct 22, 6:04 pm, Rich Fife <rf...@amug.org> wrote:

Quick rand() question:

I know you're not supposed to use "rand() % 1024" for instance,
because it focuses on the lower bits. However, it seems to me
that given that the argument is not a power of two (or near a
power of two), that this is not an issue. The upper bits will
participate equally in the result with the lower. Am I missing
something?

Yes, you are missing some mathematical analysis to back up what you
just said. If you do (rand() % 1023) on Microsoft Visual C++ or
WATCOM C/C++, 32 of the possible outputs will have an extra 3% bias no
matter how good your random number generator is.

Well, 3% certainly meets the informal meaning of small. If your
problem is such that you are worried about the 3% you should probably
be more worried about the fact that the rand() you are using has an
output space of only 16 bits.

No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you have not seen a rand() implementation that switched parity
on each call. I understand such an implementation not only existed
but was relatively widespread.

>
The C.L.C. FAQ about this gives extremely misleading advice on this
point and it should seriously be ignored.

No. Using the advice given in the C.L.C. FAQ means that you will get
reasonable results, even if the rand() implementation is poor, as long
as rand() produces integers that are more or less uniformly
distributed in 0...RAND_MAX.
If you use the "rand() % n" technique you have no such guarantee.
The bias is small. Do not confuse detectability with importance.
(The use of "significance" in the term "statistical significance"
leads many people astray).

- William Hughes

Oct 24 '08 #11

Nate Eldredge

William Hughes <wp*******@hotmail.com> writes:

>No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you have not seen a rand() implementation that switched parity
on each call. I understand such an implementation not only existed
but
was relatively widespread.

Right you are. Here is the rand() implementation from the very
influential 4.4BSD-Lite.

#define RAND_MAX 0x7fffffff

static u_long next = 1;

int
rand()
{
    return ((next = next * 1103515245 + 12345) % ((u_long)RAND_MAX + 1));
}

void
srand(seed)
    u_int seed;
{
    next = seed;
}
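
A hedged sketch of why this generator flips parity on every call. It re-creates the recurrence above under hypothetical names (bsd_state, bsd_srand, bsd_rand, low_bit_alternates) so that it does not clash with the C library's own rand() and srand(); it is an illustration, not the BSD source.

```c
#include <assert.h>

/* State of the re-created generator; starts at 1 like the original. */
static unsigned long bsd_state = 1;

static void bsd_srand(unsigned int seed)
{
    bsd_state = seed;
}

/* Same multiplier, increment and 31-bit modulus as the code above. */
static int bsd_rand(void)
{
    bsd_state = (bsd_state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return (int)bsd_state;
}

/* Return 1 if the low bit flips on each of `calls` consecutive calls. */
static int low_bit_alternates(unsigned int seed, int calls)
{
    int i, prev, cur;

    assert(calls >= 2);
    bsd_srand(seed);
    prev = bsd_rand() & 1;
    for (i = 1; i < calls; i++) {
        cur = bsd_rand() & 1;
        if (cur == prev)
            return 0;
        prev = cur;
    }
    return 1;
}
```

The multiplier and the increment are both odd and the modulus is a power of two, so an odd state always becomes even and an even state always becomes odd. That is why taking the lowest bit of this generator gives 0, 1, 0, 1, ... whatever the seed.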

Oct 24 '08 #12

Paul Hsieh

On Oct 23, 7:45 pm, William Hughes <wpihug...@hotmail.com> wrote:

On Oct 22, 11:48 pm, Paul Hsieh <websn...@gmail.com> wrote:
On Oct 22, 6:04 pm, Rich Fife <rf...@amug.org> wrote:
Quick rand() question:

I know you're not supposed to use "rand() % 1024" for instance,
because it focuses on the lower bits. However, it seems to me
that given that the argument is not a power of two (or near a
power of two), that this is not an issue. The upper bits will
participate equally in the result with the lower. Am I missing
something?

Yes, you are missing some mathematical analysis to back up what you
just said. If you do (rand() % 1023) on Microsoft Visual C++ or
WATCOM C/C++, 32 of the possible outputs will have an extra 3% bias no
matter how good your random number generator is.

Well, 3% certainly meets the informal meaning of small.

You might like to explain that to the average casino. Card counting
in blackjack gives a player around a 1% advantage over the house, and
the casinos kick out such people whenever they are discovered. People
who attack defective casino games rely on people with attitudes like
yours.

Even if you are implementing something as simple as a 1d20 (where each
choice itself is only 5%) in a Dungeons and Dragons game, the players
will easily see that bias over time.

[...] If your
problem is such that you are worried about the 3% you should probably
be more worried about the fact that the rand() you are using has an
output space of only 16 bits.

That statement doesn't follow any line of logic of any relevance. If
you care, then you care, and you want to get a correct ranged random
number generator. If you go up to 32 bits, but still have bias that's
just a little smaller, how can you be happy? And if you want to write
portable code, then what are you going to do?

No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you have not seen a rand() implementation that switched parity
on each call. I understand such an implementation not only existed
but was relatively widespread.

True enough, but this fundamentally comes from the lack of analysis.
The C.L.C. FAQ just continues this tradition by failing to give
effective analysis of the problem.

The C.L.C. FAQ about this gives extremely misleading advice on this
point and it should seriously be ignored.

No. Using the advice given in the C.L.C. means that you will get
reasonable results, even if rand() implementation is poor, as long
as rand() produces integers that are more or less uniformly
distributed in 0...RAND_MAX.

Did you know that a simple counter will produce numbers that are
exactly uniformly distributed in 0 ... RAND_MAX? You know, basic
understanding is sometimes actually useful on occasion.

If you use the "rand() % n" technique you have no such guarantee.

The technique shown in the CLC FAQ also has no such guarantee. It's
totally beside the point.

The bias is small.

Define small. If you want to test how often a hash function will map
to a common bucket either (rand() % n) or
(rand() * (double) n / (RAND_MAX + 1)) will make no difference. It
will produce worthless results no matter what.
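
A hedged sketch of the scaling form mentioned above, showing that it leaves the same number of over-represented outputs as the remainder form. The names scale_to_range and outputs_hit_exactly are hypothetical, and rand_max is a parameter so the 32767 case can be checked anywhere. The divisor is computed in double because RAND_MAX + 1 overflows in int arithmetic on platforms where RAND_MAX equals INT_MAX. The sketch assumes n is much smaller than rand_max.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Map a raw value v in 0..rand_max to 0..n-1 by scaling, so that the
 * high-order bits of v decide the result. */
static int scale_to_range(int v, int rand_max, int n)
{
    assert(n > 0 && rand_max > 0 && v >= 0 && v <= rand_max);
    return (int)((double)v / ((double)rand_max + 1.0) * n);
}

/* Count how many outputs are produced by exactly `times` raw values
 * when every v in 0..rand_max is scaled. Returns -1 on allocation
 * failure. */
static long outputs_hit_exactly(int rand_max, int n, long times)
{
    long *tally;
    long found = 0;
    int v, k;

    assert(n > 0 && rand_max < INT_MAX);  /* keeps v++ from overflowing */
    tally = calloc((size_t)n, sizeof *tally);
    if (tally == NULL)
        return -1;
    for (v = 0; v <= rand_max; v++)
        tally[scale_to_range(v, rand_max, n)]++;
    for (k = 0; k < n; k++)
        if (tally[k] == times)
            found++;
    free(tally);
    return found;
}
```

For rand_max = 32767 and n = 1023, 32 outputs are hit 33 times and the other 991 are hit 32 times. Scaling changes which outputs are favoured, not how many: 32768 raw values cannot be divided evenly among 1023 outputs by any fixed mapping.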

[...] Do not confuse detectability with importance.

I assure you, I am not the one confused. The C.L.C. FAQ is giving a
solution that assumes a policy where low bit determinism is a worse
problem than pure measurable bias and also a worse problem than a
simple range issue. In fact the CLC FAQ is promoting confusion by not
explaining the issue correctly and consequently how one might deal
with the problem.

(The use of "significance" in the term "statistical significance"
leads many people astray).

What has that got to do with anything? If you wish to test something
with a very small probability which is lower than the bias being
introduced by such short-sighted techniques then what good is the
C.L.C. FAQ's discussion on the subject?

Who actually wants to use a PRNG which is biased or incapable of even
measuring what you want? The subject deserves to be discussed
usefully. The C.L.C. FAQ just harps on one single anomaly that has
resulted from the weakness of the ANSI C standard.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Oct 24 '08 #13

Richard Bos

Paul Hsieh <we******@gmail.com> wrote:

The C.L.C. FAQ about this gives extremely misleading advice on this
point and it should seriously be ignored. If you want to seriously
deal with random numbers just read my page about it:

Or possibly don't. If the code on that page is as rotten as the HTML, I
wouldn't trust it.

Then, search this and other newsgroups for posts by George Marsaglia,
who _really_ knows what a good PRNG is like.

Richard

Oct 24 '08 #14

William Hughes

On Oct 23, 11:29 pm, Paul Hsieh <websn...@gmail.com> wrote:

On Oct 23, 7:45 pm, William Hughes <wpihug...@hotmail.com> wrote:

On Oct 22, 11:48 pm, Paul Hsieh <websn...@gmail.com> wrote:
On Oct 22, 6:04 pm, Rich Fife <rf...@amug.org> wrote:
Quick rand() question:

I know you're not supposed to use "rand() % 1024" for instance,
because it focuses on the lower bits. However, it seems to me
that given that the argument is not a power of two (or near a
power of two), that this is not an issue. The upper bits will
participate equally in the result with the lower. Am I missing
something?

Yes, you are missing some mathematical analysis to back up what you
just said. If you do (rand() % 1023) on Microsoft Visual C++ or
WATCOM C/C++, 32 of the possible outputs will have an extra 3% bias no
matter how good your random number generator is.

Well, 3% certainly meets the informal meaning of small.

You might like to explain that to the average casino.

The explanation goes "Even differences that are usually
considered small, e.g. 3%, can be very very important."
If the casino is still in business they will tell
me to teach my Grandmother to suck eggs.

Card counting in blackjack gives a player around a 1% advantage over
the house,

Teach your Grandmother to suck eggs

and the casinos kick
out such people whenever they are discovered. People who attack
defective casino games rely on people with attitudes like yours.

Even if you are implementing something as simple as a 1d20 (where each
choice itself is only 5%) in a Dungeons and Dragons game, the players
will easily see that bias over time.

Piffle (even if we are talking about a 3% bias and not
the less than .1% bias you get with rand()%20 ).
And even if someone took the trouble to notice (e.g. tabulated
1000's of rolls and applied statistical techniques) they would notice
that the bias had no practical import.
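
The two percentages being compared can be worked out in closed form. The sketch below is an illustration under stated assumptions, not code from the thread: modulo_excess is a hypothetical name, and range stands for the number of distinct values the generator can return (RAND_MAX + 1, taken here to be 32768).

```c
#include <assert.h>

/*
 * Relative excess of the favoured outputs of (v % n) when v is uniform
 * over `range` distinct values. The favoured outputs occur
 * range / n + 1 times and the rest occur range / n times, so the
 * excess is 1 / (range / n). Returns 0.0 when n divides range evenly.
 */
static double modulo_excess(unsigned long range, unsigned long n)
{
    assert(n > 0 && range >= n);
    if (range % n == 0)
        return 0.0;
    return 1.0 / (double)(range / n);
}
```

For range = 32768, n = 1023 gives 1/32 (about 3.1%), while n = 20 gives 1/1638 (about 0.06%), which is consistent with the "less than .1%" figure for rand()%20.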

>
[...] If your
problem is such that you are worried about the 3% you should probably
be more worried about the fact that the rand() you are using has an
output space of only 16 bits.

That statement doesn't follow any line of logic of any relevance. If
you care, then you care, and you want to get a correct ranged random
number generator. If you go up to 32 bits, but still have bias that's
just a little smaller, how can you be happy?

E.g. I am interested in tail distributions. The performance of
my random generator has gone from terrible to reasonable.

>And if you want to write portable code, then what are you going to do?

Either I don't need much, in which case I can use
the system rand() or I provide my_rand().

>

No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you have not seen a rand() implementation that switched parity
on each call. I understand such an implementation not only existed
but was relatively widespread.

True enough, but this fundamentally comes from the lack of analysis.
The C.L.C. FAQ just continues this tradition by failing to give
effective analysis of the problem.

The C.L.C. FAQ about this gives extremely misleading advice on this
point and it should seriously be ignored.

No. Using the advice given in the C.L.C. means that you will get
reasonable results, even if rand() implementation is poor, as long
as rand() produces integers that are more or less uniformly
distributed in 0...RAND_MAX.

Did you know that a simple counter will produce numbers that are
exactly uniformly distributed in 0 ... RAND_MAX?

Indeed, one needs more than uniformly distributed.
The basic point, that the rand() implementation needs
to be really bad to produce unreasonable results
with the FAQ technique, but the rand() implementation
only needs to be a bit bad to produce unreasonable
results with the rand()%n technique remains.

>You know, basic understanding is sometimes actually useful on
occasion.

If you use the "rand() % n" technique you have no such guarantee.

The technique shown in the CLC FAQ also has no such guarantee. It's
totally beside the point.

The bias is small.

Define small. If you want to test how often a hash function will map
to a common bucket either (rand() % n) or
(rand() * (double) n / (RAND_MAX + 1)) will make no difference. It
will produce worthless results no matter what.

No. A test that looks for perfection vs bias will find a bias,
but since there are lots and lots of ways of introducing an
insignificant (note I did _not_ say "statistically insignificant")
bias, a test that looks for perfection vs bias is stupid.

[...] Do not confuse detectability with importance.

I assure you, I am not the one confused. The C.L.C. FAQ is giving a
solution that assumes a policy where low bit determinism is a worse
problem than pure measurable bias and also a worse problem than a
simple range issue. In fact the CLC FAQ is promoting confusion by not
explaining the issue correctly and consequently how one might deal
with the problem.

(The use of "significance" in the term "statistical significance"
leads many people astray).

What has that got to do with anything? If you wish to test something
with a very small probability which is lower than the bias being
introduced by such short-sighted techniques then what good is the
C.L.C. FAQ's discussion on the subject?

Who actually wants to use a PRNG which is biased

[I recall a wonderful poem about an archer who claimed
he was best, because, although he never came near
the target, he was unbiased. Lack of bias is not
everything!]
Lots of people don't care a fig. If I want to shuffle cards for a
bridge game then I don't care about a 3% bias. (If I want to shuffle
cards for a computer poker game, then the fact that the average rand()
is about as cryptographically secure as a Caesar cipher is more
important than a 3% bias). If the wumpus alternates between
being in the left half of the maze and the right half of the maze,
I care a lot!

The CLC FAQ solves a real (although probably now historical) problem.

The system rand() may not be suitable for many applications.
Fixing one problem, which is not a problem in most applications
where the system rand() is suitable, does not magically make rand()
produce high quality random numbers.

- William Hughes

Oct 24 '08 #15

user923005

On Oct 24, 5:04 am, William Hughes <wpihug...@hotmail.com> wrote:
[snip]

The CLC FAQ solves a real (although probably now historical) problem.

The system rand() may not be suitable for many applications.
Fixing one problem, which is not a problem in most applications
where the system rand() is suitable, does not magically make rand()
produce high quality random numbers.

An easy solution for this problem is to use the Mersenne Twister.
True, the quality of C compiler packaged rand() implementations is
spotty. Equally true, the excellence of the open source Mersenne
Twister is well documented.

Suggestions to the ANSI C committee:
1. Make a version of the Mersenne Twister the standard C library
implementation.
2. Have versions that produce int (0-INT_MAX), long long
(0-LLONG_MAX), and double (0.0-1.0) outputs.

A final suggestion would be to carefully analyze available C library
source to find a "best of breed" class of standard library functions
{for the subset of functions that are fully portable} and have this
code base become a default library implementation. Then, when the
compiler vendor wants to write a snappy compiler, the workload will
consist only of files that improve upon the default set.

Oct 24 '08 #16
