Better Rand() Algorithim

On Tue, 09 Sep 2003 20:13:45 GMT, Ben Justice <No*****@Justice.gov>
wrote:

For a program in c, I need some random numbers for a system were people are
placing bets. This is not a commerical project btw. Generally, I tend to
rely on things from the standard library, because they're written by people
with skills far above mine. Hence, I've always used rand() and FALSELY
assumed it could produce unpredictable random numbers and could be used in
many situations even if security was an issue.

Firstly, lets assume for the moment I have a way to get a random and
unpredictable seed for a pseudo random number generator. Then is it
possible to get a reasonably secure shuffle in which the user cannot
predict all the players cards after seeing the first five or so in the
series?

Yes.
For your purposes, the easiest is probably to get a copy of
MD5 or SHA1 in C.
Grab a unsigned long sized chunk of the output
and reduce with something like this.
#define RAND_MAX_COMBO ((unsigned long) 4294967295)

/*
range

returns a number between 0 and N-1

*/

int range(int n) {
unsigned long div;
int r;

div = RAND_MAX_COMBO/n;

do {
r = rand_combo() / div;
} while (r >= n);

return r;
}

(replace the rand_combo() function with your md5/sha1 function)

Keep the output, and feed it (plus the time and any other
information you can get easily)
back into the function the next time it's called.

Secondly, Is there a reasonable method for determing an unpredictatble
seed? I understand seeding from the timer is unsecure.

Maybe.
A quick and dirty approach is to time keystrokes as accurately as you
can, then take the resultant time value and hash it.
http://www.helsbreth.org/random/keyrand.html
has some example DOS code that's in the public domain.

That assumes of course, that you have a reasonable approximation
of kbhit(), which you don't if you're running, say, Linux.

Fortunately, most 'nix systems have /dev/urandom (and /dev/random,
but don't use that) so there's no need to reinvent the wheel -

drfp = fopen("/dev/urandom", "r");
fread(&unsigned_long_value, sizeof(unsigned long), 1, drfp);
fclose(drfp);

will get you a crypto quality unsigned long.
Finally, is there a better rand() facsimile in which the source code is
available in the public domain? One which would suit my needs.

TIA Bj

Lots.

http://www.helsbreth.org/random/shuffle.html
has one example of a pseudo random number generator suitable for
most purposes, but it wouldn't work very well for you,
because you need a generator which is unpredictable. (crypto-secure)

Google and you can find thousands more.

Scott Nelson <sc***@helsbreth.org>

Nov 13 '05 #12

Gordon Burditt

>For a program in c, I need some random numbers for a system were people are

placing bets.
If it's for real money, people will go to great effort to break
your system. Some people with time on their hands may do so anyway
even if it's not for real money (think about fraterity pranks).
This is not a commerical project btw. Generally, I tend to
rely on things from the standard library, because they're written by people
with skills far above mine. Hence, I've always used rand() and FALSELY
assumed it could produce unpredictable random numbers and could be used in
many situations even if security was an issue.
If you assume things where security is an issue, you're going to
get burned. Just watch what happens to Homeland Security: they
decided to use a Microsoft OS on the desktop, since it's obviously
safe against worms and viruses.
Firstly, lets assume for the moment I have a way to get a random and
unpredictable seed for a pseudo random number generator. Then is it
possible to get a reasonably secure shuffle in which the user cannot
predict all the players cards after seeing the first five or so in the
series?
There are 52! possible orders of a shuffled deck of cards. That
requires at least 226 bits of random, unpredictable seed. And your
replacement rand() function needs to actually use that many bits.
The requirements may be less if, for example, you are dealing poker
hands and the order of dealing the cards is not exposed.
Secondly, Is there a reasonable method for determing an unpredictatble
seed? I understand seeding from the timer is unsecure.
Well, maybe you could use an MD5 hash (128 bits) of the stock market
pages of the Wall Street Journal concatenated with a MD5 hash of
the front page of the New York Times (that gets you 256 bits, total).
Of course, you can only do this on days the stock market is open,
once each day. And it requires a lot of data entry. And you'd have
to either keep secret where you're getting your random data, or get
your copies of the papers before anyone else does.
Finally, is there a better rand() facsimile in which the source code is
available in the public domain? One which would suit my needs.

The BSD random() function might qualify. It can be used with enough
state bits. The source code is not public domain but the BSD
license is fairly liberal. See, for example, a FreeBSD source
distribution.

Gordon L. Burditt

Nov 13 '05 #13

On 10 Sep 2003 06:33:10 GMT, go****@gerbil.hammy.lonestar.org (Gordon
Burditt) wrote:

For a program in c, I need some random numbers for a system were people are
placing bets.

[snip happens]

Firstly, lets assume for the moment I have a way to get a random and
unpredictable seed for a pseudo random number generator. Then is it
possible to get a reasonably secure shuffle in which the user cannot
predict all the players cards after seeing the first five or so in the
series?

There are 52! possible orders of a shuffled deck of cards. That
requires at least 226 bits of random, unpredictable seed. And your
replacement rand() function needs to actually use that many bits.
The requirements may be less if, for example, you are dealing poker
hands and the order of dealing the cards is not exposed.

Suppose just for arguments sake that you used 64 bit PRNG.
Now use it to shuffle a deck and look at the top five cards.
Even though you can eliminate over 99.9999% of the possible
shuffles, you still have over a billion possible shuffles left.

With 128 bits, even examining the top 15 cards leaves
more possibilities than can reasonably be enumerated in
the span of a poker hand.

If instead of a PRNG, we use a CSPRNG, then even knowing
all the cards, you still can't determine the internal
state of the CSPRNG. Sure, you can do it /in theory/
but in practice there's just not enough compute power in
the world.

Scott Nelson <sc***@helsbreth.org>

Nov 13 '05 #14

Gordon Burditt

>Your compiler's rand may have weaknesses. The question is, when properly

seeded, can a would-be attacker exploit the weaknesses without doing an
exhaustive search. If an exhaustive search is needed, then no other random
number generator with the same size seed would be any better.
And 32 bits is nowhere near enough to shuffle a deck of 52 cards.
You need at least 226 bits (log base 2(52!) = 225.581 bits) of
random seed for a random shuffle.
Below is a simple program. It shuffles a deck card of card, and then deals
the first 50 cards face up. Your challenge is to guess the 51st card. A
priori you have a 50% chance. Can you with the knowledge of rand and the
code used by the program, but not the seed, develop an algorithm for winning
every time?
Ok, here's the algorithm, brute force (takes a heck of a lot of
memory, or speed, or both): list all the sequences generated by a
32-bit seed. Eliminate all those that don't begin with the first
few cards you do see. Pick one of the sequences that's left if
there's more than one, otherwise, pick the one sequence. If *NO*
sequence matches, someone's cheating.

Let's try it a bit differently. Deal out the first *SEVEN* cards
face up, and the rest face down. Apply the above algorithm. There
is less than a 1% chance you have more than 1 choice of sequences.
(log base 2 ( 52*51*50*49*48*47) = 39.29 bits. Subtract 32 bits
for the number of seeds. 7.27 bits gives about 156 possible sequences
of 7 cards per seed, so if rand() mixes up the numbers well, the
chances of hitting 2 seeds that produce the same sequence of 7 cards
is about 1/156.) And you can 99% of the time guess ALL of the rest
of the cards ...

With *SIX* cards face up you have about a 70% chance of being able
to guess *ALL* of the rest of the cards.
What if rand is only re-seeded every fifth shuffle?
I have to deal with 5 sets of sequences, if I don't know what shuffle
the current one is. Ok, now I have a 95% chance of guessing ALL
the rest of the cards from the first seven - and then I'll know
which shuffle you are on and what seed you are using. If you
aren't going to reseed next time, if I guessed right on this
first shuffle, I'll get all the others 100% certain until you reseed.
The program is seeded with time(0) which is not secure. I did that so that
the program is portable and functional. If your compiler supports assembly
and you are running on an x86, you can try to turn on the code that reads
the lower 32 bits of the timing register of the CPU. Because this register
is update at very high frequency, it appears to me to be a good source of
entropy for the random number generator.

32 bits is not enough. You'll need a MUCH faster processor to make
using the bottom 226 bits of the timer register practical (that is,
counts through all 226 bits and finishes before man is extinct).

Gordon L. Burditt

Nov 13 '05 #15

John E. Hadstate

"Ben Justice" <No*****@Justice.gov> wrote in message
news:Xn*******************************@68.12.19.6. ..

For a program in c, I need some random numbers for a system were people are placing bets. This is not a commerical project btw. Generally, I tend to
rely on things from the standard library, because they're written by people with skills far above mine. Hence, I've always used rand() and FALSELY
assumed it could produce unpredictable random numbers and could be used in
many situations even if security was an issue.

The typical rand() function from the C RTL belongs to a class called "Linear
Congruential" PRNGs. LCPRNGs are predictable with just a few consecutive
examples, but that's not their worst problem. To see what's really wrong
with them, right a quick-and-dirty program to print out 10 or 15 rand()
outputs as strings of bits. Look closely at the least significant bit of
each output. Do you see a pattern? Now look at the next LSB. Notice
anything?

The bits of a LCPRNG appear more random as you move toward the MSB. Thus,
all those programmers who code (rand() & 1) are shooting themselves in the
foot. If you must use rand(), use the MSB!

Nov 13 '05 #16

Carsten Hansen

"Gordon Burditt" <go***********@sneaky.lerctr.org> wrote in message
news:bj********@library2.airnews.net...

Your compiler's rand may have weaknesses. The question is, when properly
seeded, can a would-be attacker exploit the weaknesses without doing an
exhaustive search. If an exhaustive search is needed, then no other randomnumber generator with the same size seed would be any better.

And 32 bits is nowhere near enough to shuffle a deck of 52 cards.
You need at least 226 bits (log base 2(52!) = 225.581 bits) of
random seed for a random shuffle.
Below is a simple program. It shuffles a deck card of card, and then dealsthe first 50 cards face up. Your challenge is to guess the 51st card. A
priori you have a 50% chance. Can you with the knowledge of rand and the
code used by the program, but not the seed, develop an algorithm for winningevery time?

Ok, here's the algorithm, brute force (takes a heck of a lot of
memory, or speed, or both): list all the sequences generated by a
32-bit seed. Eliminate all those that don't begin with the first
few cards you do see. Pick one of the sequences that's left if
there's more than one, otherwise, pick the one sequence. If *NO*
sequence matches, someone's cheating.

Let's try it a bit differently. Deal out the first *SEVEN* cards
face up, and the rest face down. Apply the above algorithm. There
is less than a 1% chance you have more than 1 choice of sequences.
(log base 2 ( 52*51*50*49*48*47) = 39.29 bits. Subtract 32 bits
for the number of seeds. 7.27 bits gives about 156 possible sequences
of 7 cards per seed, so if rand() mixes up the numbers well, the
chances of hitting 2 seeds that produce the same sequence of 7 cards
is about 1/156.) And you can 99% of the time guess ALL of the rest
of the cards ...

With *SIX* cards face up you have about a 70% chance of being able
to guess *ALL* of the rest of the cards.
What if rand is only re-seeded every fifth shuffle?

I have to deal with 5 sets of sequences, if I don't know what shuffle
the current one is. Ok, now I have a 95% chance of guessing ALL
the rest of the cards from the first seven - and then I'll know
which shuffle you are on and what seed you are using. If you
aren't going to reseed next time, if I guessed right on this
first shuffle, I'll get all the others 100% certain until you reseed.
The program is seeded with time(0) which is not secure. I did that so thatthe program is portable and functional. If your compiler supports assemblyand you are running on an x86, you can try to turn on the code that reads
the lower 32 bits of the timing register of the CPU. Because this registeris update at very high frequency, it appears to me to be a good source of
entropy for the random number generator.

32 bits is not enough. You'll need a MUCH faster processor to make
using the bottom 226 bits of the timer register practical (that is,
counts through all 226 bits and finishes before man is extinct).

Gordon L. Burditt

I agree with most of what you are writing. But we disagree on the importance
(in terms of security).
The point that I was trying to make was that you have to analyze where your
system is vulnerable to an attack.
You have correctly identified that for the code that I posted, it lies in
having only 32-bits of entropy for the seed. This makes an exhaustive search
possible. Using the Mersenne Twister with a 32-bits seed would have made no
difference.

So, it is not the design of the random number generator which is the problem
here.

Exhaustive search of 2**32 (approx. 4 billion) is doable on today's
computers. If you only show the first 5 cards face up, for each possibility
you would have to do at least 48 calls to rand + 48 mod operations. Now we
are taking it out of doing it real time. This would probably be safe for
non-commercial use, which is a what the OP was talking about.

For commercial use I would use two 32-bits seeds. Only a tiny, tiny subset
of all shuffles would be possible. After all 64 is smaller than 226. But you
can't do an exhaustive search of 2**64 possibilities. If you do 3 billion
instructions per second, and each search takes just one instructions, it
would take more than 200 years. That is sufficient for practical security.
Use the first 32-bits seed to determine the initial order of the cards
(based on rank). The second 32-bits seed you use as before.

Computer security is based on things being infeasible, not on things being
impossible. E.g., some security systems are based on it being infeasible to
factor the product of two big primes (say 120 digits each). We of course all
know a simple algorithm for doing factorization.

Carsten Hansen

Nov 13 '05 #17

Alex

Nick <ni**@microsoft.com> wrote:

Arthur J. O'Dwyer wrote:
On Tue, 9 Sep 2003, Steve Zimmerman wrote:

Ben,

Thank you for your post.

I do not know if this is a reliable shuffle or not. Others will
be able to advise you on that better than I.

--Steve

Okay, Steve, I hate to say it, but your incessant posts are getting
very slightly on my nerves. I can't speak for anyone else, nor
can I do anything about it, and I know that at least you're getting
some pretty good practice at writing C code, but still... If you
don't know whether a solution is correct or not, *please* try looking
for the answer in the archives or on the Web *before* posting!

The below program is incorrect, and contains a lot of extraneous
material that can only confuse or annoy the OP. Even in answers,
please post the smallest complete program that exhibits the
solution. :)

-Arthur

Ok, I think this is a little harsh. I have seen no hostile action from
Steve in this group, and, even if his posts seem overmuch sometimes, the
purpose of this group is to help people learn. Programmers make
mistakes; beginners and even intermediates make lots o' mistakes. Show
someone the problem enough times, and they'll eventually start seeing
the errors.

I agree with Arthur. I think that it is much more appropriate to get
this kind of practice in:

alt.comp.lang.learn.c-c++

Steve: Not that we don't like you, but indeterminate answers tend
to to confuse those who are learning. If you wish your code to be
examined here, then post it separately and note the things that you
are unsure of.

Alex

Nov 13 '05 #18

pete

Richard Heathfield wrote:

Malcolm wrote:
void memshuffle(void *ptr, int N, size_t size)

Invasion of implementation namespace.

mem_shuffle()

memswap(hck + size * i, hck + size * (rand() % N), size);

And again.

mem_swap()

--
pete

Nov 13 '05 #19

Kevin Easton

In comp.lang.c Carsten Hansen <ha******@worldnet.att.net> wrote:
[...card gaming program...]

Exhaustive search of 2**32 (approx. 4 billion) is doable on today's
computers. If you only show the first 5 cards face up, for each possibility
you would have to do at least 48 calls to rand + 48 mod operations. Now we
are taking it out of doing it real time. This would probably be safe for
non-commercial use, which is a what the OP was talking about.

For commercial use I would use two 32-bits seeds. Only a tiny, tiny subset
of all shuffles would be possible. After all 64 is smaller than 226. But you
can't do an exhaustive search of 2**64 possibilities. If you do 3 billion
instructions per second, and each search takes just one instructions, it
would take more than 200 years. That is sufficient for practical security.
Use the first 32-bits seed to determine the initial order of the cards
(based on rank). The second 32-bits seed you use as before.

I agree that it is secure, but it isn't the game of poker (or blackjack,
or whatever) - in the true game of poker all shuffles are possible, but
in this version with a 64 bit seed that's not the case. I also agree
that it's a game statistically indistinguishable from poker (from the
point of view of the player - he would have to play the game an
unfeasible number of times in order to show that it's likely that only a
small subset of shuffles are possible), but "a game statistically
indistinguishable from poker" and "poker" are still two different
things.

- Kevin.

Nov 13 '05 #20

Gordon Burditt

>> And 32 bits is nowhere near enough to shuffle a deck of 52 cards.

You need at least 226 bits (log base 2(52!) = 225.581 bits) of
random seed for a random shuffle.
.... Ok, here's the algorithm, brute force (takes a heck of a lot of
memory, or speed, or both): list all the sequences generated by a
32-bit seed. Eliminate all those that don't begin with the first
few cards you do see. Pick one of the sequences that's left if
there's more than one, otherwise, pick the one sequence. If *NO*
sequence matches, someone's cheating.

Let's try it a bit differently. Deal out the first *SEVEN* cards
face up, and the rest face down. Apply the above algorithm. There
is less than a 1% chance you have more than 1 choice of sequences.
(log base 2 ( 52*51*50*49*48*47) = 39.29 bits. Subtract 32 bits
for the number of seeds. 7.27 bits gives about 156 possible sequences
of 7 cards per seed, so if rand() mixes up the numbers well, the
chances of hitting 2 seeds that produce the same sequence of 7 cards
is about 1/156.) And you can 99% of the time guess ALL of the rest
of the cards ...
The point that I was trying to make was that you have to analyze where your
system is vulnerable to an attack.
You have correctly identified that for the code that I posted, it lies in
having only 32-bits of entropy for the seed. This makes an exhaustive search
possible. Using the Mersenne Twister with a 32-bits seed would have made no
difference.

So, it is not the design of the random number generator which is the problem
here.
Agreed.
Exhaustive search of 2**32 (approx. 4 billion) is doable on today's
computers. If you only show the first 5 cards face up, for each possibility
you would have to do at least 48 calls to rand + 48 mod operations. Now we
are taking it out of doing it real time. This would probably be safe for
non-commercial use, which is a what the OP was talking about.
Consider the possibility of a pre-computed table (on disk). Look
at the first card. Index into a table. This either tells you the
seed in use or which table to go to for further cards. Continue
with the next cards. This is likely to take 1 disk access per table
(7 max per hand). Once you have the seed, you can run the sequence
with the right seed (once, not once per 2**32). This can be done
much faster than inputting the cards into a laptop. Note that even
if you have incomplete tables, you'll either know the seed or know
you haven't got it, so you could bet accordingly. You could still
make money if you only know all the cards 25% of the time, and
otherwise play half-decently without cheating.

A 7-D array of the seeds for the first 7 cards takes about 2700 GB.
(That's a bit much for current laptops, but certainly not impossible
for a NFS-based storage farm and not at all unusual for a commercial
USENET news server.) Since the seed is often determined by the 6th
card or earlier, using tables that contain up to 52 elements of
either the seed or a pointer to the next table to try can leave out
entries for already-determined data. I'm not sure how much storage
this would save. A complete set of data for SIX cards would take
58GB, and I'm guessing that since you have a 70% chance of a unique
answer for 6 cards, with good techniques of sparse-array storage,
you could put the tables in around 58GB, which is within range for
today's laptops. Except for inputting the cards, I suspect this
could run faster than 1000 X real time even on a last-generation
laptop, where you're giving presumed humans reasonable time to
decide on their bet.
For commercial use I would use two 32-bits seeds. Only a tiny, tiny subset
of all shuffles would be possible. After all 64 is smaller than 226. But you
can't do an exhaustive search of 2**64 possibilities. If you do 3 billion
instructions per second, and each search takes just one instructions, it
would take more than 200 years. That is sufficient for practical security.
Use the first 32-bits seed to determine the initial order of the cards
(based on rank). The second 32-bits seed you use as before.

Be careful that the use of 2 32-bit seeds doesn't collapse into a lot less
than 2**64 possibilities. In pathological situations, it could leave
you no better off than 1 32-bit seed.

Gordon L. Burditt

Nov 13 '05 #21

On Tue, 09 Sep 2003 20:13:45 +0000, Ben Justice wrote:

For a program in c, I need some random numbers for a system were people are
placing bets. This is not a commerical project btw. Generally, I tend to
rely on things from the standard library, because they're written by people
with skills far above mine. Hence, I've always used rand() and FALSELY
assumed it could produce unpredictable random numbers and could be used in
many situations even if security was an issue.

Firstly, lets assume for the moment I have a way to get a random and
unpredictable seed for a pseudo random number generator. Then is it
possible to get a reasonably secure shuffle in which the user cannot
predict all the players cards after seeing the first five or so in the
series?

Secondly, Is there a reasonable method for determing an unpredictatble
seed? I understand seeding from the timer is unsecure.

Finally, is there a better rand() facsimile in which the source code is
available in the public domain? One which would suit my needs.

First of all, when you are even slightly worried about attackers you
should use a cryptographical random number generator. Several replies
in this have proposed random number generators that are not
cryptographically strong, in most cases these are very easy to break.

I would strongly advise you to look at Yarrow
(http://www.counterpane.com/yarrow.html). The article gives a good
analysis of the problems with generating random numbers. You can also
find an Windows implementation of Yarrow for on the site, but I would
not recommend it. It does not fully implement the algorithm that is
described in the paper, and the code is not always very readable and
certainly not portable to other systems.

When your code runs on a Unix system that has /dev/random and/or
/dev/urandom, you should use these.

When you are developing for another platform you will probably
have to write it yourself.

A simple solution that is probably sufficiently secure for your
application is the following: Take a strong block cipher (AES) and use
it to encrypt a counter that is incremented after each time it is
used. An attacker will not be able to predict the outputs, unless he
is able to break the block cipher (highly unlikely) or has other means
to determine the key. A small practical problem for your
implementation is that the size of the counter is equal to the block
size (that is 16 bytes for AES).

This random number generator must be initialised by choosing a random
value for the key. Now this is the hard part. You should use a value
that cannot be predicted by an attacker. The easiest solution is to
find as many sources of unpredictable information as is practically
feasible. Most operating systems have internal statistical counters
that are hard to predict. Now put all the results in one large buffer
and apply a cryptographical hash function (SHA-1 or SHA-256).
Cryptographical hash functions have the nice properties that all bits
of the output depend on the entire input. So even when an attacker can
guess most of the inputs he won't be able to predict the outputs as
long as there are sufficient bits in the input that he can't guess.

It should not be too difficult to implement this, you can easily find
code on the Web that implements AES and SHA.

The system should be secure as long as an attacker is not able to
guess the initial key (that's why you should use as many sources as
possible) and if there is no way that he could steal the generated key
from the running program, e.g. by examining the memory or the swap
file.

For dealing the cards I would suggest that you simply use the
following algorithm.

Take a "deck" that initially contains all the cards.

At each step randomly select one of the cards from the
deck and remove that card from the deck.

This algorithm is obviously correct (unlike some of the other
proposals in this thread). For this algorithm you need a random number
generator that generates uniform numbers between 0 and n-1 where n <=
52. A few days ago I posted some code that converts between random
number generator with different ranges that you could use.

greetings,

Ernst Lippe

Nov 13 '05 #22

Ernst Lippe wrote:

Take a "deck" that initially contains all the cards.

At each step randomly select one of the cards from the
deck and remove that card from the deck.

This algorithm is obviously correct (unlike some of the other
proposals in this thread). For this algorithm you need a random number
generator that generates uniform numbers between 0 and n-1 where n <=
52. A few days ago I posted some code that converts between random
number generator with different ranges that you could use.

Bad idea. If you do this it is easy to show that some of the draws will
be biased, unless your PRNG state is evenly divisible by every number
between 2 and 52. (Yes, this only shows the exsistance of a small bias.
But it also means that there is no easy way to prove that there isn't
a large bias for some N.) A better approach is as follows. Select 52
random numbers and assign one to each card. Sort the deck using the
random values as a key.

This has two advantages. First, your random number generator need not
be unbiased. (Or more formally stated, the choice of permutations will
only depend on the independence of the values generated.) Second you
can use tests of randomness such as the runs test to validate your
generator. If the generator passes the runs test, all other sins, if
any, won't show up in your output. (Outside of a flawed sort algorithm.)
--
Robert I. Eachus

"As far as I'm concerned, war always means failure." -- Jacques Chirac,
President of France
"As far as France is concerned, you're right." -- Rush Limbaugh

Nov 13 '05 #23

On Sat, 13 Sep 2003 21:07:45 GMT, "Robert I. Eachus"
<ri******@attbi.com> wrote:

Ernst Lippe wrote:
Take a "deck" that initially contains all the cards.

At each step randomly select one of the cards from the
deck and remove that card from the deck.

This algorithm is obviously correct (unlike some of the other
proposals in this thread). For this algorithm you need a random number
generator that generates uniform numbers between 0 and n-1 where n <=
52. A few days ago I posted some code that converts between random
number generator with different ranges that you could use.
Bad idea. If you do this it is easy to show that some of the draws will
be biased, unless your PRNG state is evenly divisible by every number
between 2 and 52.

If, as assumed, you have an RNG that generates uniform numbers
between 0 and n-1 where n <= 52, then there is no bias at all
with this method.

A better approach is as follows. Select 52
random numbers and assign one to each card. Sort the deck using the
random values as a key.

If you can generate duplicate values, then the method is biased.
If you can't generate duplicate values, then you'd need a generator
that can produce numbers between 0 and N-1 where N is as large as
the range of the generator.
This has two advantages. First, your random number generator need not
be unbiased. (Or more formally stated, the choice of permutations will
only depend on the independence of the values generated.)

Imagine that my RNG is me throwing a six sided die.
Are you really trying to assert that assigning the numbers
1 through 6 randomly to each card and then sorting will
produce an unbiased shuffle?
Scott Nelson <sc***@helsbreth.org>

Nov 13 '05 #24

Scott Nelson wrote:

If, as assumed, you have an RNG that generates uniform numbers
between 0 and n-1 where n <= 52, then there is no bias at all
with this method.
Definitely not true. Let's say for specificity that n = 100. How do you
assign the values 0..99 to cards so that the first card chosen is not
biased? The second card? For choosing the third card it is easy to see
several unbiased ways to do that. And so on.

A better approach is as follows. Select 52
random numbers and assign one to each card. Sort the deck using the
random values as a key.

If you can generate duplicate values, then the method is biased.
Actually, no. Assuming that the number of possible values generated by
the RNG is greater than or equal to 52, you can discard duplicates. In
practice if the number of values generated is much larger than 52, or if
the generator doesn't produce duplicates in short runs, you don't need
to do that. I use a BBS generator if use the generated values directly
there would be no chance of duplicates. But the generator is mapped to
double precision values between 0.0 and 1.0. The rounding to double
precision allows for the possibility of generating two identical values
in a short sequence. But if you were to check every list for duplicate
values--easy after the sorting--you might discard one sequence in every
2^48 or so.
Imagine that my RNG is me throwing a six sided die.
Are you really trying to assert that assigning the numbers
1 through 6 randomly to each card and then sorting will
produce an unbiased shuffle?

No. But if you throw the die three times per card (giving 216 possible
values) and eliminate duplicates by either method above, it won't matter
how loaded your die is.

--
Robert I. Eachus

"As far as I'm concerned, war always means failure." -- Jacques Chirac,
President of France
"As far as France is concerned, you're right." -- Rush Limbaugh

Nov 13 '05 #25

On Sun, 14 Sep 2003 20:26:28 GMT, "Robert I. Eachus"
<ri******@attbi.com> wrote:

Scott Nelson wrote:
If, as assumed, you have an RNG that generates uniform numbers
between 0 and n-1 where n <= 52, then there is no bias at all
with this method.

Definitely not true. Let's say for specificity that n = 100. How do you
assign the values 0..99 to cards so that the first card chosen is not
biased? The second card? For choosing the third card it is easy to see
several unbiased ways to do that. And so on.

You're asking me to assume that n <= 52, and that n = 100.

I'm guessing that once you realize why Ernst chose to use
N and not a specific number, you'll realize that his method
is perfectly unbiased.

A better approach is as follows. Select 52
random numbers and assign one to each card. Sort the deck using the
random values as a key.

If you can generate duplicate values, then the method is biased.

Actually, no. Assuming that the number of possible values generated by
the RNG is greater than or equal to 52, you can discard duplicates.

If you discard duplicates, then your method can't generate duplicates.

Discarding duplicates wasn't mentioned in your original method.
Bottom line:
You're assuming that Ernst suggested generating a random number
of a fixed sized.

He didn't.

He suggested first generating a random number from 0 to 51.
Then generating a random number from 0 to 50
Then generating a random number from 0 to 49
Then generating a random number from 0 to 48
Then generating a random number from 0 to 47
Then generating a random number from 0 to 46
Then generating a random number from 0 to 45
Then generating a random number from 0 to 44
Then generating a random number from 0 to 43
Then generating a random number from 0 to 42

And so on.

Scott Nelson <sc***@helsbreth.org>

Nov 13 '05 #26

On Sat, 13 Sep 2003 21:07:45 +0000, Robert I. Eachus wrote:

Ernst Lippe wrote:
Take a "deck" that initially contains all the cards.

At each step randomly select one of the cards from the
deck and remove that card from the deck.

This algorithm is obviously correct (unlike some of the other
proposals in this thread). For this algorithm you need a random number
generator that generates uniform numbers between 0 and n-1 where n <=
52. A few days ago I posted some code that converts between random
number generator with different ranges that you could use.
Bad idea. If you do this it is easy to show that some of the draws will
be biased, unless your PRNG state is evenly divisible by every number
between 2 and 52.

Did you actually look at the code that I posted?
(Somehow I have the feeling that you did not)

If you have found bugs I would prefer that you post a more detailed
follow-up to that post.
(Yes, this only shows the exsistance of a small bias.
But it also means that there is no easy way to prove that there isn't
a large bias for some N.) The procedure that I proposed does not have any bias when the
original inputs are indeed uniformly distributed.

I believe that may be referring to the "traditional" way of converting
a uniform random number r with range 0..m-1 to a random number with
range 0..n-1 by computing r mod n. But for this procedure it is easy
to prove that the bias is at most 1/m.
A better approach is as follows. Select 52
random numbers and assign one to each card. Sort the deck using the
random values as a key.

This has two advantages. First, your random number generator need not
be unbiased. (Or more formally stated, the choice of permutations will
only depend on the independence of the values generated.)
But how do you handle the case where two cards are assigned the
same random number? (I expect that many programmers will make the
wrong decision in this case)
Second you
can use tests of randomness such as the runs test to validate your
generator. If the generator passes the runs test, all other sins, if
any, won't show up in your output. (Outside of a flawed sort algorithm.)

These tests are pointless for the algorithm that I proposed.

My algorithm starts with encrypting a counter with a block cipher.
You don't need any statistical tests here, you should only test if the
block cipher implementation is correct and if the counter is always
incremented correctly. Statistical tests of the outputs are pointless,
because any statistical significant results would imply that the
underlying block cipher is broken

The second step is the conversion of this random number to a smaller
range 0..n-1. You should prove that the implementation is correct, my
implementation already contained invariants that could be used as a
basis for an even more rigorous proof for those who really want such
proofs. The only tests that you need is a simple test that all the
possible outputs occur with the same frequency.

Testing the shuffle algorithm is also very straightforward because it
is extremely likely that any errors in its implementation will show up
as duplicated cards in the hands that are dealt.

greetings,

Ernst Lippe

Nov 13 '05 #27

On Sun, 14 Sep 2003 20:26:28 +0000, Robert I. Eachus wrote:

Scott Nelson wrote:
If you can generate duplicate values, then the method is biased.
Actually, no. Assuming that the number of possible values generated by
the RNG is greater than or equal to 52, you can discard duplicates. In
practice if the number of values generated is much larger than 52, or if
the generator doesn't produce duplicates in short runs, you don't need
to do that. I use a BBS generator if use the generated values directly
there would be no chance of duplicates. But the generator is mapped to
double precision values between 0.0 and 1.0.

(Just a side note, I am a bit surprised about your use of BBS. There
is a rigorous proof that it contains at least one cryptographically
secure bit. It is also possible to use a few more bits under certain
circumstances. Do you have any security proof for the large number of
bits that you are using?)
The rounding to double
precision allows for the possibility of generating two identical values
in a short sequence. But if you were to check every list for duplicate
values--easy after the sorting--you might discard one sequence in every
2^48 or so.
Imagine that my RNG is me throwing a six sided die.
Are you really trying to assert that assigning the numbers
1 through 6 randomly to each card and then sorting will
produce an unbiased shuffle?

No. But if you throw the die three times per card (giving 216 possible
values) and eliminate duplicates by either method above, it won't matter
how loaded your die is.

If I understand you correctly the first method that you propose
is to throw away a random value, if you had already drawn the
same value earlier in the sequence. This method is biased.

Imagine that the dice are heavily loaded in such a way that 1 occurs
very frequently. Then obviously they is a high probabilty that the
random number that is assigned to the first card is 3. But this means
that the probability that this card will also be the first card that
is dealt is at least as high as the probability that you throw 3 on
the first throw.

The root of this problem is of course, that, when you eliminate
duplicates one at a time, the random numbers in your output are no
longer independent.

greetings,

Ernst Lippe

Nov 13 '05 #28

Matthew Montchalin

On 10 Sep 2003, Gordon Burditt wrote:
|>Your compiler's rand may have weaknesses. The question is, when properly
|>seeded, can a would-be attacker exploit the weaknesses without doing an
|>exhaustive search. If an exhaustive search is needed, then no other random
|>number generator with the same size seed would be any better.
|
|And 32 bits is nowhere near enough to shuffle a deck of 52 cards.
|You need at least 226 bits (log base 2(52!) = 225.581 bits) of
|random seed for a random shuffle.

Isn't it even worse if it used for fortune telling, and the orientation
of the cards (face up, yet is it 'up' or 'down' for the face cards)?

If the random function has to entertain the possibility that face cards
may be turned up, and yet are upside-down, would that make it 226! bits?

Nov 13 '05 #29

Michael Amling

Matthew Montchalin wrote:

On 10 Sep 2003, Gordon Burditt wrote:
|>Your compiler's rand may have weaknesses. The question is, when properly
|>seeded, can a would-be attacker exploit the weaknesses without doing an
|>exhaustive search. If an exhaustive search is needed, then no other random
|>number generator with the same size seed would be any better.
|
|And 32 bits is nowhere near enough to shuffle a deck of 52 cards.
|You need at least 226 bits (log base 2(52!) = 225.581 bits) of
|random seed for a random shuffle.

Isn't it even worse if it used for fortune telling, and the orientation
of the cards (face up, yet is it 'up' or 'down' for the face cards)?

If the random function has to entertain the possibility that face cards
may be turned up, and yet are upside-down, would that make it 226! bits?

More like 52! * 2**12, requiring 226+12 bits of randomness.

--Mike Amling

Nov 13 '05 #30

Ernst Lippe wrote:

(Just a side note, I am a bit surprised about your use of BBS. There
is a rigorous proof that it contains at least one cryptographically
secure bit. It is also possible to use a few more bits under certain
circumstances. Do you have any security proof for the large number of
bits that you are using?)
I am not using a large number of bits. If you generate 52 values and
sort the cards based on those values you use just under 6 bits per value
on average. The BBS generator is secure if you use less than log2 N
values, so a 64-bit N would suffice for cryptographic level security.

But there is a bit of procedural knowledge there which needs to be taken
into account. If you know N but not the factors of N, and the final
state of the generator, such knowledge is relatively worthless.
However, if you know N and the initial state of the generator, it is
easy to "predict" the sequence. In the game setting proposed, this
means that you have to be very careful both that the initial state
cannot be predicted, and the initial state cannot be forced by external
agents to a given value.

Of course, for any PRNG, allowing an adversary to force the intial
generator state or to know the intial generator state is bad.
The rounding to double precision allows for the possibility of
generating two identical values in a short sequence. But if you
were to check every list for duplicate values--easy after the
sorting--you might discard one sequence in every 2^48 or so.

Imagine that my RNG is me throwing a six sided die. Are you
really trying to assert that assigning the numbers 1 through 6
randomly to each card and then sorting will produce an unbiased
shuffle?

No. But if you throw the die three times per card (giving 216
possible values) and eliminate duplicates by either method above,
it won't matter how loaded your die is.

If I understand you correctly the first method that you propose is to
throw away a random value, if you had already drawn the same value
earlier in the sequence. This method is biased.
Two corrections. I did not propose to throw away duplicates. I just
showed that whatever you do with duplicates is irrelevant with the
generator I proposed. If you generate a pair of duplicates once every
few billion years any procedure that you use to disambiguate the pair is
reasonable. Using a sorting algorithm that 1) is known to produce some
order rapidly, and 2) it not "stable," in other words the algorithm may
reverse the order of identical values, seems acceptable to me. You may
feel differently.

But the real problem and it is a real problem is the amount of entropy
in your generator state. If your generator has less than 52! potential
initial states, there are deck sequences you won't produce. If the
number of initial states is not evenly divisible by 52!, then some deck
permutations will appear more often than others. What you will accept
in this case is again a judgement matter. For example, when generating
bridge hands, I feel that as long as the generator state--and initial
state--contain significantly more than log2((13!)^4/(52!)) bits of
entropy, I can sleep at nights. (It may not be possible to generate all
deck permutations, but you will probably be able to generate all
possible deals.)

Imagine that the dice are heavily loaded in such a way that 1 occurs
very frequently. Then obviously they is a high probabilty that the
random number that is assigned to the first card is 3. But this means
that the probability that this card will also be the first card that
is dealt is at least as high as the probability that you throw 3 on
the first throw.

The root of this problem is of course, that, when you eliminate
duplicates one at a time, the random numbers in your output are no
longer independent.

So if you discard duplicates, and the generator is biased, the resulting
sequence will be biased. Agreed. But as discussed above, if the number
of potential generator states is large enough the bias introduced is
substantially reduced. At the limit, as discussed above, you will be
able to play from now to the end of the universe, and still not have a
detectable amount of bias.

The selection based on selecting one from the remaining cards and
repeating on the other hand, does result in serious detectable bias if
the underlying generator is even slightly biased. This robustness of
the sorting approach is what I was addressing. Use your weighted dice
then check how many spades are in the first five cards. A great way to
detect bias, not a good way to generate random permutations.
--
Robert I. Eachus

Ryan gunned down the last of his third white wine and told himself it
would all be over in a few minutes. One thing he'd learned from
Operation Beatrix: This field work wasn't for him. --from Red Rabbit by
Tom Clancy.

Nov 13 '05 #31

Matthew Montchalin

Nov 13 '05 #32

On Tue, 23 Sep 2003 21:56:25 +0000, Robert I. Eachus wrote:

Ernst Lippe wrote:
> (Just a side note, I am a bit surprised about your use of BBS. There
> is a rigorous proof that it contains at least one cryptographically
> secure bit. It is also possible to use a few more bits under certain
> circumstances. Do you have any security proof for the large number of
> bits that you are using?)
I am not using a large number of bits. If you generate 52 values and
sort the cards based on those values you use just under 6 bits per value
on average.

I assume that you compute your floating point random number by
dividing the internal state x_i by the modulus N. But when you use the
result for sorting, you effectively use the most significant bits of
x_i. I am only aware of proofs that show the cryptographical security
of the least significant bits. I would be surprised if you could give
a proof that your procedure is cryptographically secure. After all
when x_i < sqrt(N) it is certain that x_(i+1) > x_i. So, it would
seem that successive outputs are not independent.
The BBS generator is secure if you use less than log2 N
values, so a 64-bit N would suffice for cryptographic level security.
According to the HAC you can use c lg ln N least significant bits, but
unfortunately there are secure ranges for c known.

Also a 64-bit N should be easy to factorize, so I can't see why you
would expect any cryptographic security with such low N anyhow.

>> No. But if you throw the die three times per card (giving 216
>> possible values) and eliminate duplicates by either method above,
>> it won't matter how loaded your die is.

> If I understand you correctly the first method that you propose is to
> throw away a random value, if you had already drawn the same value
> earlier in the sequence. This method is biased.

Two corrections. I did not propose to throw away duplicates.

Then please explain what "either method" for eliminating duplicates
is. I assumed that one method was to throw away the entire sequence
when it contains any duplicates and that the other was to throw
away any duplicates as soon as they were encountered.
But the real problem and it is a real problem is the amount of entropy
in your generator state. If your generator has less than 52! potential
initial states, there are deck sequences you won't produce. If the
number of initial states is not evenly divisible by 52!, then some deck
permutations will appear more often than others. What you will accept
in this case is again a judgement matter. For example, when generating
bridge hands, I feel that as long as the generator state--and initial
state--contain significantly more than log2((13!)^4/(52!)) bits of
entropy, I can sleep at nights. (It may not be possible to generate all
deck permutations, but you will probably be able to generate all
possible deals.) But how can you then prove that your method will uniformly generate all
possible deals?
The selection based on selecting one from the remaining cards and
repeating on the other hand, does result in serious detectable bias if
the underlying generator is even slightly biased.

But the construction that I proposed should give you an unbiased RNG
whenever the underlying block cipher is cryptographically secure in
CTR mode.

Summarizing my main objections against your approach:
* Your proofs are not very rigorous
* BBS is a very slow random number generator, so I
would expect that your system does not have a very
high performance.

greetings,

Ernst Lippe

Nov 13 '05 #33

Ernst Lippe wrote:

(An excellent criticsm by the way...)

I assume that you compute your floating point random number by
dividing the internal state x_i by the modulus N. But when you use the
result for sorting, you effectively use the most significant bits of
x_i. I am only aware of proofs that show the cryptographical security
of the least significant bits. I would be surprised if you could give
a proof that your procedure is cryptographically secure. After all
when x_i < sqrt(N) it is certain that x_(i+1) > x_i. So, it would
seem that successive outputs are not independent.
First, the proof actually shows that any n bits (n <= log2 N) are
cryptographically as secure as anything that depends on not being able
to factor N. But there is a neat little lemma in here that you have to
see. If I know N and the state of the generator when it generated x_i,
I can, of course, generate x_i+1, x_i+2 and so on. But I can't generate
x_i-1 without the factors of N, or more formally if I can even guess one
bit of x_i-1 with probability 0.5 + epsilon without the factors, I can
use that fact to find the factors. So if you know x_i to be 42, of
course (for any reasonable N) you know that x_i+1 is 1764. But what is
x_i-1? You don't know. (If the state is 49 or any perfect square over
the integers, you do know the previous state. But let's get back to my
algorithm...)

If I assign the values to be sorted to the cards by taking x_i, x_i+1,
x_i+2... it looks as if there may be a problem. But what if I assign
the values x_i-1, x_i-2, x_i-3,... to the deck in reverse order, and
then sort? Now it is clear that even given N and x_i, you are better
off trying to factor N (assuming it is larger than 64 bits) than to
guess at a starting value and generate a sequence trying to find one
that ends at x_i, or trying to work out what the values must have been
from the ordering of the deck.

But the two deck orders are identical. All that has changed is the
value of i in x_i. I call this the symmetric property. As long as it
holds for the way you use the x_i from the BBS algorithm, you maintain
the other properties that BB&S provides.

The BBS generator is secure if you use less than log2 N
values, so a 64-bit N would suffice for cryptographic level security.

Also a 64-bit N should be easy to factorize, so I can't see why you
would expect any cryptographic security with such low N anyhow.
I don't. My point was that above a minimal (and quite small) value of N
the number of bits I use will be less than lg2 N.
According to the HAC you can use c lg ln N least significant bits, but
unfortunately there are secure ranges for c known.
I don't understand this statement. I think that you meant to say that
unfortunately there are no secure ranges for c known. If so I agree.
But this argument is based on factoring N. Unless someone comes up with
a proof that N /= NP or that factoring is in P, we will sit in that
boat. And I expect that situation to last for a while.
Then please explain what "either method" for eliminating duplicates
is. I assumed that one method was to throw away the entire sequence
when it contains any duplicates and that the other was to throw
away any duplicates as soon as they were encountered.
Again, my actual method is to ignore duplicates, and use a non-stable
sort algorithm. Even a stable sort so that the first duplicate always
came before the second would not introduce significant bias. But a
non-stable sort can be chosen where the order of the duplicates depends
on the final order of the other cards.
But how can you then prove that your method will uniformly generate all
possible deals?
I can't. But I can prove that it is not possible to detect that fact
assuming less than e deals are examined, and e usually turns out to be
ridiculously large. Let me give a hypothetical approach to "cheating"
given this approach to dealing cards, by someone who knows the algorithm
and N:

1. Generate all possible starting seeds and count how often each
deal shows up.
2. Choose a betting scheme based on this information.

Without quantum computers, that is going to take some ridiculously large
number times the age of the universe, and storing the counts will take
more matter than is in the universe without entangling millions of bits
of data into each electron.

But there is a problem here! If the (hidden) generator that creates the
decks is never reset, the actual generator will only generate 1/4, 1/8,
or 1/16 of those deck orders. How do you know which fraction that is?
Factor N, or get a value from the actual generator, and use it to
generate your deck order map.

Again, from a rigorous mathematical standpoint, there are problems.
From a rigorous statistical standpoint, those problems don't exist. (If
you have a biased coin, but cannot detect whether the bias is to heads
or tails by flipping it a trillion times, do you care? Especially since
the coin will wear away to nothing before you get that far.)
But the construction that I proposed should give you an unbiased RNG
whenever the underlying block cipher is cryptographically secure in
CTR mode.
I understand your approach. I can send you programs that run your
approach and my approach with small decks, then run Chi-Square numbers
on long runs. I then do K-S tests on the results. When I have done
this, your approach fails, even if the underlying generator is
cryptographically secure. Now:

I cannot create an ironclad proof that your method should work. The
problem is the independence of the 52 (or whatever) generators, assuming
they all use the same underlying PRNG algorithm. The bias that I detect
toward less random distributions of Chi-Square is very small. But why
bother. My method doesn't have the independence problem, and the data
looks good.
Summarizing my main objections against your approach:
* Your proofs are not very rigorous
They are. They are not mathematically rigorous, they are statistically
rigorous. As I said, the proof approach I take is demonstrating that
any bias or pattern cannot be detected, given certain very broad
assumptions. (Factoring is hard, no generator is run for min(p,q)
iterations, etc.)
* BBS is a very slow random number generator, so I would expect that your
system does not have a very high performance.

How many times have I heard that said based on slow implementations of
BB&S? The trick is to choose N correctly, which of course takes work,
and then don't square x_i then do the mod operation, compute x_i^2 mod N
in one operation. If you do both, the generator itself can be very
fast. For example, look for an N such that N is 2^j - k for relatively
large j and small k. This makes the modulus calculation of x_i^2 mod N
easy. There are other tricks. For example, X^2 mod N = ((x-N)* x) mod
N. The quantity (x-N)*x < 1/4 * N^2. Not a big deal you would think,
but it allows you to choose N > 2*k for some convenient k say 512, and
not need to ever store numbers greater than 2^512.

Oops, forgot the biggest trick of all. Represent x as a n digit number
base M where M = 2^k for some useful k like 64. Let's call these digits
x_j to distinguish from the x_i above. The partial products are of the
form x_j * x_j' * M^i for some i. If M^i is greater than N replace it
with M^i mod N. (And remember that due to the way we chose N, M^i mod N
will have lots of zero digits.) Accumulate all the partials in the ith
place, multiply by M^i mod N, and now add the partials together mod N.
If N is chosen correctly as above, the mod operations can be performed
by splitting x into parts A * 2^j and B, then computing B + A * k, then
if the result is greater than N, subtract N. For the additions, you
only need to compare to N and if necessary subtract. Sounds complex,
but it simplifies the arithmetic greatly.

Yeah, it trades days of computer time finding nice values of N for a
factor of 100 or so speedup in evaluating X*X mod N. But it is worth
it. (At least as useful as letting my machine work on an RSA challenge,
fold proteins, or looking for phone calls from ET.)
--
Robert I. Eachus

Ryan gunned down the last of his third white wine and told himself it
would all be over in a few minutes. One thing he'd learned from
Operation Beatrix: This field work wasn't for him. --from Red Rabbit by
Tom Clancy.

Nov 13 '05 #34

On Thu, 25 Sep 2003 03:23:16 +0000, Robert I. Eachus wrote:

Ernst Lippe wrote:

(An excellent criticsm by the way...)
I assume that you compute your floating point random number by
dividing the internal state x_i by the modulus N. But when you use the
result for sorting, you effectively use the most significant bits of
x_i. I am only aware of proofs that show the cryptographical security
of the least significant bits. I would be surprised if you could give
a proof that your procedure is cryptographically secure. After all
when x_i < sqrt(N) it is certain that x_(i+1) > x_i. So, it would
seem that successive outputs are not independent.
First, the proof actually shows that any n bits (n <= log2 N) are
cryptographically as secure as anything that depends on not being able
to factor N.

I think that you are using the wrong formula here, both AC and
HAC state that n <= log2 k, where k is the bit length of N,
in other words n <= log2 log2 N.

Again, can you give a reference that it is safe to use the
most significant bits, because all proofs that I have seen
only refer to the least significant bits.

Even when you can find such a proof, there is still the problem that
you don't use a fixed number of bits. When you generate two numbers
that have the same value for the highest bits, the outcome of the
comparison is determined by the highest bit that is different in
both. But this bit position can be outside the set of secure
bits. Using the birthday limit, when you have s secure bits you can
expect to use an insecure bit in your comparisons when you generate
more than 2^(s/2) random numbers.

But when you adapt your sorting algorithm in such a way that it only
uses secure bits, you need a rather large amount of secure bits. When
you want to assign a random number to each card and want to have 50%
chance that there are no collisions you need at least 52^2 different
possible numbers (again the birthday limit), in other words each
number must be at least 12 bits. So for one deal you need 52*12=624
secure random bits. Now this is not very efficient because there are
only 52! different deals and log2 52! is somewhat less than 226. The
method I proposed can be implemented in such a way that it does not
require much more than 226 secure bits (it is probably not worth to do
it, because I also proposed a very efficient random number generator,
but with a much slower random number generator such as BBS it might be
necessary.)

> According to the HAC you can use c lg ln N least significant bits, but
> unfortunately there are secure ranges for c known.

I don't understand this statement. I think that you meant to say that
unfortunately there are no secure ranges for c known. If so I agree.

Yes, that's what I meant. I assume that the precise value of c depends
on the computational complexity of factoring which is still unknown.

But how can you then prove that your method will uniformly generate all
possible deals?

I can't.

But with my method, it is obvious that it will uniformly generate all deals
when the random generators are uniform and independent.

But the construction that I proposed should give you an unbiased RNG
whenever the underlying block cipher is cryptographically secure in
CTR mode.

I understand your approach. I can send you programs that run your
approach and my approach with small decks, then run Chi-Square numbers
on long runs. I then do K-S tests on the results. When I have done
this, your approach fails, even if the underlying generator is
cryptographically secure.

Please explain this. If you can do this, you can break AES,
because you have just described a method to distinguish AES
from a random permutation.

I cannot create an ironclad proof that your method should work. The
problem is the independence of the 52 (or whatever) generators, assuming
they all use the same underlying PRNG algorithm. The bias that I detect
toward less random distributions of Chi-Square is very small. But why
bother. My method doesn't have the independence problem, and the data
looks good.

Again if you can find any form of dependence between the outputs
you have broken the underlying block cipher.

Summarizing my main objections against your approach:
* Your proofs are not very rigorous

They are. They are not mathematically rigorous, they are statistically
rigorous.

So, statistics is no longer part of (applied) mathematics?

* BBS is a very slow random number generator, so I would expect that your
> system does not have a very high performance.

How many times have I heard that said based on slow implementations of
BB&S? The trick is to choose N correctly, which of course takes work,
and then don't square x_i then do the mod operation, compute x_i^2 mod N
in one operation. If you do both, the generator itself can be very
fast. For example, look for an N such that N is 2^j - k for relatively
large j and small k. This makes the modulus calculation of x_i^2 mod N
easy. There are other tricks. For example, X^2 mod N = ((x-N)* x) mod
N. The quantity (x-N)*x < 1/4 * N^2. Not a big deal you would think,
but it allows you to choose N > 2*k for some convenient k say 512, and
not need to ever store numbers greater than 2^512.

Oops, forgot the biggest trick of all. Represent x as a n digit number
base M where M = 2^k for some useful k like 64. Let's call these digits
x_j to distinguish from the x_i above. The partial products are of the
form x_j * x_j' * M^i for some i. If M^i is greater than N replace it
with M^i mod N. (And remember that due to the way we chose N, M^i mod N
will have lots of zero digits.) Accumulate all the partials in the ith
place, multiply by M^i mod N, and now add the partials together mod N.
If N is chosen correctly as above, the mod operations can be performed
by splitting x into parts A * 2^j and B, then computing B + A * k, then
if the result is greater than N, subtract N. For the additions, you
only need to compare to N and if necessary subtract. Sounds complex,
but it simplifies the arithmetic greatly.

Yeah, it trades days of computer time finding nice values of N for a
factor of 100 or so speedup in evaluating X*X mod N. But it is worth
it.

It is certainly useful to have fast BBS implementations, but I am
still not convinced that you can get a performance that is anywhere
near that of a block cipher in CTR mode. Can you give any timings?

greetings,

Ernst Lippe

Nov 13 '05 #35

Ernst Lippe wrote:

I think that you are using the wrong formula here, both AC and
HAC state that n <= log2 k, where k is the bit length of N,
in other words n <= log2 log2 N.
If so, that is a result I haven't seen. I assume AC is Applied
Cryptography, and HAC is Handbook of Applied Cryptography.
Again, can you give a reference that it is safe to use the
most significant bits, because all proofs that I have seen
only refer to the least significant bits.
Then reread the original Blum, Blum and Shub paper. It shows that the
position you take the bits from doesn't matter--and why.
Even when you can find such a proof, there is still the problem that
you don't use a fixed number of bits. When you generate two numbers
that have the same value for the highest bits, the outcome of the
comparison is determined by the highest bit that is different in
both. But this bit position can be outside the set of secure
bits. Using the birthday limit, when you have s secure bits you can
expect to use an insecure bit in your comparisons when you generate
more than 2^(s/2) random numbers.
I could equally argue that the ordering of any two cards depends on ONE
bit. And if you look at the BBS paper, you would see that for the two
card case I am exactly right. But what is the right number to use for
pairwise orderings in a deck? Rather than use 52*51/2 I chose to use 52
per card which is an overstatement. I can see that your arguement is
that the randomness is not equally distributed among the cards. I
agree. But I think that argues for a maximum entropy resolution per
card of 51 for the first card drawn. The next question is whether that
means 51 bits or ln2 51 bits.

Let me propose a thought experiment. Break all the cards into two
groups by looking at the first (or if you prefer, last) bit. Now look
at the next bit and divide the cards again. Continue dividing each pile
in this manner. But if a bit doesn't divide a subset, don't use that
bit. ;-) In theory there will be cases where you use 51 bits from a
single number. But actually you don't. Assign all the entropy used to
the numbers from the smaller of the two groups. Worst case, your first
division splits the cards into two piles one with 51 cards one with 1.
I argue that in that case you have used ln2(51) bits of entropy. And as
you can see, that is the worst case.

But the construction that I proposed should give you an unbiased RNG
whenever the underlying block cipher is cryptographically secure in
CTR mode.

I understand your approach. I can send you programs that run your
approach and my approach with small decks, then run Chi-Square numbers
on long runs. I then do K-S tests on the results. When I have done
this, your approach fails, even if the underlying generator is
cryptographically secure.

Please explain this. If you can do this, you can break AES,
because you have just described a method to distinguish AES
from a random permutation.

If so, I don't understand your method. I assumed you were talking about
creating 51 different generators generating numbers from 0 to 51, 0 to
50, 0 to 49, and so on. But if you take values from a single generator
and convert them in an unbiased manner into the numbers you need it
doesn't help.

Let me give you a concrete example. Take three cards, say the Aces of
Clubs, Diamonds, and Hearts in that order, and two dice, say one red and
one white. Hold the cards in that order in your hand, Ace of clubs on
the left. Roll the dice. If the Red die is a one or a two, put the Ace
of Clubs face down on the table. If the Red die is a three or four, put
the Ace of Diamonds down again face down. If the Red die is a five or
six, you guessed it, the Ace of Hearts. Now if the White die is even,
put the leftmost card on top of the face down card, if it is odd, put
the rightmost card down. Finally put the remaining card on top.

You have just taken three cards and arranged them in one of 3! = six
possible orders. If you have some four sided or eight sided dice, you
can add another die and extend this. A ten-sided die, and another
ordinary die on top of that will allow you to extend the experiment to
six cards. But ignore that for now.

The bottommost card in your randomized deck depends only on the Red die
right? But the other two cards--whatever they are--landed in positions
that depended on both dice. Now, we can assume the numbers on the two
dice were independent, so this is fine. But what if there is a
correlation between the dice? For the moment assume the dice are
magnetized so that when you look at one die, it is random, but the
results are not indepentent. What happens with the cards. The bottom
card is still random, but the other cards are biased.

Now go get those other dice. Do you see where this is going? Choosing
a "random" deck order this way entangles more and more information into
the topmost card. If there is no bias, AND all the values are
independent, the topmost card collects most of the bias and interactions.

So I actually use this particular test as a very sensitive sniffer to
see if a particular generator has any bias or lack of independence
between N variates, where N is the size of the deck.

But back to your comment about AES. What you say may be true about this
technique applied to an AES-based generator breaking AES. I think that
is correct, but it is late at night. However, I have tried this with
SHA-1, and rejecting randomness there doesn't invalidate DSA.

And that is why I rail so strongly against the approach you advocate.
From experience with testing poor to very good PRNGs, my technique
works even with very poor PRNGs, the technique where you select from an
ever diminishing deck concentrates any deviations from true randomness
in the last few cards. For example, take my algorithm and replace the
RNG with 1/2 rand(), or sin(10 * rand()), or whatever and it still gives
similarly random results. (But there are some LCG generators that fail
even with my algorithm. Which as far as I am concerned tells you just
how bad LCG generators are. ;-)
Again if you can find any form of dependence between the outputs
you have broken the underlying block cipher. So, statistics is no longer part of (applied) mathematics?
No, statistical rigor is different from mathematical rigor. In
statistics, if two distributions cannot be distinguished they are
identical. In math, two functions which always give the same result are
not necessarily identical, but they are equivalent.

It is a subtle distinction but it matters here. If I create a
notational universe limited in some ways, and in that universe a
particular distribution cannot be distinguished from a random
distribution, then in that universe, the distribution is random, even if
there are other universes where it is not random.

In this particular case, I use that freedom to define a universe which
looks suspiciously like ours. If the size of our universe increases by
a factor of 100 again soon, it won't bother me that much. (Inflation
theories.) But factoring in P would break everything.
It is certainly useful to have fast BBS implementations, but I am
still not convinced that you can get a performance that is anywhere
near that of a block cipher in CTR mode. Can you give any timings?

The right question to ask is how long will it take you to find a BBS
generator that is better than this block cipher? Give me some realistic
timings and we will see. (I think I can beat a software AES version
with about the same degree of cryptographic security. I'd have to think
about implementing things in hardware.)

But the real problem is that to get FAST versions of BBS takes copious
computer cycles to find useful values of p and q. (Picking a nice N
then factoring it will obviously take too long. ;-) You end up picking
the range you want N to be in, generating a p, then seeing if there is
a q that puts N in that range.

--
Robert I. Eachus

"Quality is the Buddha. Quality is scientific reality. Quality is the
goal of Art. It remains to work these concepts into a practical,
down-to-earth context, and for this there is nothing more practical or
down-to-earth than what I have been talking about all along...the repair
of an old motorcycle." -- from Zen and the Art of Motorcycle
Maintenance by Robert Pirsig

Nov 13 '05 #36